TWI417798B - High-speed backpropagation neural network system with elastic structure and learning function - Google Patents


Info

Publication number: TWI417798B
Authority: TW (Taiwan)
Prior art keywords: layer, neural network, network system, output, learning function
Application number: TW97145030A
Other languages: Chinese (zh)
Other versions: TW201020939A (en)
Original Assignee: National Taipei University of Technology

Application filed by National Taipei University of Technology
Priority to TW97145030A
Publication of TW201020939A
Application granted
Publication of TWI417798B


Description

High-speed backpropagation neural network system with elastic structure and learning function

The present invention relates to a high-speed backpropagation neural network system with an elastic structure and a learning function, and in particular to a highly parallel neural network system in which the nodes of each layer are interconnected with those of the next, forming a network capable of handling complex tasks.

Traditionally, neural network computing systems have been implemented in software, and many real-time control systems are limited by software execution speed. Because of this speed limitation, the network cannot be made large in real-time applications, so the system parameters cannot capture much information and the network's potential cannot be realized. In recent years the idea of parallel processing has gained ground, and central processors have gradually moved toward parallel architectures as technology has advanced and run into bottlenecks; with parallel processing, execution time and performance can be improved substantially. Current research shows that many neural networks are being implemented in hardware, chiefly for speed: when the network is large, execution in software takes a very long time. Research on neural network systems suited to hardware has therefore expanded steadily, and the results continue to accumulate, especially since moving network training from software into hardware would be a major breakthrough at the application level.

However, once network training can also be implemented in hardware, hardware cost becomes an important consideration: if a large amount of hardware is used, the economics are hard to justify against slow but inexpensive software. The ring serial (systolic) architecture applies only to the multilayer perceptron; although the control unit can be designed to handle a variable number of layers, the network size is still limited by the hardware. Moreover, many of the better learning algorithms vary the number of neurons from iteration to iteration, adjusting the structure dynamically so that learning converges faster, for example network pruning and the orthogonal least squares method for radial basis function networks. If the hardware architecture is fixed, the advantages of such learning algorithms cannot be exploited.

The known patents and literature on neural network technology are as follows:

1. U.S. Patent No. 5,087,826: the proposed architecture uses one multiplier per neuron connection to compute the product of input and weight (x·w), forming a two-dimensional array. Although fast, it consumes a large amount of hardware and produces a large number of buses, which is unfavorable for design.

2. CNAPS (Dan Hammerstrom, "A VLSI architecture for high-performance, low-cost, on-chip learning," Proceedings of International Joint Conference on Neural Networks, 1990, pp. 537-544; Dan Hammerstrom, "Digital VLSI for Neural Networks," The Handbook of Brain Theory and Neural Networks, Second Edition, Michael Arbib, MIT Press, 2003): the advantage of CNAPS is that every computing node has a built-in adder and multiplier, and its operation can be controlled by commands on an instruction bus. Each node is equivalent to a simple arithmetic unit, so the architecture is very flexible for computing the algorithms of different neural networks. When it is applied to neural network hardware development, the number of nodes can be adjusted easily because all nodes share the same architecture. Unfortunately, the control instructions are very complicated, so software is needed to compile the instruction stream; there is no hardware dedicated to the activation function computation; and because the architecture is general-purpose, computation speed is sacrificed.

3. U.S. Patent No. 5,091,864: the proposed architecture is not as flexible as CNAPS, but it is leaner and easier to design and use; it not only raises computation speed and lowers development cost, it also reduces the complexity of the control unit and shortens the development cycle. Input data are passed in series: data first enter the first computing unit and only reach the second after two cycles. Because data arrive at the computing units at different times, the control unit becomes harder to design. The more commendable part is the reduced number of activation function units. The architecture exploits the one-dimensional array: data are input and output one item per clock cycle, so the computing units never need to evaluate the activation function simultaneously. The activation function is therefore taken out of the neuron and placed independently on the return path of the array; a single activation function unit completes the computation without slowing it down. In addition, a set of shift registers stores the finished results and passes them back one by one; while they are being passed, all computing units can immediately start on the next item, saving considerable time. The drawback of this architecture is that it has no learning mechanism; it can only perform recall on a network that has already been trained.

4. U.S. Patent No. 5,799,134: the proposed architecture is similar in concept to U.S. Patent No. 5,091,864, but the input data are connected in parallel, that is, all computing units receive the same input signal at the same time. A subtractor and a multiplexer added to each computing unit make the computation more flexible. It likewise has no learning mechanism and can only execute the recall function of a neural network.

5. WO 2008/067676 A1: it proposes an adjustable artificial neural network system and architecture that performs the forward and backward computations of an artificial neural network in segments, where the segmentation can be adjusted according to demand. However, the hardware of that system requires the actual chip used, the number of segments, and the logic element usage limits to be entered into a software program; the software then generates the required hardware description language code, which is compiled and downloaded to the chip. Therefore, once the chip has been programmed, the number of segments cannot be changed.

It can be seen from the above that the conventional and current methods are not good designs and urgently need improvement.

In view of the shortcomings of the conventional methods described above, the inventor set out to improve and innovate, and after years of painstaking research finally succeeded in developing the present high-speed backpropagation neural network system with elastic structure and learning function.

The object of the present invention is to provide a high-speed backpropagation neural network system with an elastic structure and a learning function: a neural network system that possesses both recall and learning capability.

A high-speed backpropagation neural network system with elastic structure and learning function that achieves the above object is composed of a forward computation block, a backward computation block, and a control unit. A ring serial multi-data-bus architecture carries out the backpropagation computation, giving the system the complete functionality of both recall and learning. The whole backpropagation computation divides into a forward pass and a backward pass; with a limited array of processing elements, segmented computation allows a very large network to be trained and recalled. Because neural networks are well suited to parallel hardware computation, the present invention adopts a single instruction multiple data (SIMD) design: the hardware array shares part of its signals, and the backpropagation computation is driven by instructions. A ring serial arrangement with a high-speed pipeline and segmented computation of the network synthesizes a large network under limited hardware, replacing software that requires lengthy computation. The control unit computes the backpropagation network in segments, fixing the hardware cost, with the size of the network determined by the memory capacity; the processing elements are designed around pipelining and hardware reuse, and synchronous clocking simplifies the controller. As for speed, raising the operating clock is the first choice, making the overall system faster. The present invention improves on previous neural network hardware systems, achieving better execution performance while using fewer logic elements and remaining flexible.

Please refer to Figure 1, the implementation system diagram of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. As the figure shows, the system consists mainly of a forward computation block 11, a backward computation block 12, a control unit 13, and multiplexers. When the forward computation block 11 performs the forward pass, data from the input layer are weighted through the hidden layer, processed by the activation function, and then passed to the output layer to compute the network output; each layer of neurons affects only the state of the next layer. If only recall is required, the forward result is switched through the multiplexer and output directly. If the network is learning and the output differs too much from the target, the forward result becomes the input of the backward computation block 12, the backward pass is performed, and the error signal is propagated back; by modifying the weights of each layer of neurons, the error is expected to fall within tolerance. The control unit 13 controls the overall flow, sending control signals at appropriate times to sequence the operation of the forward block 11 and backward block 12. Because backpropagation is a multilayer computation in which, apart from the input layer, every layer's input is the previous layer's output, and every layer performs the same kind of operation, the same processing elements can compute different layers: the element that computes the first hidden node also computes the first output node. Thus only as many processing elements as the largest single hidden layer are needed to compute a multilayer backpropagation network, greatly reducing hardware usage. After the processing element array finishes one layer, its results are used in the next layer's computation, so the present invention feeds the results directly back onto the input data bus for the next layer, forming a ring: data can be fed straight into the computation without further handling by the control unit, which helps shorten computation time.
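
As a rough software analogue of this ring organization, the following minimal sketch (not the hardware itself; the sigmoid activation and the 2-7-5 dimensions are assumptions borrowed from examples later in the text) reuses one "processing element array", here a matrix product, for every layer, feeding each layer's output straight back in as the next layer's input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ring_forward(x, layer_weights):
    """Recall pass: one PE array (here a matrix product) is reused for
    every layer, and each layer's output is fed straight back onto the
    input data bus (the 'ring'), so no per-layer hardware exists."""
    bus = np.asarray(x, dtype=float)
    for W in layer_weights:          # hidden layer first, then output layer
        bus = np.append(bus, 1.0)    # bias input carried on the bus
        net = W @ bus                # multiply-accumulate done by the PE array
        bus = sigmoid(net)           # single shared activation-function unit
    return bus

# hypothetical 2-7-5 network (the example dimensions used later in the text)
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-0.5, 0.5, (7, 3))  # 7 neurons x (2 inputs + bias)
W_output = rng.uniform(-0.5, 0.5, (5, 8))  # 5 neurons x (7 hidden + bias)
y = ring_forward([0.3, -0.1], [W_hidden, W_output])
```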

Please refer to Figures 2A and 2B, the segmented computation and hardware sharing diagrams of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. The whole backpropagation computation divides into a forward pass and a backward pass; here the SIMD design shares part of the hardware array's signals and drives the backpropagation computation with instructions. The present invention improves the design with a high-speed pipeline and segmented computation of the network, producing more complete hardware to replace software that requires lengthy computation. The improved segmented architecture is shown in Figure 2A: segmentation lets the hardware keep its ring serial character, requiring only segment scheduling and weight placement when controlling the backpropagation flow. Taking a single layer as an example, if the hidden layer has 7 nodes and the forward block contains a chain of 3 processing elements, the computation must be done in 3 segments. Because backpropagation training alternates between the forward and backward passes, the backward block is idle while the forward block computes, and vice versa. The forward pass is dominated by the processing element array, whose elements multiply-accumulate to form each neuron's weighted sum. In the backward pass, computing the hidden-layer error terms also turns out to be a multiply-accumulate, differing only in the order of the weights and in the input data, and needing no activation function; the forward pass's processing elements can therefore be reused, with suitably configured memory, to compute the hidden-layer error terms of the backward pass. Because the accesses are sequential, queue (FIFO) memory is used. The input signals for the hidden-layer error terms must be read from the δ queue onto the input bus of the processing element array; since the system uses an elastic architecture, the array may be smaller than the actual number of network nodes, in which case segmented computation is required. Whether or not the array is larger than the network, results must first be stored in memory. Since the neuron outputs stored in the hidden-layer queue have already been read out before the backward pass, the hidden-layer queue can be reused to store the array's results during the backward pass, which neither delays the computation nor costs extra logic elements. As Figure 2B shows, the forward and backward signal paths are essentially the same. Moreover, the segmentation settings of the present invention can be changed dynamically after hardware synthesis, according to demand.
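
The number of segments follows directly from the layer size and the synthesized element count. A minimal sketch under those assumptions (function and variable names are illustrative) shows how the 7-node hidden layer of the example is computed on a 3-element array in ceil(7/3) = 3 passes:

```python
import math

def segmented_layer(inputs, weights, num_pe):
    """Compute one layer on a PE array smaller than the layer itself.
    `weights` holds one row of weights per neuron; each pass fills at
    most `num_pe` neurons, so a 7-neuron hidden layer on a 3-PE array
    needs math.ceil(7 / 3) = 3 passes (segments)."""
    segments = math.ceil(len(weights) / num_pe)
    outputs = []
    for s in range(segments):
        rows = weights[s * num_pe:(s + 1) * num_pe]
        # each 'PE' forms one neuron's weighted sum by multiply-accumulate
        outputs.extend(sum(w * x for w, x in zip(row, inputs)) for row in rows)
    return outputs, segments

outs, n_seg = segmented_layer([1.0, 0.5], [[0.1, 0.2]] * 7, num_pe=3)  # n_seg == 3
```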

Please refer to Figures 3A and 3B, the processing element system diagram and signal timing diagram of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. The internal architecture of a processing element has three main parts: the memories (stack-architecture memory 1411 and queue-architecture memory 1412), the multiply-accumulator 142, and the shift register 143. As shown in the figure, the three parts operate independently, and the pipelined design decomposes the computation into independent stages, greatly improving performance. The forward block chains many processing elements together; a processing element is equivalent to a neuron, whose main task is to multiply weight values by input values and accumulate. In the forward pass the hidden-layer weights are used first, then the output-layer weights; in the backward pass, the output layer's δ is computed first, so the output layer's weight corrections and updated weights are also computed before the hidden layer's. Because the memory that holds all the weights from input layer to output layer is a queue, and during learning the forward pass must read the corresponding weights immediately and in the correct order after the weight queue has been read out, each processing element contains an internal stack-architecture memory 1411 that stores the weights in order, without needing extra address lines to manage the memory contents. In the backward pass, the error values needed to compute the output layer's δ are also produced by multiply-accumulation, so hardware sharing is used: the weights connecting the output layer to the hidden layer are stored inside the processing elements, but in a completely different order and without the hidden layer's bias weights, because in that computation the output layer acts as the input and the hidden layer as the output. Therefore, besides the stack memory used by the forward pass, another queue-architecture memory must be designed into the element to hold the weights needed during the backward pass. Because two types of memory exist inside the element, an extra multiplexer 144 is needed to select between them; and to simplify the controller, each element has its own addressing scheme for writing data from the weight bus 15 into its stack-architecture memory 1411 or queue-architecture memory 1412. That is, the write signals of the stack 1411 and queue 1412 must be paired with the element's address, while the read signals can be shared because the data are processed in parallel. Since the forward block is a chain of processing elements, all elements perform their multiply-accumulations simultaneously; when the computation finishes, each element's result is passed to the next stage through the shift register. Many registers are therefore used inside each element to resolve asynchronous signal issues; hardware-synchronous signals make the controller design much simpler.
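
A software sketch of one processing element under the description above (a behavioral model only, with illustrative method names, not the RTL): the stack holds forward-pass weights, the queue holds backward-pass weights, and results leave through the shift register one value per clock:

```python
from collections import deque

class ProcessingElement:
    """Behavioral model of one PE: a weight stack (LIFO) for the forward
    pass, a weight queue (FIFO) for the backward pass, a multiply-
    accumulator, and a shift register feeding the next stage."""

    def __init__(self):
        self.stack = []        # memory 1411: forward weights, popped LIFO
        self.queue = deque()   # memory 1412: backward weights, read FIFO
        self.acc = 0.0         # multiply-accumulator 142
        self.shift_reg = 0.0   # shift register 143

    def write_weight(self, w, to_stack):
        # the weight bus 15 writes into stack or queue, addressed per PE
        (self.stack.append if to_stack else self.queue.append)(w)

    def mac(self, x, use_stack):
        # multiplexer 144 selects which weight memory feeds the MAC
        w = self.stack.pop() if use_stack else self.queue.popleft()
        self.acc += w * x

    def latch(self):
        # copy the finished weighted sum into the shift register
        self.shift_reg, self.acc = self.acc, 0.0

    def shift(self, incoming):
        # one value moves down the chain per clock
        outgoing, self.shift_reg = self.shift_reg, incoming
        return outgoing
```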

Please refer to Figures 4A and 4B, the system diagram and operation timing diagram of the forward computation block of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. Because the structure is a multilayer perceptron, every layer's computation depends on the previous layer. If the hardware were also laid out layer by layer, cost and logic element count would grow in proportion to the network; instead, every layer takes its inputs from the input data bus 16, and the result of each layer is stored in memory and placed back on the input data bus 16 when the next layer is computed, greatly reducing the amount of hardware used. The forward block is essentially a combination of several processing elements 14: the more elements, the more of the forward network can be computed at once, but the number of elements that can be synthesized is constrained by the actual hardware. For an elastic system, then, the computed values must be stored; such a design can synthesize a very large network, and the adjustable elastic design suits many kinds of neural network algorithms. As Figure 4A shows, a multiplexer 18 controls whether a result must pass through the activation function 17: in the backward pass, thanks to hardware sharing, the error terms need no activation function conversion, but the hidden-layer error term computation is also limited by the number of synthesized elements, so the value must be stored. Ignoring output delay, the per-clock schedule of the forward block is shown in Figure 4B: after the multiply-accumulator finishes, the waiting time equals the number of input nodes, so each clock computes one input node multiplied by the weight values held in the element array. When the signals of all input nodes have been delivered, the Shift signal must be asserted immediately so that each node's computed value is sent point by point to the activation function 17; the waiting clocks are proportional to the number of processing elements. In addition, the forward block sends the array's input values and the post-activation output values onto two buses, the Layer Input Bus 19 and the Layer Output Bus 20, so that the backward block's layer input stack and layer output stack can store these values for computing the neuron inputs and outputs needed during training.
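
To make concrete why a single activation-function unit suffices, here is a sketch assuming a sigmoid and the ProcessingElement model above; `apply_activation=False` plays the role of multiplexer 18's bypass for backward-pass error terms (names are illustrative):

```python
import math

def drain_array(pes, apply_activation=True):
    """Shift all latched PE results out of the chain, one per clock,
    through the one shared activation function; apply_activation=False
    models multiplexer 18 bypassing it for error terms."""
    for pe in pes:
        pe.latch()
    outputs = []
    for _ in range(len(pes)):
        value = 0.0
        for pe in pes:            # values ripple one position per clock
            value = pe.shift(value)
        # the value emerging from the last PE passes the shared unit
        outputs.append(1.0 / (1.0 + math.exp(-value))
                       if apply_activation else value)
    return outputs                # last PE's result emerges first
```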

Please refer to Figures 5A and 5B, the structure diagram and stack timing diagram of the backward computation block of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. The backward computation divides into four parts: computing δ, the error, Δw, and the weight update. Computing the hidden-layer error terms uses the processing element array in the forward block; since the backward pass only computes the hidden-layer error terms and passes them back to the forward side, a data-flow, pipelined style reads out of or writes into the backward block's queues and stacks step by step. In the forward block, the inputs and outputs of every layer of the neural network are stored; because the inputs are computed in segments, the Series Input Bus 26 delivers data segment by segment. To simplify the memory capacity of the layer input stack, a flag signal controls writes to it and is asserted only on the last segment of a segmented computation, as shown in Figures 5A and 5B. The network's output values are stored in the layer output stack, while the hidden queue stores only the hidden-layer neuron outputs. After the forward pass, δ is computed first, in two parts: for the output layer, δ_j = (D_j − Y_j)·Y_j·(1 − Y_j), and for the hidden layer, δ_i = Y_i·(1 − Y_i)·Σ_j δ_j·w_ji (assuming a sigmoid activation, whose derivative is Y(1 − Y)), where j indexes the output nodes and i the hidden nodes. First δ_j is computed: the control unit sends the read signal to the layer output stack in the learning block; after 2 clock cycles the ideal output value D_j of the training sample is issued while the data Y_j stream out of the output stack in order. Both are fed simultaneously to the subtractor 21 and multiplier 22 to compute (D_j − Y_j) and (1 − Y_j); the latter is fed to multiplier 22 again to obtain Y_j·(1 − Y_j), by which time the ideal-value term (D_j − Y_j) is also ready, and finally the two results are multiplied in multiplier 22 to obtain the output layer's δ_j. The Error Select signal selects between the computations: when layer n is the output layer, Error Select = 0; otherwise the signal is 1. When the output layer's δ has been computed, it is first stored in the δ queue and then sent to the Input Data Bus of the PE array for computation; note, however, whether the number of physical processing elements covers the hidden layer, which is why the δ queue needs a re-read function for feeding the PE array's Input Data Bus, while the PE queue's weight values were configured when the weights were initialized. The results of this PE array computation need not pass into the activation function, so Transfer Enable is set to 0; the finished values Σ_j δ_j·w_ji are stored in order in the Hidden FIFO. Next, the layer output stack read signal is held for the number of hidden nodes to read out the hidden layer's outputs and compute Y_i·(1 − Y_i), while Σ_j δ_j·w_ji is read from the Hidden FIFO; the two are sent to multiplier 22 to obtain the hidden layer's δ_i. The hidden-layer δ computation is similar to the output-layer one: the layer output stack must first be read out, the signal this time being the hidden-layer neuron outputs; the differences are that computing the error terms for the hidden δ requires the forward block's processing element array, and the δ queue must support repeated reads. When the δ of both output and hidden layers have been computed and stored in the δ queue, the computation of Δw begins. Since the number of Δw values is (number of input nodes + 1) × number of hidden nodes + (number of hidden nodes + 1) × number of output nodes, a pipelined design is used to raise speed: as δ is read out in order it is fed to multiplier 22 and multiplied by the learning rate η, then multiplied by the re-readable layer input stack, and finally added to the product of the inertia factor η_m and the previous Δw; as each new Δw is computed, the Δw queue stores it. Furthermore, because the system trains in batches, an adder 23 and a batch queue replace one accumulator per weight: until the next iteration is reached, the accumulated Δw contents are stored into the batch queue, and before each batch the control unit writes a random perturbation in [−2⁻¹⁰, +2⁻¹⁰], generated by a linear feedback shift register (LFSR), into the batch queue so that the algorithm can escape local minima. Where the error is computed, an accumulator 24 is placed, and after each batch a comparator 25 checks whether the current error is within the allowable range.
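
A compact software rendering of the four backward-pass steps just described, assuming a sigmoid activation so that f′(Y) = Y(1 − Y); the symbols η (learning rate), η_m (inertia factor), and the perturbation range are taken from the text, while the function and argument names are illustrative:

```python
import numpy as np

def backward_pass(x, y_hid, y_out, d, W_out, eta, eta_m, dW_prev):
    """One training example through the four backward steps: output
    delta, hidden delta (reusing the forward 'PE array' as a matrix
    product), and the momentum weight corrections. dW_prev is the pair
    (previous hidden-layer dW, previous output-layer dW)."""
    # output layer: delta_j = (D_j - Y_j) * Y_j * (1 - Y_j)
    delta_out = (d - y_out) * y_out * (1.0 - y_out)
    # hidden layer: delta_i = Y_i * (1 - Y_i) * sum_j delta_j * w_ji
    # (the bias column of W_out is skipped, as the PE queue does)
    delta_hid = y_hid * (1.0 - y_hid) * (W_out[:, :-1].T @ delta_out)
    # corrections: dW = eta * delta * input + eta_m * dW_prev
    dW_hid = eta * np.outer(delta_hid, np.append(x, 1.0)) + eta_m * dW_prev[0]
    dW_out = eta * np.outer(delta_out, np.append(y_hid, 1.0)) + eta_m * dW_prev[1]
    return delta_out, delta_hid, (dW_hid, dW_out)

def batch_seed(shape, rng=np.random.default_rng()):
    """Before each batch, the control unit seeds the batch queue with a
    random perturbation in [-2**-10, +2**-10] (an LFSR in hardware)."""
    return rng.uniform(-2**-10, 2**-10, shape)
```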

Please refer to Figure 6, the control unit flowchart of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. The control unit is in charge of the whole system's operation. Because all numerical values are kept in stacks or queues, the controller only needs to drive the write and read of each queue and stack with finite state machines that track timing and the current state. Besides the storage elements, the controller also handles the segmented computation, accumulator clearing, the multiplexer select lines, and the processing element array position, while the input sample queue, output sample queue, and result queue serve as its start signals. As the figure shows, the control flow is divided into three blocks. Initialization produces the start signal: as soon as the stacks are initialized, the system runs the forward pass at once, shortening the waiting time, and the queue initialization can proceed while the forward pass runs. If the queues are not yet initialized when the backward pass is due, the system waits for the queue initialization to finish before starting it. In the forward pass, the controller first assesses whether segmented computation is needed, then passes the results into the activation function, and then checks whether this is the output layer; if not, the hidden-to-output layer must be computed next, again first assessing whether to segment. After the activation function, the controller checks whether the system is in recall mode; if not, it waits for the processing element queue signal to complete and then computes the output layer's δ, followed by the hidden layer's δ, again considering segmented computation of the error terms because of the hardware sharing; after that, only Δw and the weight update remain to complete one training iteration.
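
The flow can be summarized as a small finite state machine; a sketch under the description of Figure 6 (state names are illustrative, and segmentation is folded into the forward and δ states):

```python
from enum import Enum, auto

class State(Enum):
    INIT = auto()
    FORWARD = auto()
    OUTPUT_DELTA = auto()
    HIDDEN_DELTA = auto()
    UPDATE = auto()
    DONE = auto()

def control_step(state, recall_only, queues_ready, error_ok):
    """One FSM transition per the Figure 6 description: stacks ready
    means the forward pass starts at once; queue initialization may
    still be running, so the backward pass waits on queues_ready;
    recall mode skips the backward branch entirely."""
    if state is State.INIT:
        return State.FORWARD
    if state is State.FORWARD:
        if recall_only:
            return State.DONE                  # recall: just output the result
        return State.OUTPUT_DELTA if queues_ready else State.FORWARD
    if state is State.OUTPUT_DELTA:
        return State.HIDDEN_DELTA              # may itself be segmented
    if state is State.HIDDEN_DELTA:
        return State.UPDATE                    # compute dW, update weights
    if state is State.UPDATE:                  # one training iteration done
        return State.DONE if error_ok else State.FORWARD
    return State.DONE
```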

Please refer to Figures 7A to 7D, the weight storage timing and state diagrams of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. Because a fixed amount of hardware is used, the weight values must be arranged to match the computation's inputs for the system to operate correctly, and this arrangement is not used only at initialization: at every iteration the system stores values into the queues or stacks according to the designed weight placement so that the computation stays error-free. When the system starts, an initialization step must first store randomly generated weights in [−0.5, +0.5] into the stack and queue of each processing element, for use during recall and learning. As Figure 7A shows, if there are 3 physical processing elements and the network to synthesize is a two-layer 2-7-5 network, the control unit initializes according to the network system parameters given by the Nios II. To match the order in which signals are read out, writing begins with the weight values of the last output-layer node, written in reverse order; the stack written first is PE 2, because the number of output-layer nodes modulo the number of synthesized elements gives the remainder 2, and the PE array position then decrements for each subsequent node. The weight values of the next output node are likewise stored into the PE 1 stack, and after PE 1 has been written, the next write position is PE 3. When the hidden-to-output weights have been written, the input-to-hidden weights are written next, and the PE array records the current position in every stack, which the forward pass uses when computing the second layer. As Figure 7B shows, the initial PE stack position is the number of hidden nodes modulo the number of synthesized elements: PE 1 starts by writing the weight values of the last hidden node in reverse order, and after PE 1 is reached, the position decrements again from the highest PE until the last node is written. Note that when the first value is written, the stack in each processing element records the read address of each array, so that the weight order remains correct when handling segmented computation. Weights are also needed for computation during learning, so while the weight values are written into the PE array they must also be written into the weight FIFO in the learning unit block: (2+1)×7 + (7+1)×5 = 61 weight values in total, distributed among the stacks of the PE ring array, with the learning block keeping them in the weight FIFO. In the backward pass, because of the hardware sharing, the PE array again needs weight values, but the storage and read order is completely different from the forward pass, so the learning block's weight values are read out repeatedly. In the backward pass the PE array computation needs no bias, so whenever a bias weight is read from the weight queue, the PE queue's write signal is set to 0; the write pattern is shown in Figure 7C. The more segments the network is computed in, the more times the weight queue is read. When weight initialization is complete, the weight layout inside the processing elements is as shown in Figure 7D.
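
The modulo placement and reverse write order can be checked with a short sketch for the 2-7-5 example on three processing elements (PE indices 1..3 as in the text; names are illustrative):

```python
def placement_order(n_nodes, max_pe):
    """Yield (node, pe) pairs in write order: start from the last node
    of the layer, placed at PE (n_nodes mod max_pe), then decrement the
    PE index, wrapping from 1 back to max_pe."""
    pe = n_nodes % max_pe or max_pe     # 5 % 3 = 2, 7 % 3 = 1
    for node in range(n_nodes, 0, -1):  # last node first, weights reversed
        yield node, pe
        pe = pe - 1 or max_pe           # ...PE 2, PE 1, PE 3, PE 2, PE 1

print(list(placement_order(5, 3)))  # output layer: [(5,2),(4,1),(3,3),(2,2),(1,1)]
print(list(placement_order(7, 3)))  # hidden layer: starts at PE 1
```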

Please refer to Figures 8A to 8D, the segmented computation diagrams and timing diagrams of the forward pass of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. After the weight value stacks are initialized, the next step is the forward pass of the backpropagation network. The forward pass is organized mainly by layer and by segment; for each layer of the network it must be decided whether segmented computation is needed. The control unit sends the stack read signal in the PE array to read out the weight values while placing the bias on the input data bus, and then reads out the input sample queue. Since the input queue signal, including the bias input, carries (2+1) = 3 inputs, three clock cycles must be maintained; but because only three processing elements are chained in hardware, only the first three nodes can be computed first, as in Step 1 of Figure 8A. When that computation finishes, the results are stored into the hidden queue and the layer output queue, and then the segments of Step 2 and Step 3 are computed. Note that Step 3 is the last segment, so the bias and the input sample queue are first stored into the layer input queue, for use in the backward pass. Figure 8B is the timing diagram of Step 1, and Figures 8C and 8D are the forward pass timing diagrams.

Please refer to Figures 9A to 9C, the segmented computation diagrams and timing diagrams of the backward pass of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. Before the backward pass, it must first be determined whether the initialization of the PE queues has finished: the forward pass starts as soon as the PE stacks have been written, but the PE queues must repeatedly read the weight value queue in the learning block, so when the number of segments is large their initialization may take longer than the forward pass. Before the weight corrections can be computed, δ must be computed; during the forward pass the control unit has already stored each layer's inputs and outputs into the layer input and layer output stacks in the learning unit. The computation of δ again has two parts, the output-layer δ_j = (D_j − Y_j)·Y_j·(1 − Y_j) and the hidden-layer δ_i = Y_i·(1 − Y_i)·Σ_j δ_j·w_ji, where j indexes the output nodes and i the hidden nodes. δ_j is computed first: the control unit sends the Layer Output Stack read signal, and after 2 clock cycles the ideal output value D_j of the training sample is issued while the data stream out of the output stack in order. They are fed to the subtractor and multiplier to compute (D_j − Y_j) and (1 − Y_j), the latter going to the multiplier again to obtain Y_j·(1 − Y_j); the ideal-value term is ready at the same time, and the two results are finally sent to the multiplier to obtain the output layer's δ_j, as shown in Figure 9A, where the Error Select signal selects the computation: when layer n is the output layer, Error Select = 0, otherwise 1. When the output layer's δ has been computed, it is first stored in the δ queue and sent to the Input Data Bus of the PE array for computation; attention must be paid to whether the number of physical processing elements covers the hidden layer, as shown in Figure 9B, so the δ queue needs the re-read function for feeding the PE array's Input Data Bus, while the PE queue's weight values were configured at weight initialization. The results of this PE array computation need not pass into the activation function, so Transfer Enable is set to 0; the finished values Σ_j δ_j·w_ji are stored in order in the Hidden FIFO. The Layer Output Stack read signal is then held for the number of hidden nodes to read out the hidden layer's outputs and compute Y_i·(1 − Y_i), while Σ_j δ_j·w_ji is read from the Hidden FIFO, and the two are sent to the multiplier to obtain the hidden layer's δ_i. To compute the weight correction Δw, the products η·δ·x and η_m·Δw(t−1) must first be formed. Initially the δ queue is reset, because the PE array has already read it during the backward pass. As Figure 9C shows, the control unit first sends the Layer input rd = 1 signal to read out the output layer's input values, which are multiplied by the learning rate η; one clock cycle later it sets delta w FIFO rd = 1, takes the previous Δw from the Δw queue and multiplies it by the inertia factor η_m; the control unit then reads one value from the δ queue and computes with all the input values of that layer to obtain the new Δw.
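
Collecting the quantities just listed, the correction applied to each weight is the familiar momentum update (reconstructed from the description; the symbols η, η_m, δ, and Δw are the document's own):

```latex
\Delta w_{ji}(t) = \eta\,\delta_j\,x_i + \eta_m\,\Delta w_{ji}(t-1),
\qquad
w_{ji}(t+1) = w_{ji}(t) + \Delta w_{ji}(t)
```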

Please refer to Figure 10, the recall clock flowchart of the high-speed backpropagation neural network system with elastic structure and learning function of the present invention. As the figure shows, the number of synthesized processing elements and the neural network system produce four different combinations. After the activation function computation, if enough processing elements have been synthesized, the only wait is for the neuron outputs to travel through the ring, and the waiting time is that layer's number of neuron output nodes. If the synthesized count is insufficient, the waiting time is the completion of the segmented computation plus the remainder of that layer's size divided by the synthesized count; if that remainder is 0, the remainder is taken to be the synthesized count and the layer's quotient is reduced by 1. Table 1 lists the clocks each step costs and can be used to derive formulas (1) to (4), in units of clock cycles with each clock being 10 ns, where I is the number of input nodes, H the number of hidden nodes, O the number of output nodes, MaxPE the number of synthesized processing elements, H_MOD the quotient of the hidden layer divided by the synthesized count, H_REM the remainder of the hidden layer divided by the synthesized count, O_MOD the quotient of the output layer divided by the synthesized count, and O_REM the remainder of the output layer divided by the synthesized count. Formulas (1) to (4) are used as follows. When the synthesized count suffices for both the hidden and output layers, use formula (1):

41 + I + 2·H + O (1)

When it suffices for the hidden layer but not the output layer, use formula (2):

41 + I + 2·H + O_REM + O_MOD·(8 + H + 2·MaxPE) (2)

When it suffices for the output layer but not the hidden layer, use formula (3):

41 + I + H + O + H_REM + H_MOD·(8 + I + 2·MaxPE) (3)

And when it suffices for neither the hidden nor the output layer, use formula (4):

41 + I + H + H_REM + O_REM + H_MOD·(8 + I + 2·MaxPE) + O_MOD·(8 + H + 2·MaxPE) (4)
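
The four formulas can be packaged into one function; a sketch (names are illustrative) that also applies the zero-remainder rule stated above:

```python
def recall_cycles(I, H, O, max_pe):
    """Recall time in clock cycles per formulas (1)-(4); at the stated
    100 MHz clock, each cycle is 10 ns."""
    h_mod, h_rem = divmod(H, max_pe)
    o_mod, o_rem = divmod(O, max_pe)
    # a zero remainder counts as one full segment (remainder := MaxPE,
    # quotient := quotient - 1), as stated in the text
    if h_rem == 0:
        h_mod, h_rem = h_mod - 1, max_pe
    if o_rem == 0:
        o_mod, o_rem = o_mod - 1, max_pe
    if max_pe >= H and max_pe >= O:                       # formula (1)
        return 41 + I + 2 * H + O
    if max_pe >= H:                                       # formula (2)
        return 41 + I + 2 * H + o_rem + o_mod * (8 + H + 2 * max_pe)
    if max_pe >= O:                                       # formula (3)
        return 41 + I + H + O + h_rem + h_mod * (8 + I + 2 * max_pe)
    return (41 + I + H + h_rem + o_rem                    # formula (4)
            + h_mod * (8 + I + 2 * max_pe)
            + o_mod * (8 + H + 2 * max_pe))
```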

Compared with other conventional techniques, the high-speed backpropagation neural network system with elastic structure and learning function provided by the present invention has the following advantages:

1. The present invention pairs a controller with backpropagation hardware to realize a fully functional backpropagation network; under a limited hardware cost it abandons the notion that the size of the network system is bounded by the processing element array, and with segmented computation it can compute very large backpropagation networks.

2. The hardware implementations of the present invention all use pipelining together with replicated, parallel computation, which raises the overall computation speed; users can judge for themselves whether it is sufficient for their real-time control application, achieving customization.

3. Since the differences in error and accuracy are very small while the computation is much faster than in software, the backpropagation hardware design of the present invention is sufficient to replace software computation; at the application level it can be used in low-end embedded systems or realized as a conventional accelerator interface card.

4. The principal characteristic of the present invention is its highly parallel computing architecture, in which the nodes of each layer are interconnected to form a network able to handle complex tasks. The invention provides recall and online learning, improving on previous neural network hardware systems; with a 100 MHz clock, the control unit can compute the backpropagation network in segments under a fixed hardware cost, without re-planning or redesigning the whole system.

The above description is a specific account of one feasible embodiment of the present invention; the embodiment is not intended to limit the patent scope of the invention, and any equivalent implementation or modification that does not depart from the spirit of the art of the invention shall be included within the patent scope of this case.

In summary, this application is truly innovative in its technical concept and improves on the functions described above relative to conventional articles. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent, and this application is filed in accordance with the law; the Office is respectfully requested to approve this invention patent application.

11‧‧‧forward computation block
12‧‧‧backward computation block
13‧‧‧control unit
14‧‧‧processing element
1411‧‧‧stack-architecture memory
1412‧‧‧queue-architecture memory
142‧‧‧multiply-accumulator
143‧‧‧shift register
144‧‧‧multiplexer
15‧‧‧weight bus
16‧‧‧input data bus
17‧‧‧activation function
18‧‧‧multiplexer
19‧‧‧layer input data bus
20‧‧‧layer output data bus
21‧‧‧subtractor
22‧‧‧multiplier
23‧‧‧adder
24‧‧‧accumulator
25‧‧‧comparator
26‧‧‧series input bus

The drawings, all of which pertain to the high-speed back-propagation neural network system with elastic structure and learning function of the present invention, are briefly described as follows:
FIG. 1 is a system diagram of an implementation of the system;
FIG. 2A is a schematic diagram of the segmented computation;
FIG. 2B is a schematic diagram of the hardware sharing;
FIG. 3A is a system diagram of the processing element;
FIG. 3B is a signal timing diagram of the processing element;
FIG. 4A is a system diagram of the forward operation block;
FIG. 4B is a timing diagram of the forward operation block;
FIG. 5A is a structural diagram of the backward operation block;
FIG. 5B is a stack timing diagram;
FIG. 6 is a flow chart of the control unit;
FIG. 7A is a write timing diagram of the hidden-layer-to-output-layer stack;
FIG. 7B is a write timing diagram of the input-layer-to-hidden-layer stack;
FIG. 7C is a write timing diagram of the queues in the processing elements;
FIG. 7D is a state diagram of the initialization of the weight values;
FIG. 8A is a schematic diagram of the segmented computation of the forward operation;
FIG. 8B is a signal-relationship diagram of one segmented computation of the forward operation;
FIG. 8C is a timing diagram of the first-layer forward operation;
FIG. 8D is a timing diagram of the second-layer forward operation;
FIG. 9A is a diagram of the δ signal;
FIG. 9B is a schematic diagram of the segmented computation of the backward operation;
FIG. 9C is a diagram of the Δw signal; and
FIG. 10 is a flow chart of the recall clock.

11‧‧‧Forward operation block
12‧‧‧Backward operation block
13‧‧‧Control unit

Claims (15)

1. A high-speed back-propagation neural network system with elastic structure and learning function, the system comprising: a forward operation block, wherein during the forward pass data propagate from the input layer through the weighted hidden layer, are processed by an activation function, and are passed to the output layer, which computes the network output value, each layer of neurons affecting only the state of the next layer; when only the recall function is performed, the forward result is switched through a multiplexer and output as the computed result; a backward operation block, wherein if the network is learning and the output deviates too far from the target value, the forward results are taken as the input parameters of the backward operation block, the backward pass is computed, and the error signal is propagated back, the weight values of the neurons in every layer being modified until the error falls within the tolerance range; and a control unit for controlling the flow of the overall system and transmitting the control signals that sequence the operation of the forward and backward operation blocks; the computation of the system is a multi-layer architecture in which, except for the input layer, each layer's input is the output of the previous layer's computation, and since every layer performs the same operation pattern, the same processing elements can be used to compute different layers; once the processing-element array has finished a single layer, the results are used for the next layer's computation, so the system transfers them directly onto the input data bus for the next layer, forming a ring architecture in which data can be fed in for computation directly, without further processing by the control unit.

2. The high-speed back-propagation neural network system of claim 1, wherein the back-propagation computation is divided into a forward operation and a backward operation, designed as a single-instruction multiple-data (SIMD) architecture in which part of the hardware-array signals are shared and the network's operations are handled by instructions.

3. The high-speed back-propagation neural network system of claim 1, wherein each processing element contains at least a memory, a multiply-accumulator, and a shift register; the three parts operate independently and, through a pipelined design that splits the computation into independent stages, greatly improve performance.

4. The high-speed back-propagation neural network system of claim 1, wherein two types of memory are configured inside each processing element, so an additional multiplexer is required for selection; and, to simplify the controller design, each processing element has an extra addressing scheme for writing data from the weight bus into either the stack-architecture memory or the queue-architecture memory of that element.

5. The high-speed back-propagation neural network system of claim 1, wherein each processing element is designed with a stack-architecture memory and a queue-architecture memory; the stack-architecture memory stores the weight values in order, so no additional address lines are needed to manage the memory contents, while the queue-architecture memory supplies the weight values required during the backward operation.

6. The high-speed back-propagation neural network system of claim 1, wherein the forward operation block is formed by processing elements connected in series, so all elements perform their multiply-accumulate operations simultaneously; once the computation finishes, each element's result is passed to the next layer through shift registers.

7. The high-speed back-propagation neural network system of claim 1, wherein the control unit governs the entire system's operation, being chiefly responsible for the storage elements, the segmented computation, the processing-element array position, and serving as the start signal.

8. The high-speed back-propagation neural network system of claim 7, wherein the storage elements means that the control unit controls the timing and current state of the writes and reads of every queue and stack with finite state machines.

9. The high-speed back-propagation neural network system of claim 1, wherein the segmented computation means that the control unit is responsible for clearing the accumulators, driving the multiplexer select lines, tracking the processing-element array position, and the like.

10. The high-speed back-propagation neural network system of claim 1, wherein serving as the start signal means that, for the input sample queue, the output sample queue, and the result queue, the control unit acts as the start signal.

11. The high-speed back-propagation neural network system of claim 1, wherein the flow of the control unit is roughly divided into: (1) generating the start signal, such that once the stacks are initialized the system executes the forward operation immediately, shortening the waiting time, while the queues can be initialized and updated during the forward operation; (2) during the backward operation, if queue initialization is not yet complete, waiting for it to finish before the backward operation proceeds; (3) during the forward operation, first evaluating whether segmented computation is needed and then feeding the activation function, after which the system checks whether the current layer is the output layer; if not, the hidden-to-output-layer computation comes next, so segmentation is again evaluated first; (4) after the activation function is applied, checking whether the system is in recall mode; if not, it waits for the processing-element queue signal, then computes the output-layer δ, and once that is obtained it proceeds to compute the hidden-layer δ; because the hardware is shared, segmented computation must again be considered when computing the error terms, after which only Δw and the weight update remain to complete one training iteration.

12. The high-speed back-propagation neural network system of claim 11, wherein the forward operation is a multilayer perceptron, so every layer's computation has a predecessor-successor relationship: each preceding layer receives its data from the input data bus, while each succeeding layer's results are stored in memory and placed back onto the input data bus for the next layer's computation.

13. The high-speed back-propagation neural network system of claim 11, wherein the backward operation is divided into four categories, namely computing δ, the error, Δw, and the weight update; computing the hidden-layer error terms reuses the processing-element array of the forward operation block, and a dataflow, pipelined scheme reads signals out of, and writes them into, the queues and stacks of the backward operation block step by step.

14. The high-speed back-propagation neural network system of claim 11, wherein, after the activation function is applied, the number of processing elements and the network give rise to four combinations: the processing elements suffice for both the hidden layer and the output layer; they suffice for the hidden layer but not the output layer; they suffice for the output layer but not the hidden layer; or they suffice for neither. If they suffice for both layers, the system need only wait for the neuron outputs to travel around the ring architecture, and the waiting time equals that layer's number of output nodes; if they are insufficient for either or both layers, the waiting time is the completion of the segmented computation plus the remainder of the layer size divided by the number of processing elements.

15. The high-speed back-propagation neural network system of claim 14, wherein the waiting time for the neuron outputs to travel around the ring architecture is computed as follows: when the processing elements suffice for both the hidden and output layers, 41 + I + 2·H + O; when they suffice for the hidden layer but not the output layer, 41 + I + 2·H + O_REM + O_MOD·(8 + H + 2·MaxPE); when they suffice for the output layer but not the hidden layer, 41 + I + H + O + H_REM + H_MOD·(8 + I + 2·MaxPE); and when they suffice for neither, 41 + I + H + H_REM + O_REM + H_MOD·(8 + I + 2·MaxPE) + O_MOD·(8 + H + 2·MaxPE); where I is the number of input-layer nodes, H the number of hidden-layer nodes, O the number of output-layer nodes, MaxPE the number of processing elements, H_MOD and H_REM the quotient and remainder of the hidden-layer size divided by MaxPE, and O_MOD and O_REM the quotient and remainder of the output-layer size divided by MaxPE.
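To make the ring architecture recited in claims 1, 6, and 12 concrete, the sketch below simulates the recall (forward-only) pass in software: a single bus variable stands in for the input data bus, and each layer's result is written straight back onto it as the next layer's input. This is a hedged software analogy, not the hardware itself; recall, sigmoid, and the layer list are assumed names.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def recall(x, layers):
    """Recall pass per claims 1 and 12: except for the input layer, each
    layer's input is the previous layer's output, carried by one shared
    'input data bus' around the ring -- no control-unit reprocessing."""
    bus = np.asarray(x, dtype=float)   # the input data bus
    for w, b in layers:                # one trip around the ring per layer
        bus = sigmoid(w @ bus + b)     # layer result goes back on the bus
    return bus                         # network output at the output layer

# Example: a 3-5-2 multilayer perceptron.
rng = np.random.default_rng(1)
layers = [(rng.normal(size=(5, 3)), rng.normal(size=5)),
          (rng.normal(size=(2, 5)), rng.normal(size=2))]
print(recall([0.2, -0.1, 0.7], layers))
```

Keeping one shared bus is what lets the same processing elements serve every layer in turn, which is the point of the ring.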
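The four waiting-time formulas of claim 15 can likewise be checked with a small calculator. The sketch below assumes that the processing elements "suffice" for a layer when that layer has at most MaxPE neurons; that reading of claim 14, like the lower-case helper names, is an interpretation rather than text from the patent.

```python
def wait_cycles(I, H, O, max_pe):
    """Clock cycles spent waiting for neuron outputs to travel the ring,
    following the four cases of claim 15. *_mod is the quotient and
    *_rem the remainder of the layer size divided by max_pe."""
    h_mod, h_rem = divmod(H, max_pe)
    o_mod, o_rem = divmod(O, max_pe)
    if H <= max_pe and O <= max_pe:       # PEs suffice for both layers
        return 41 + I + 2*H + O
    if H <= max_pe:                       # suffice for hidden layer only
        return 41 + I + 2*H + o_rem + o_mod*(8 + H + 2*max_pe)
    if O <= max_pe:                       # suffice for output layer only
        return 41 + I + H + O + h_rem + h_mod*(8 + I + 2*max_pe)
    return (41 + I + H + h_rem + o_rem    # suffice for neither layer
            + h_mod*(8 + I + 2*max_pe)
            + o_mod*(8 + H + 2*max_pe))

# A 4-16-8 network on 8 PEs: the hidden layer exceeds the array,
# the output layer fits, so the third case applies.
print(wait_cycles(I=4, H=16, O=8, max_pe=8))
```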
TW97145030A 2008-11-21 2008-11-21 High - speed reverse transfer neural network system with elastic structure and learning function TWI417798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97145030A TWI417798B (en) 2008-11-21 2008-11-21 High - speed reverse transfer neural network system with elastic structure and learning function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97145030A TWI417798B (en) 2008-11-21 2008-11-21 High - speed reverse transfer neural network system with elastic structure and learning function

Publications (2)

Publication Number Publication Date
TW201020939A TW201020939A (en) 2010-06-01
TWI417798B true TWI417798B (en) 2013-12-01

Family

ID=44832456

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97145030A TWI417798B (en) 2008-11-21 2008-11-21 High - speed reverse transfer neural network system with elastic structure and learning function

Country Status (1)

Country Link
TW (1) TWI417798B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105446821A (en) * 2015-11-11 2016-03-30 哈尔滨工程大学 Improved neural network based fault diagnosis method for intelligent underwater robot propeller
US9805304B2 (en) 2015-05-21 2017-10-31 Google Inc. Prefetching weights for use in a neural network processor
TWI657381B (en) * 2017-03-03 2019-04-21 美商慧與發展有限責任合夥企業 Electronic device and related method
CN109920248A (en) * 2019-03-05 2019-06-21 南通大学 A kind of public transport arrival time prediction technique based on GRU neural network
CN110459056A (en) * 2019-08-26 2019-11-15 南通大学 A kind of public transport arrival time prediction technique based on LSTM neural network
US11399079B2 (en) 2018-02-14 2022-07-26 Eingot Llc Zero-knowledge environment based networking engine

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140143191A1 (en) 2012-11-20 2014-05-22 Qualcomm Incorporated Piecewise linear neuron modeling
TWI688871B (en) 2019-08-27 2020-03-21 國立清華大學 Matrix multiplication device and operation method thereof
CN111738439B (en) * 2020-07-21 2020-12-29 电子科技大学 Artificial intelligence processing method and processor supporting online learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW279231B (en) * 1995-04-18 1996-06-21 Nat Science Council This invention is related to a new neural network for prediction
TW280890B (en) * 1993-03-31 1996-07-11 Motorola Inc

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW280890B (en) * 1993-03-31 1996-07-11 Motorola Inc
TW279231B (en) * 1995-04-18 1996-06-21 Nat Science Council This invention is related to a new neural network for prediction

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9805304B2 (en) 2015-05-21 2017-10-31 Google Inc. Prefetching weights for use in a neural network processor
US10049322B2 (en) 2015-05-21 2018-08-14 Google Llc Prefetching weights for use in a neural network processor
TWI636368B (en) * 2015-05-21 2018-09-21 谷歌有限責任公司 Circuit and method for performing nueral network
US11853865B2 (en) 2015-05-21 2023-12-26 Google Llc Prefetching weights for use in a neural network processor
US10878316B2 (en) 2015-05-21 2020-12-29 Google Llc Prefetching weights for use in a neural network processor
US11281966B2 (en) 2015-05-21 2022-03-22 Google Llc Prefetching weights for use in a neural network processor
CN105446821B (en) * 2015-11-11 2019-05-17 哈尔滨工程大学 A kind of Intelligent Underwater Robot propeller method for diagnosing faults based on improvement neural network
CN105446821A (en) * 2015-11-11 2016-03-30 哈尔滨工程大学 Improved neural network based fault diagnosis method for intelligent underwater robot propeller
US11315009B2 (en) 2017-03-03 2022-04-26 Hewlett Packard Enterprise Development Lp Analog multiplier-accumulators
TWI657381B (en) * 2017-03-03 2019-04-21 美商慧與發展有限責任合夥企業 Electronic device and related method
US11399079B2 (en) 2018-02-14 2022-07-26 Eingot Llc Zero-knowledge environment based networking engine
CN109920248A (en) * 2019-03-05 2019-06-21 南通大学 A kind of public transport arrival time prediction technique based on GRU neural network
CN109920248B (en) * 2019-03-05 2021-09-17 南通大学 Bus arrival time prediction method based on GRU neural network
CN110459056A (en) * 2019-08-26 2019-11-15 南通大学 A kind of public transport arrival time prediction technique based on LSTM neural network

Also Published As

Publication number Publication date
TW201020939A (en) 2010-06-01

Similar Documents

Publication Publication Date Title
TWI417798B (en) High - speed reverse transfer neural network system with elastic structure and learning function
JP7166389B2 (en) Systems and integrated circuits for bit-serial computation in neural networks
JP6977239B2 (en) Matrix multiplier
CN109358900B (en) Artificial neural network forward operation device and method supporting discrete data representation
CN107341547B (en) Apparatus and method for performing convolutional neural network training
CN111897579B (en) Image data processing method, device, computer equipment and storage medium
CN108304922A (en) Computing device and computational methods for neural computing
TW201331855A (en) High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes
CN116541647A (en) Operation accelerator, processing method and related equipment
WO2017177442A1 (en) Discrete data representation supported device and method for forward operation of artificial neural network
US11314674B2 (en) Direct memory access architecture with multi-level multi-striding
WO2021108356A1 (en) Tile subsystem and method for automated data flow and data processing within an integrated circuit architecture
CN111183418A (en) Configurable hardware accelerator
CN110689123B (en) Long-short term memory neural network forward acceleration system and method based on pulse array
CN110580519A (en) Convolution operation structure and method thereof
KR102349138B1 (en) High-speed computer accelerators with pre-programmed functions
WO2022182573A1 (en) Time-multiplexed use of reconfigurable hardware
TW200923803A (en) Hardware neural network learning and recall architecture
WO2017177446A1 (en) Discrete data representation-supporting apparatus and method for back-training of artificial neural network
WO2020230374A1 (en) Arithmetic operation device and arithmetic operation system
RU2294561C2 (en) Device for hardware realization of probability genetic algorithms
JPH04503720A (en) Flexible control device and method for digital signal processing device
Meng et al. Ppoaccel: A high-throughput acceleration framework for proximal policy optimization
Liu et al. A cloud server oriented FPGA accelerator for LSTM recurrent neural network
CN115310037A (en) Matrix multiplication computing unit, acceleration unit, computing system and related method

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees