JP2021013048A

JP2021013048A - Spiking neural network by 3d network on-chip

Info

Publication number: JP2021013048A
Application number: JP2019124541A
Authority: JP
Inventors: アブダラアブデラゼクベン; Ben Abdallah Abderazek; フィテーヴー; Vu Huy The; 雅之久田; Masayuki Hisada
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2021-02-04
Anticipated expiration: 2039-07-03
Also published as: JP7277682B2

Abstract

To provide a spiking neural network using a multicast spike routing algorithm that leverages the inherent 3D structure of the brain and reduces communication delay between neurons, enabling seamless implementation of a large-scale SNN (Spiking Neural Network)-based computing system. SOLUTION: A spiking neural network randomly determines a plurality of barycenters, calculates the distance from each of a plurality of destination routers implemented in a 3D network-on-chip to each of the barycenters, assigns each of the destination routers to one of a plurality of subgroups corresponding to the respective barycenters based on the calculated distances, re-determines the barycenters based on the assignment result, identifies a transmission path for a packet based on the re-determined barycenters, and transmits the packet over the identified transmission path. SELECTED DRAWING: Figure 2

Description

The present invention relates to a spiking neural network using a three-dimensional network-on-chip.

In recent years, neuroscience research has revealed much about the structure and behavior of individual neurons, and medical tools have made it possible to understand how neural activity in different regions of the brain follows sensory stimuli. In addition, advances in software-based artificial intelligence (AI) have brought us to the forefront of technology for building devices and systems with brain-like functions that overcome the bottlenecks of the conventional von Neumann computing style.

The main difference between neuro-inspired or neuromorphic systems and conventional information processing systems lies in how they structure and organize memory. A system based on the von Neumann style has one or more central processing units physically separated from the main memory area, whereas both biological (spiking) neural network systems and artificial neural network systems co-localize memory with computation and distribute both. Neuro-inspired technology based on spiking neural networks (hereinafter, SNNs) is attracting attention as a way to gain a better understanding of the brain and to explore new biologically inspired computation. SNNs have been applied successfully to several applications, such as visual recognition and classification tasks (Non-Patent Document 1). Neuromorphic hardware implementations also make it possible to run large-scale networks in real time, which is an important requirement in several applications, including neurorobotic control, brain-machine interfaces, and robotic decision making.

An SNN attempts to mimic information processing in the mammalian brain using a parallel array of neurons that communicate via spike events. Unlike a typical multilayer perceptron network, in which neurons fire in every propagation cycle, a neuron in an SNN model fires only when its membrane potential reaches a specific value. In an SNN, information is encoded using various coding schemes, such as coincidence coding, rate coding, and temporal coding (Patent Document 1). SNNs usually adopt an integrate-and-fire neuron model (Non-Patent Documents 2 and 3), in which a neuron that is sufficiently stimulated by other neurons generates a voltage spike (lasting about 1 ms per spike) that can propagate along nerve fibers. These pulses differ in amplitude, shape, and duration, but are generally treated as identical events. Hodgkin-Huxley conductance-based neurons (Non-Patent Document 4) are also often used to efficiently model the nonlinear and stochastic dynamics of ion channels in biological neurons. However, the Hodgkin-Huxley model is too complex for use in large-scale simulations and hardware implementations.

In recent years, many deep SNNs have been proposed (Non-Patent Document 5). They consist of many spiking neurons and have been successful in various pattern recognition tasks (Non-Patent Documents 1 and 6). It should be noted, however, that although these models are described as multilayer, they have few trainable layers compared with traditional deep neural networks. This is because there is no efficient learning rule for training spiking deep networks directly, comparable to backpropagation in conventional artificial neural networks (ANNs) (Non-Patent Document 5). Large-scale SNNs, on the other hand, are needed to simulate complex brain activity. For example, there is a 2.5-million-neuron model called Spaun (Non-Patent Document 7). Spaun captures many aspects of neuroanatomy, neurophysiology, and psychological behavior, and also performs digit recognition tasks accurately. In a deep SNN, communication between neurons plays an essential role in the implementation. Mapping a large number of neurons onto planar structures and stacking the resulting planar dies using through-silicon vias (TSVs) can significantly reduce communication latency.

Software simulation of an SNN is a suitable way to investigate the behavior of a neural system. However, software simulation of large-scale (deep) SNN systems is slow. An alternative is a hardware implementation, which offers the possibility of generating independent spikes accurately while outputting them in real time. Because a hardware implementation computes faster than a software simulation, it can take full advantage of the network's inherent parallelism. Specialized hardware architectures with multiple neuro-cores can exploit the parallelism inherent in neural networks to achieve high processing speed at low power. SNNs are therefore well suited to embedded neuromorphic devices and control applications.

The challenges to be solved when building a spiking neural network (neuromorphic) architecture with a large number of synapses in hardware include constructing a compact, massively parallel architecture with low power consumption, an efficient neuro-coding scheme, and a lightweight on-chip learning algorithm. Another major challenge is the on-chip communication and routing network that carries data between the neuro-cores and the off-chip data transferred to those cores. Furthermore, the number of neurons to be connected is at least 10^3 times the number of processing elements (PEs) that need to be interconnected on current multicore/multiprocessor system-on-chip (SoC) platforms (Non-Patent Document 8). Because of these constraints, deploying such brain-like integrated circuits (ICs) poses a difficult on-chip interconnect problem (Non-Patent Document 9). In an SNN, each neuron maintains an internal membrane potential that is a function of several parameters, including the input spikes, the synaptic weights, the current membrane potential, and a constant leak factor (Non-Patent Document 10). Neuronal activity is constrained by neuronal connectivity, which plays an important role in determining the functional properties of neurons and neural systems. Brain connectivity is generally described at several scales: (1) individual synaptic connections that link individual neurons at the microscale, (2) networks that connect populations of neurons at the mesoscale, and (3) brain regions linked by fiber pathways at the macroscale.
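The dependence of the membrane potential on input spikes, synaptic weights, the current potential, and a constant leak factor can be sketched as a discrete-time leaky integrate-and-fire update. This is a minimal illustration, not the neuron circuit of the invention: the names `lif_step` and `run`, the parameters `leak`, `threshold`, and `v_reset`, and all numeric values are assumptions chosen for the example.

```python
def lif_step(v, spikes_in, weights, leak=0.9, threshold=1.0, v_reset=0.0):
    """One discrete-time update of a leaky integrate-and-fire neuron.

    v         -- current membrane potential
    spikes_in -- 0/1 input spikes, one per synapse
    weights   -- synaptic weight per synapse
    Returns (new_potential, fired).
    All parameter names and default values are illustrative assumptions.
    """
    # Leak the stored potential, then integrate the weighted input spikes.
    v = leak * v + sum(w * s for w, s in zip(weights, spikes_in))
    if v >= threshold:
        return v_reset, True   # fire an output spike and reset
    return v, False


def run(spike_trains, weights, **params):
    """Feed one input spike vector per time step; return the output spike train."""
    v, out = 0.0, []
    for spikes_in in spike_trains:
        v, fired = lif_step(v, spikes_in, weights, **params)
        out.append(int(fired))
    return out
```

With two synapses of weight 0.6, a single input spike is not enough to reach the threshold of 1.0, but two spikes close together in time are, which is the property that makes spike arrival time matter.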

In an efficient SNN with appropriate neuron and network models, the arrival time of synaptic inputs to a neuron, such as Dirac delta functions or shaped postsynaptic potentials (EPSP: excitatory postsynaptic potential; IPSP: inhibitory postsynaptic potential), strongly affects the timing of the neuron's output (spike). As a result, as shown in FIG. 1, timing violations affect the proper functioning (firing) of spiking neurons and the on-chip learning capability of the entire system.

A shared bus is not suitable as the communication medium for implementing large, complex SNN chips or systems with multicast routing. Adding neurons reduces the chip's communication capacity, and the resulting longer shared bus can affect the neurons' firing rates. The nonlinear growth in neural connections is also highly significant for a direct implementation that uses a dedicated point-to-point communication scheme.

The two-dimensional packet-switched network-on-chip (2D-NoC) has been considered a potential solution to the interconnect problems found in previously proposed SNNs based on shared communication media (Non-Patent Documents 9 and 12). However, such an interconnect strategy makes it difficult to achieve high scalability with low power consumption, especially in large-scale SNN chips. Apart from packet-switched NoCs, circuit-switched NoCs make it possible to examine the performance of various routing and switching mechanisms. Compared with packet switching, a circuit-switched NoC has lower hardware complexity and higher energy efficiency, but a longer setup time.

Over the last few years, the advantages of 3D-ICs and mesh-based NoCs have been merged into a promising architecture that opens a new area of IC design, especially for AI-enabled chips. The parallelism of a NoC can be enhanced in the third dimension thanks to the short wire lengths and low power consumption of 3D-IC interconnects. As a result, the 3D-NoC paradigm is regarded as one of the most advanced and favorable architectures for future IC design. A 3D-NoC provides very high-bandwidth, low-power interconnects (Non-Patent Document 13) and can meet the demanding requirements of emerging artificial intelligence (AI) applications. When a 3D-NoC is combined with an SNN, the spiking neurons can be regarded as PEs (neuro-cores). Connectivity between neurons is implemented by sending spike packets over a scalable interconnect network. Here, a PE refers to a spiking neuron processing core (SNPC) connected to a 3D-NoC router, a NoC channel is analogous to a neuron's synapse, and the NoC topology refers to the way neurons are interconnected within the network.

One of the main problems with hardware implementations of SNNs is their reliability. SNNs are said to have some inherent fault-tolerance properties thanks to their massively parallel structure inspired by biological neural models, but this is not always true in practice (Non-Patent Document 14). In fact, because of challenges inherited from the continued scaling of semiconductor components, hardware SNN implementations are exposed to a variety of faults (Non-Patent Document 14). Where yield is a major concern, the risk of faults becomes even more significant as large-scale SNNs are integrated for embedded systems (Non-Patent Document 15). With regard to the reliability of inter-neuron communication, faults can affect system performance, especially in critical applications (aerospace, autonomous vehicles, biomedicine, and so on). Such faults can cause undesirable inaccuracy or irreversible, severe consequences. In an SNN, a faulty inter-neuron connection leaves the postsynaptic neuron in a silent or near-silent (low firing activity) state. As shown in FIG. 1(c), when the link from N1 to N4 is broken, the membrane potential of N4 does not reach the threshold for firing an output spike as it does in FIG. 1(b). This lowers the firing rate of the postsynaptic neuron.

Consequently, the overall performance of SNN models based on rate coding may be affected (Non-Patent Document 16). Neurons with low firing rates are more susceptible to temporal spike jitter, which increases the noise and variance of the firing rate (Non-Patent Document 17). Efficient fault-tolerance techniques are therefore needed. For such mechanisms, recovery time is one of the key requirements. As shown in FIG. 1(d), the long latency of a fault-tolerant routing method can affect the firing rate. In particular, it can affect SNN models that use temporal coding based on the relative timing between spikes.

Integrating large-scale SNNs in silicon therefore makes the challenge of finding an efficient fault-tolerant solution all the more important.

Patent Document 1: U.S. Patent Application Publication No. 2014/0351190
Patent Document 2: JP-A-2015-119387

Non-Patent Document 1: Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” Int. J. Comput. Vision, vol. 113, no. 1, pp. 54-66, May 2015.
Non-Patent Document 2: N. Burkitt, “A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input,” Biol. Cybern., vol. 95, no. 1, pp. 1-19, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1007/s00422-006-0068-6
Non-Patent Document 3: K. Suzuki, Y. Okuyama, and A. B. Abdallah, “Hardware design of a leaky integrate and fire neuron core towards the design of a low-power neuro-inspired spike-based multicore soc,” in Information Processing Society Tohoku Branch Conference, February 2018.
Non-Patent Document 4: J. H. Goldwyn, N. S. Imennov, M. Famulare, and E. Shea-Brown, “Stochastic differential equation models for ion channel noise in hodgkin-huxley neurons,” in Phys. Rev. E, vol. 83, no. 1, 2011, pp. 4190-4208.
Non-Patent Document 5: A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida, “Deep learning in spiking neural networks,” Neural Networks, 04 2018.
Non-Patent Document 6: P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015.
Non-Patent Document 7: C. Eliasmith, T. C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, and D. Rasmussen, “A large-scale model of the functioning brain,” Science, vol. 338, no. 6111, pp. 1202-1205, 2012.
Non-Patent Document 8: S. Furber and S. Temple, “Neural systems engineering,” Journal of the Royal Society Interface, vol. 4, no. 13, pp. 193-206, Sep 2006.
Non-Patent Document 9: S. Carrillo, J. Harkin, L. J. McDaid, F. Morgan, S. Pande, S. Cawley, and B. McGinley, “Scalable hierarchical network-on-chip architecture for spiking neural network hardware implementations,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 12, pp. 2451-2461, Dec 2013.
Non-Patent Document 10: W. Maas, “Networks of spiking neurons: The third generation of neural network models,” Trans. Soc. Comput. Simul. Int., vol. 14, no. 4, pp. 1659-1671, Dec. 1997. [Online]. Available: http://dl.acm.org/citation.cfm?id=281543.281637
Non-Patent Document 11: A. Ben Abdallah, Advanced Multicore Systems-On-Chip Architecture, On-Chip Network, Design. Springer, 2017.
Non-Patent Document 12: R. Hojabr, M. Modarressi, M. Daneshtalab, A. Yasoubi, and A. Khonsari, “Customizing clos network-on-chip for neural networks,” IEEE Transactions on Computers, vol. 66, no. 11, pp. 1865-1877, Nov 2017.
Non-Patent Document 13: K. N. Dang, A. B. Ahmed, Y. Okuyama, and B. A. Abderazek, “Scalable design methodology and online algorithm for tsv-cluster defects recovery in highly reliable 3d-noc systems,” IEEE Transactions on Emerging Topics in Computing, pp. 1-1, 2017.
Non-Patent Document 14: C. Torres-Huitzil and B. Girau, “Fault and error tolerance in neural networks: A review,” IEEE Access, vol. 5, pp. 17322-17341, 2017.
Non-Patent Document 15: P. M. Furth and A. G. Andreou, “On fault probabilities and yield models for vlsi neural networks,” IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1284-1287, Aug 1997.
Non-Patent Document 16: P. U. Diehl, D. Neil, J. Binas, M. Cook, S. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1-8.
Non-Patent Document 17: M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities and challenges,” Frontiers in Neuroscience, vol. 12, p. 774, 2018. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2018.00774
Non-Patent Document 18: D. Vainbrand and R. Ginosar, “Scalable network-on-chip architecture for configurable neural networks,” Microprocess. Microsyst., vol. 35, no. 2, pp. 152-166, Mar. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.micpro.2010.08.005
Non-Patent Document 19: B. A. Akram and B. A. Abderazek, “Adaptive fault-tolerant architecture and routing algorithm for reliable many-core 3d-noc systems,” J. Parallel Distrib. Comput., vol. 93, no. C, pp. 30-43, Jul. 2016.
Non-Patent Document 20: W. Gerstner and W. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.
Non-Patent Document 21: K. N. Dang, M. Meyer, Y. Okuyama, and A. B. Abdallah, “Reliability assessment and quantitative evaluation of soft-error resilient 3d network-on-chip systems,” in 2016 IEEE 25th Asian Test Symposium (ATS), Nov 2016, pp. 161-166.
Non-Patent Document 22: X. Lin and L. M. Ni, “Multicast communication in multicomputer networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 10, pp. 1105-1117, Oct 1993.
Non-Patent Document 23: S. H. Strogatz, “Exploring complex networks,” vol. 410, pp. 268-276, 03 2001.
Non-Patent Document 24: F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G. J. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, and D. S. Modha, “Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp. 1537-1557, Oct 2015.

Because routing algorithms play an important role in the performance of neuron communication, they are regarded as one of the most efficient recovery mechanisms in SNNs. A routing algorithm can affect load balancing across the network and the overall latency of the system in fault-free scenarios (Non-Patent Document 11). Because the traffic pattern of a given SNN is one-to-many, with a presynaptic neuron sending spikes to a subset of postsynaptic neurons, using conventional unicast-based routing in a large-scale SNN is inefficient (Non-Patent Document 18). Furthermore, when fault-tolerance requirements are taken into account, the routing algorithm must be chosen carefully to minimize inter-neuron communication latency. Otherwise, the accuracy of the postsynaptic node may degrade even though the fault has been bypassed. FIG. 1(d) shows a clear example of such a case: the long latency caused by improper routing can prevent the postsynaptic neuron from firing its output spike in time.
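The cost of serving a one-to-many traffic pattern with unicast can be made concrete by counting link traversals in a 3D mesh. The sketch below is only an illustration under stated assumptions: it uses dimension-order (X, then Y, then Z) routing, which the text does not prescribe, and it models multicast simply as replicating a packet where those paths diverge. The function names `xyz_path` and `link_traversals` are hypothetical, and this is not the routing algorithm of the invention.

```python
def xyz_path(src, dst):
    """Directed links visited by dimension-order (X, then Y, then Z) routing."""
    links, cur = [], list(src)
    for axis in range(3):
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            nxt = cur.copy()
            nxt[axis] += step
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links


def link_traversals(src, dests):
    """Return (unicast, multicast) link-traversal counts for one spike."""
    paths = [xyz_path(src, d) for d in dests]
    # Unicast: one packet copy per destination, so every path is paid in full.
    unicast = sum(len(p) for p in paths)
    # Multicast (assumed model): a link shared by several paths carries one copy.
    multicast = len({link for p in paths for link in p})
    return unicast, multicast
```

For a source at (0, 0, 0) and destinations (3, 0, 0), (3, 1, 0), and (3, 0, 1), `link_traversals` returns `(11, 5)`: the three unicast packets each repeat the shared three-hop segment along X, whereas the merged paths cross it once.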

An object of the present invention is therefore to reduce communication latency between neurons by proposing a new multicast spike routing algorithm that leverages the inherent 3D structure of the brain and enables seamless implementation of large-scale SNN-based computing systems.

One aspect of the present invention is a spiking neural network using a three-dimensional network-on-chip that: randomly determines a plurality of barycenters (centroids); calculates the distance from each of a plurality of destination routers implemented in the three-dimensional network-on-chip to each of the barycenters; assigns, based on the calculated distances, each of the destination routers to one of a plurality of subgroups corresponding to the respective barycenters; re-determines the barycenters based on the result of assigning the destination routers to the subgroups; and, when a packet is to be sent from a source router implemented in the three-dimensional network-on-chip to a first destination router included in the plurality of destination routers, identifies a transmission path for the packet based on the re-determined barycenters and transmits the packet over the identified transmission path.
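The grouping steps in this aspect (random barycenters, distance calculation, assignment to subgroups, re-determination of barycenters) follow the shape of k-means clustering and can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: the Manhattan (hop) distance metric, rounding each barycenter to integer router coordinates, repeating the steps until the barycenters stop changing (capped by `max_iter`), and the names `manhattan`, `cluster_destinations`, and `seed`. How the transmission path is then derived from the barycenters is not covered here.

```python
import random


def manhattan(a, b):
    """Hop distance between two routers in a 3D mesh (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_destinations(dests, k, seed=0, max_iter=20):
    """Partition destination routers into at most k subgroups.

    dests -- list of distinct (x, y, z) router coordinates
    Returns a list of (barycenter, members) pairs, empty subgroups omitted.
    """
    rng = random.Random(seed)
    centers = rng.sample(dests, k)        # randomly determine the barycenters
    groups = []
    for _ in range(max_iter):             # iteration cap is an assumption
        # Assign each destination to the subgroup of its nearest barycenter.
        groups = [[] for _ in centers]
        for d in dests:
            nearest = min(range(len(centers)),
                          key=lambda i: manhattan(d, centers[i]))
            groups[nearest].append(d)
        # Re-determine each barycenter as the rounded mean of its members;
        # an empty subgroup keeps its previous barycenter.
        updated = [
            tuple(round(sum(axis) / len(g)) for axis in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return [(c, g) for c, g in zip(centers, groups) if g]
```

A source router could then forward one packet copy toward each returned barycenter and let it fan out to that subgroup's members. Sampling the initial barycenters from the destinations themselves guarantees that no subgroup starts out empty.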

The communication latency between neurons is reduced by proposing a new multicast spike routing algorithm that leverages the inherent 3D structure of the brain and enables seamless implementation of large-scale SNN-based computing systems.

FIG. 1 is a diagram showing an example of the effect of a connection fault on the firing rate.
FIG. 2 is a diagram showing an overview of the system architecture.
FIG. 3 is a diagram showing the SNPC architecture.
FIG. 4 is a diagram showing the FTMC-3DR architecture.
FIG. 5 is a diagram showing the pseudocode of the KMCR multicast routing algorithm.
FIG. 6 is a diagram showing an example of the KMCR algorithm on a 6×3×2 mesh 3D-NoC.
FIG. 7 is a diagram showing a primary branch and a backup branch.
FIG. 8 is a diagram showing the FTMP-KMCR algorithm for offline computation of the primary and backup branches.
FIG. 9 is a diagram showing the fault management algorithm applied, at backup time, to each router corresponding to "son", "father", and "grandfather".

Hereinafter, embodiments of the present invention are described with reference to the drawings. The embodiments are provided for a better understanding of the present invention and do not limit its technical scope. The scope of the present invention covers the claims and their equivalents.

First, the 3DFT-SNN architecture according to the present invention, which is based on a low-latency multicast routing scheme for routing spike traffic, is described. FIG. 2 shows an overview of the system architecture.

As shown in FIG. 2, the system 100 (3DFT-SNN system 100) consists of several stacked 2D layers of spiking neural tiles 10 and is based on a conventional 3D-NoC architecture (Non-Patent Documents 13 and 19). Specifically, FIG. 2 shows an example in which 2D layers, each a 4 × 4 array of spiking neural tiles 10, are stacked.

Each spiking neural tile 10 consists of a spiking neural processing core (Spiking Neural Processing Core; hereinafter also SNPC 1) and a fault-tolerant multicast router (Fault-Tolerant Multicast Router; hereinafter also FTMC-3DR 2). In SNN terms, a spiking neuron corresponds to a PE, inter-neuron connections are implemented by sending spikes (packets) over the scalable 3D-NoC, and the topology refers to the way neurons are interconnected within the network. As shown in FIG. 3, each SNPC 1 in the 3DFT-SNN system 100 processes incoming spikes using an array of spiking neurons.

The SNPC 1 is the main processing unit of the system 100. In the example shown in FIG. 3, input spikes are first decoded to determine their postsynaptic neurons. Weight values are then accumulated in the array of LIF (Leaky Integrate-and-Fire) neurons via crossbar-based synapses (Non-Patent Document 20).

The SNPC 1 is based on a crossbar approach. An N × N crossbar (where N is the number of neurons) is implemented using on-chip SRAM. Each synapse is represented by 5 bits: 1 bit for the synapse type (excitatory or inhibitory) and 4 bits for the weight. The main components of the SNPC 1 are described below.
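The 5-bit synapse format can be illustrated with a minimal Python sketch. The description only states that 1 bit holds the type and 4 bits hold the weight; the bit positions (type in the most significant bit), the 0/1 encoding of the type, and all function names below are assumptions made for illustration.

```python
# Assumed encoding of the 1-bit synapse type (not specified in the description).
EXCITATORY = 0
INHIBITORY = 1

def pack_synapse(syn_type: int, weight: int) -> int:
    """Pack one synapse into 5 bits, assumed layout: [type (1 bit) | weight (4 bits)]."""
    if syn_type not in (EXCITATORY, INHIBITORY):
        raise ValueError("syn_type must be 0 or 1")
    if not 0 <= weight <= 0xF:
        raise ValueError("weight must fit in 4 bits")
    return (syn_type << 4) | weight

def unpack_synapse(word: int) -> tuple[int, int]:
    """Return (syn_type, weight) from a 5-bit synapse word."""
    return (word >> 4) & 0x1, word & 0xF

def signed_contribution(word: int) -> int:
    """Weight as it would be added to a membrane potential (negative if inhibitory)."""
    syn_type, weight = unpack_synapse(word)
    return -weight if syn_type == INHIBITORY else weight
```

With this layout, an N × N crossbar needs 5 × N × N bits of SRAM, for example 327,680 bits for N = 256.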

The decoder 11 determines the postsynaptic neurons for each incoming spike (packet). On arrival at the destination neural tile, an incoming spike is forwarded by the local router to the local SNPC 1. The decoder 11 then searches a lookup table (LUT) by neuron ID to determine the postsynaptic neurons. This information is sent to the control unit 12 for neural computation.

The control unit 12 controls the overall operation of the neural core, including both its configuration mode and its operation mode. The control unit 12 ensures that the neurons are updated within a single time step.

The synapse crossbar contains a crosspoint array of synapses. Each synapse stores a bit that indicates a connection (synapse) between a row (axon) and a column (dendrite), and this bit can be read, set, or reset. The bit is written after decoding completes and is read for neural computation.

The synapse memory 13 (hereinafter also sym_mem 13) stores the synapse information used to configure the crossbar and the synaptic strengths. The synapse information is updated during the training phase and read during inference.

The neural memory 14 (hereinafter also neu_mem 14) holds the neural parameters. Each parameter is read for neural computation and, after the computation, updated to save the neuron's current state.

The LIF array 15 is the main computation unit of the neural core, where neural computation is performed. Data read from the synapse crossbar, sym_mem 13, and neu_mem 14 are processed in this unit. A plurality of LIF neurons are implemented here. More precisely, a physical LIF computation unit is implemented and multiple neurons are executed on it sequentially. This exploits the high operating speed of digital logic while reducing area cost and power consumption.

The encoder 16 packs the spikes generated by the LIF array 15 into packets. When a neuron's membrane potential exceeds the threshold, the neuron generates a spike (fires). The spike is sent to the encoder 16, where it is packed into a packet before being injected into the network through the local router.

The configuration information 17 is used to configure the neural core and includes configuration parameters for the synapse and neuron models. The neural core is configured before the system starts operating, while the application is being mapped.

Next, the architecture of the fault-tolerant multicast 3D router (Fault-Tolerant Multicast 3D Router; hereinafter FTMC-3DR 2) is described. FIG. 4 shows the FTMC-3DR architecture.

Because each neuron can connect to thousands of other neurons, the FTMC-3DR 2 supports multicast routing for efficient spike delivery. The FTMC-3DR 2 is based on a conventional 3D router architecture (Non-Patent Documents 13, 19, and 21). Because spike timing is used to encode information, the latency of the FTMC-3DR 2 must be very low. Each router 2 of the system 100 has up to seven input ports and seven output ports: six input/output ports are dedicated to neighboring routers, and one input/output port connects the switch to the SNPC 1. In addition to the switch allocator 22, the FTMC-3DR 2 includes seven input port modules 21, one per direction, and a crossbar module 23 that handles the forwarding of spikes to the next SNPC 1. Each input port module consists of two main elements: an input buffer 21a and a multicast routing module 21b.

The router 2 is designed with four pipeline stages: buffer writing (BW), routing calculation (RC), switch arbitration (SA), and crossbar traversal (CT). In the first stage, an incoming spike (packet) is stored in the input buffer 21a before it is processed. Next, the packet's source address (X_S; Y_S; Z_S) is extracted and used to compute the output port. After the routing calculation, a request (the sw_request signal) is sent to the switch allocator 22 to use the selected output port. The switch allocator 22 has two main components: a conventional Stall/Go flow control 22a (Non-Patent Document 11) and a Matrix-arbiter scheduler 22b. A Matrix-arbiter with a lowest-priority scheme is adopted because it offers fast computation, inexpensive implementation, and strong fairness (Non-Patent Document 11). Finally, once granted (via the sw_grant signal), the packet traverses the crossbar 23 to the target output port.

In addition to handling soft errors in the routing pipeline stages (Non-Patent Document 21), the router 2 relies on an advanced recovery technique based on system reconfiguration, which uses redundant structural resources to handle hard faults in the input buffer 21a, the crossbar 23, and the links (Non-Patent Documents 13 and 19). These mechanisms aim to mitigate faults that occur in the system.

Next, the multicast spike routing algorithm based on K-means clustering (hereinafter KMCR) is described.

As described above, a 3D-mesh NoC is well suited to building large-scale networks by stacking multiple 2D-NN layers in a scalable manner. In an SNN, one neuron is typically connected to many other neurons, so a large amount of one-to-many communication takes place between the neural processing cores.

The routing algorithm of the present invention combines K-means clustering with tree-based routing. Tree-based routing is a common method for multicast communication: the destination group is partitioned starting from the source node, forming a "tree" of routing paths for the packet. One of the main drawbacks of tree-based methods is traffic contention, because packets are likely to be blocked at intermediate nodes (Non-Patent Document 22). To address this, the present invention uses K-means to divide the destination set into subsets. This choice follows from the observation that postsynaptic neurons are often adjacent to one another. Previous studies have shown that inter-neuron communication in SNNs is highly local (Non-Patent Document 23), which allows groups of neurons in the same region to share incoming spikes. Consequently, when an SNN is mapped onto a 3D-NoC system, the neurons of an SNN layer are distributed over one core or over nearby cores. K-means can therefore produce an effective partition that balances the traffic load and relieves heavy congestion in the NoC system.

Accordingly, as shown in the algorithm of FIG. 5, the proposed routing method first divides the destinations into several subgroups. To do so, it uses the K-means clustering mechanism to find the centroid of each subset and the destinations labeled with it. Here, a centroid is the node whose average distance to all other nodes in its subgroup is smallest.

To determine the centroids, the algorithm first selects centroids at random from the available targets. It then performs the following steps.

(1) As shown in line 10 of the algorithm of FIG. 5, the distance from each destination to each centroid is calculated as a Manhattan distance.

(2) Based on these distances, each destination is assigned to the subgroup with the nearest centroid.

(3) Finally, once the subgroups have been provisionally formed, the position of each centroid is updated by averaging all of its elements. These updates are repeated until the centroids no longer change.
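Steps (1) to (3) can be sketched in Python as follows. This is an illustrative sketch rather than the pseudocode of FIG. 5: routers are assumed to be addressed by integer (x, y, z) coordinates, the function names are hypothetical, and snapping the averaged position to the closest member router is one assumed way to keep every centroid on an actual node.

```python
import random

def manhattan(a, b):
    """Manhattan distance between two (x, y, z) positions."""
    return sum(abs(p - q) for p, q in zip(a, b))

def assign(destinations, centroids):
    """Steps (1) and (2): put each destination in the subgroup of its nearest centroid."""
    groups = [[] for _ in centroids]
    for d in destinations:
        nearest = min(range(len(centroids)), key=lambda i: manhattan(d, centroids[i]))
        groups[nearest].append(d)
    return groups

def update(groups, centroids):
    """Step (3): average each subgroup, then snap to the closest member (assumption)."""
    updated = []
    for old, members in zip(centroids, groups):
        if not members:            # defensive: keep the old centroid for an empty subgroup
            updated.append(old)
            continue
        mean = tuple(sum(m[i] for m in members) / len(members) for i in range(3))
        updated.append(min(members, key=lambda m: manhattan(m, mean)))
    return updated

def kmeans_routers(destinations, k, seed=0, max_iter=100):
    """Return (centroids, subgroups) for a list of distinct destination routers."""
    rng = random.Random(seed)
    centroids = rng.sample(destinations, k)   # random initial centroids
    for _ in range(max_iter):
        new_centroids = update(assign(destinations, centroids), centroids)
        if new_centroids == centroids:        # stop once the centroids no longer change
            break
        centroids = new_centroids
    return centroids, assign(destinations, centroids)
```

The `seed` parameter only makes the random initialization repeatable; different seeds can yield different partitions, as is usual for K-means.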

After the centroids are determined, the routing paths from the source node to the targets are formed in two stages. In the first stage, the route from each source to each centroid is determined using dimension-order routing (Dimension Order Routing; hereinafter DOR), a common method (Non-Patent Document 11). Identical route segments from a given source toward the centroids are then merged, which reduces the number of spike packets the source must send compared with a unicast-based method. Which DOR variant to use, such as XYZ or ZYX, depends on the application mapping method and is chosen to obtain optimized, balanced traffic, as illustrated further in the example of FIG. 6. Here, ZYX means that the routing calculation resolves the Z dimension first, then Y, then X. At the end of this stage, the part of the "tree" from the source to the centroids has been formed. In the second stage, the same routing calculation as in the first stage establishes the remaining part of the "tree", from the centroids to the destinations. After the two stages, the "tree" route from a given source node to its destinations is complete, and the computed routing information is used to update the routing tables attached to the routers.
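The two-stage tree construction can be sketched as follows. The sketch assumes routers addressed by (x, y, z) coordinates on a fault-free mesh and represents the tree as a set of directed links, so that shared route segments are merged automatically; `dor_path` and `build_tree` are hypothetical names, not taken from the figures.

```python
def dor_path(src, dst, order="ZYX"):
    """Nodes visited from src to dst when dimensions are resolved in the given order."""
    axis = {"X": 0, "Y": 1, "Z": 2}
    pos = list(src)
    path = [tuple(pos)]
    for dim in order:
        i = axis[dim]
        step = 1 if dst[i] > pos[i] else -1
        while pos[i] != dst[i]:
            pos[i] += step
            path.append(tuple(pos))
    return path

def build_tree(source, centroids, groups, order="ZYX"):
    """Stage 1: source -> centroids. Stage 2: each centroid -> its destinations.

    Links are stored in a set, so segments shared by several routes appear once.
    """
    links = set()
    for centroid, members in zip(centroids, groups):
        for a, b in ((source, centroid), *((centroid, m) for m in members)):
            path = dor_path(a, b, order)
            links.update(zip(path, path[1:]))
    return links
```

For example, with source (0, 0, 0) and centroids (1, 0, 1) and (0, 1, 1), both ZYX routes start with the link from (0, 0, 0) to (0, 0, 1), so the merged tree holds three links where two separate unicast routes would use four.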

Next, an example is described in which an 18 × 18 fully connected SNN application is mapped onto a 6 × 3 × 2 3DNoC-SNN system. Each spiking tile is assumed to hold one neuron in its SNPC.

As shown in FIG. 6, the tiles/nodes in L1 (source nodes) send their outputs to all nodes in L2 (destination nodes). In the case considered here, node 3, a source node in layer L1 (hereinafter written "3"), must send spike packets to all nodes in layer L2. With the number of clusters k set to 2, the destination set is divided into two subsets whose centroids are "26" and "29" (FIG. 6(a)). Next, as shown in FIG. 6(b), the "tree" routes from the source to both centroids are determined. For this mapping, the ZYX variant of DOR is selected. Because spikes then cross over multiple inter-layer links, traffic contention at the intermediate nodes of the first layer (i.e., "8" and "11") is mitigated. If either XYZ or YXZ were used instead, every source node in L1 would have to send its spikes to the centroids (i.e., "26" and "29") via "11" and "8", causing heavy traffic congestion on the inter-layer link between "11" and "26" and on the inter-layer link between "8" and "29". As shown in FIG. 6(c), the remaining part of the tree is formed once the routes from the centroids to the destinations have been calculated. Finally, as shown in FIG. 6(d), the routing "tree" from "3" to every node in L2 is formed.

Selecting the optimal number of clusters: As mentioned above, the number of clusters (k) must be determined before applying the proposed routing algorithm (KMCR). Intuitively, when k is small, the destination set is divided into large subsets. This increases congestion at intermediate nodes (i.e., the centroids) and may increase congestion across the network. Conversely, when k is large, each source node sends multiple copies of a given packet to the centroids, which may increase latency. When k equals the number of destinations, the routing algorithm of the present invention behaves like unicast-based multicast. It is important to note that the choice of k depends mainly on the distribution of the destination nodes. Fortunately, several useful observations can guide the choice of the optimal k. First, as mentioned above, SNNs exhibit high locality of inter-neuron communication. As a result, neurons in the same group (layer) tend to be mapped to nearby neural processing cores, which allows the K-means clustering algorithm to work efficiently. Second, the number of destination nodes in a typical SNN application is not large. In practice, for deep learning based on multi-layer models, a layer contains hundreds to thousands of neurons, which fit in a few tens of cores, each able to hold several hundred neurons (256 neurons in the case of the SNPC) (Non-Patent Document 24). Therefore, after the SNN application is mapped, the number of clusters can be determined by visualizing the distribution of destinations. However, selecting the optimal k for a particular case requires evaluating system performance with several different values of k.

Based on the above observations, the optimal k can be determined in the following two steps.

(1) After the SNN application is mapped, find the number of clusters by visualizing the destination set.

(2) Evaluate the system while varying k (covering the number of clusters found in (1) and several other values), and select the best case.

Next, the shortest-path K-means clustering based multicast routing algorithm (Shortest Path K-means Clustering; hereinafter SP-KMCR) is described.

As described above, in KMCR the source node sends spike packets to the centroids, and each centroid then forwards the spikes to its destinations. Using centroids guarantees that the total distance from a centroid to its destinations is minimal. However, because traffic from different sources converges on the centroids, the links leading to the centroids may become congested.

To address this problem, the present invention proposes a new routing method. After the destination subsets are determined with K-means, the number of hops from a given source to every node in each subset is calculated. Then, for each subset, the node with the shortest path to the source is selected. Unlike KMCR, the source sends spike packets to the shortest-path node of each subset rather than to the centroid node. This method is hereinafter called SP-KMCR. It removes the potential traffic congestion problem and also reduces the average latency. Compared with KMCR, the new method requires more computation to find the shortest paths. However, the computations of both the new method and KMCR are performed offline, so the run-time overhead is the same for both algorithms.
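The selection of the shortest-path node in each subset can be sketched as follows. The sketch assumes that, on a fault-free mesh with minimal routing, the hop count between two routers equals their Manhattan distance; `select_sp_nodes` is a hypothetical name, and ties are broken by list order, which the description does not specify.

```python
def manhattan(a, b):
    """Hop count between two (x, y, z) routers on a fault-free mesh (assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def select_sp_nodes(source, groups):
    """For each destination subset, pick the member with the fewest hops from the source."""
    return [min(members, key=lambda m: manhattan(source, m)) for members in groups]

# Usage: two subsets produced by K-means for a source at (0, 0, 0).
subsets = [[(3, 1, 1), (1, 0, 1), (2, 2, 1)], [(5, 0, 1), (4, 2, 1)]]
sp_nodes = select_sp_nodes((0, 0, 0), subsets)
```

Because the selected node depends on the source, different sources generally target different nodes of the same subset, which is what spreads the traffic that KMCR would concentrate on a single centroid.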

Next, the fault-tolerant shortest-path K-means clustering based multicast routing algorithm (Fault-Tolerant Shortest Path K-means Clustering; hereinafter FTSP-KMCR) is described. FTSP-KMCR is based on SP-KMCR.

The basic idea of FTSP-KMCR is as follows.

(1) The primary routing tree and the backup routing branches from a given source node to its destinations are calculated offline.

(2) After the offline calculation, the routing tables are configured.

FIG. 7 shows the primary and backup routing branches. When a faulty primary branch is detected, a pre-planned backup branch is used to bypass the faulty link. The SP-KMCR mechanism is used to compute the primary branches (solid lines). A backup branch is an alternative route to a primary branch. For the router under consideration (i.e., the "son"), backup branches (dashed lines) are computed for use when the primary connection fails. For example, if the primary connection between the "father" and the "son" (i.e., pl_1) is faulty, bl_1 and bl_2 are the backup branches used to maintain traffic between the "father" and the "son". The same applies when both pl_2 and pl_1 are faulty.

Computing the primary and backup routes is a significant computational task in the algorithm. These computations are performed offline, which reduces the run-time overhead of the proposed routing algorithm and avoids timing violations that could otherwise occur in the SNN. As shown in the algorithm of FIG. 8, the source and destination addresses (S; T) and the number of subsets (k) are predefined as inputs, and the outputs are the primary tree (P_pr) and the backup branches (P_bk) from each source to its destinations.

The routing is then calculated according to the following steps.

Step 1: As shown in lines 6 to 19, K-means is applied to the destination addresses to determine the destination subsets.

Step 2: As shown in lines 20 to 25, the shortest path from each source to the nodes of each subset is found.

Step 3: The first part of the primary tree is formed from the source node to the SP nodes. This is done by applying the dimension-order routing (DOR) algorithm from the source to each SP node and merging identical routes. An alternative DOR variant is then used to calculate the backup branches, which ensures that each backup branch is separate from the primary route. For example, if ZYX DOR is used to form the primary tree, another variant such as YZX or XZY is used for the backup branches.

Step 4: Following the same calculation as in step 2, the second part of the primary tree and the backup branches are calculated from each SP node to its destinations in the same group.
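The backup-branch idea in step 3 can be sketched as follows. The description names alternative DOR variants but does not say how one is chosen; the sketch below simply tries the other dimension orders and keeps the first whose links do not overlap the primary route. This selection rule and the function names are assumptions made for illustration.

```python
from itertools import permutations

def dor_path(src, dst, order):
    """Nodes visited from src to dst when dimensions are resolved in the given order."""
    axis = {"X": 0, "Y": 1, "Z": 2}
    pos = list(src)
    path = [tuple(pos)]
    for dim in order:
        i = axis[dim]
        step = 1 if dst[i] > pos[i] else -1
        while pos[i] != dst[i]:
            pos[i] += step
            path.append(tuple(pos))
    return path

def links(path):
    """Directed links traversed by a path."""
    return set(zip(path, path[1:]))

def backup_branch(src, dst, primary_order="ZYX"):
    """Return (order, path) of a DOR variant sharing no link with the primary route,
    or None when every variant overlaps the primary route."""
    primary = links(dor_path(src, dst, primary_order))
    for perm in permutations("XYZ"):
        order = "".join(perm)
        if order == primary_order:
            continue
        candidate = dor_path(src, dst, order)
        if primary.isdisjoint(links(candidate)):
            return order, candidate
    return None
```

In this sketch, `backup_branch` returns `None` when source and destination differ in a single dimension, because every DOR variant then yields the same path; such pairs would need a non-minimal detour, which the sketch does not attempt.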

Note that only the primary and backup routing paths are computed offline. The results of these computations are used to configure the routing tables in the routers. Because this configuration takes place during application mapping, before execution, it does not affect the class of online learning processes in which synaptic strengths (weights) are updated. It also ensures that the computational overhead of the backup branches does not affect the recovery time of the proposed routing algorithm, and it reduces the hardware cost required by the system.

Next, the fault management algorithm is described. After the routing information has been configured, a fault management algorithm is implemented to process incoming packets, as shown in FIG. 9.

S1: For a given incoming packet, fault_flag_val is extracted to indicate whether the packet is on a primary branch or a backup branch. At the same time, the source address is used to compute the expected primary output port.

S2 and S3: If fault_flag_val = 0 (i.e., the router is acting as a "father" or "grandfather"), the fault detector attached to each router is used to determine whether the computed output port is faulty.

S4: When the expected output port is confirmed (no fault detected), fault_flag_val is attached to the packet before it is forwarded.

S5: Otherwise, the output port is switched to the backup branch, and fault_flag_val is set to its initial value (equal to the number of hops in the backup path) to notify the next backup router that the packet is on a backup branch.

S6: If fault_flag_val ≠ 0 (i.e., the router is acting as a backup or "son" router), the packet is routed to the output port on the backup route, and fault_flag_val is decremented by 1 at each hop until it reaches 0 and the backup path ends.
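The per-router decision in S1 to S6 can be sketched as a single function. Only fault_flag_val comes from the description; the function name, the port labels, and the `port_is_faulty` callback standing in for the fault detector are hypothetical, and the primary and backup ports are assumed to have been read from the preconfigured routing table.

```python
def route_packet(fault_flag_val, primary_port, backup_port, port_is_faulty, backup_hops):
    """Return (output_port, fault_flag_val to carry in the forwarded packet)."""
    if fault_flag_val == 0:
        # S2/S3: the router acts as "father" or "grandfather"; check the primary port.
        if not port_is_faulty(primary_port):
            # S4: no fault detected, stay on the primary branch with the flag unchanged.
            return primary_port, 0
        # S5: switch to the backup branch and start the hop counter.
        return backup_port, backup_hops
    # S6: the router acts as a backup or "son" router; follow the backup route
    # and count down toward the end of the backup path.
    return backup_port, fault_flag_val - 1

# Usage: the UP port of a "father" router is faulty and the backup path has 3 hops.
port, flag = route_packet(0, "UP", "EAST", lambda p: p == "UP", 3)
```

All routes are computed offline, so this run-time step reduces to a flag check and a table lookup, in line with the low-latency requirement stated for the router.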

1: SNPC
2: FTMC-3DR
10: Spiking neural tile
100: 3DFT-SNN system

Claims

A spiking neural network using a three-dimensional network on-chip, wherein:
A plurality of centroids are randomly determined.
The distances from each of a plurality of destination routers mounted on the three-dimensional network on-chip to each of the plurality of centroids are calculated.
Based on the calculated distances, each of the plurality of destination routers is assigned to one of a plurality of subgroups corresponding respectively to the plurality of centroids.
Based on the result of assigning the plurality of destination routers to the plurality of subgroups, the plurality of centroids are redetermined.
When a packet is transmitted from a source router mounted on the three-dimensional network on-chip to a first destination router included in the plurality of destination routers, the transmission path of the packet is specified based on the redetermined plurality of centroids.
The packet is transmitted using the specified transmission path.
A spiking neural network with a three-dimensional network on-chip.

In claim 1,
In the allocation process,
For each of the plurality of destination routers, the first center of gravity having the shortest calculated distance among the plurality of centers of gravity is specified.
Each of the plurality of destination routers is assigned to the subgroup corresponding to the first center of gravity specified for that destination router.
A spiking neural network with a three-dimensional network on-chip.

In claim 1,
In the calculation process, the Manhattan distance from each of the plurality of destination routers to each of the plurality of centers of gravity is calculated.
A spiking neural network with a three-dimensional network on-chip.

In claim 1,
In the redetermination process, for each of the plurality of subgroups, the center of gravity of the destination routers assigned to that subgroup is calculated.
A spiking neural network with a three-dimensional network on-chip.

In claim 1,
The calculation process, the allocation process, and the redetermination process are repeated until the redetermined plurality of centers of gravity no longer change.
A spiking neural network with a three-dimensional network on-chip.
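
The grouping recited in claims 1 to 5 can be sketched as an iterative clustering loop. This is a minimal illustration, not the claimed implementation. It assumes that routers are identified by (x, y, z) coordinates, that a center of gravity is the per-axis mean of its subgroup, and that the random initial centers are drawn from the destination routers themselves. The names `manhattan` and `cluster_destinations` and the parameters `seed` and `max_iter` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def manhattan(a: Coord, b: Coord) -> float:
    """Manhattan (L1) distance between two 3D coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cluster_destinations(
    dests: List[Coord], k: int, seed: int = 0, max_iter: int = 100
) -> Tuple[List[Coord], Dict[int, List[Coord]]]:
    """Group destination routers into k subgroups around centers of gravity."""
    rng = random.Random(seed)
    # Random initial centers; drawing them from the destinations is an assumption.
    centers: List[Coord] = rng.sample(dests, k)
    groups: Dict[int, List[Coord]] = {}
    for _ in range(max_iter):
        # Assign every destination router to its nearest center of gravity.
        groups = {i: [] for i in range(k)}
        for d in dests:
            nearest = min(range(k), key=lambda i: manhattan(d, centers[i]))
            groups[nearest].append(d)
        # Redetermine each center as the mean of its subgroup
        # (an empty subgroup keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in groups.items()
        ]
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return centers, groups
```

For example, `cluster_destinations(dests, k=2)` returns the final centers together with a dictionary mapping each subgroup index to the destination routers assigned to it.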

In claim 1,
In the specifying process, the transmission route of the packet is specified so as to pass through the center of gravity corresponding to the first destination router among the plurality of redetermined centers of gravity.
A spiking neural network with a three-dimensional network on-chip.

In claim 1,
In the specifying process,
The first center of gravity corresponding to the first destination router among the redetermined centers of gravity is specified.
Among the plurality of destination routers assigned to the subgroup corresponding to the specified first center of gravity, the second destination router having the shortest distance from the source router is specified.
The transmission route of the packet is specified so as to pass through the specified second destination router.
A spiking neural network with a three-dimensional network on-chip.
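
The selection recited in claim 7 can be sketched as follows. This minimal example assumes that subgroups are held as a dictionary from subgroup index to member coordinates, and that the distance from the source router is measured as Manhattan distance (the claims specify that metric only for the router-to-center calculation). The name `pick_entry_router` is hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Manhattan (L1) distance between two 3D coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))


def pick_entry_router(
    source: Coord, first_dest: Coord, groups: Dict[int, List[Coord]]
) -> Coord:
    """Return the member of first_dest's subgroup that is nearest to source."""
    # Locate the subgroup (and hence the center of gravity) that the
    # first destination router belongs to.
    members = next(g for g in groups.values() if first_dest in g)
    # Among that subgroup, choose the router closest to the source router.
    return min(members, key=lambda r: manhattan(source, r))
```

The packet's path would then be chosen so that it passes through the returned router before reaching the remaining members of the subgroup.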

In claim 6 or 7,
In the specifying process, a plurality of transmission paths are specified between a third destination router and a fourth destination router, both located between the source router and the first destination router.
In the transmission process, when a first transmission path among the specified plurality of transmission paths is available, the packet is transmitted using the first transmission path, and when the first transmission path is not available, the packet is transmitted using a second transmission path among the specified plurality of transmission paths.
A spiking neural network with a three-dimensional network on-chip.

In claim 8,
In the transmission process,
When a fifth destination router located between the third destination router and the fourth destination router receives the packet from another destination router, the number of hops attached to the received packet is extracted.
When the extracted number of hops is not 0, the packet, with its number of hops decremented by 1, is transmitted using the second transmission path.
When the extracted number of hops is 0, it is determined whether the portion of the first transmission path ahead of the fifth destination router is available. When that portion is determined to be available, the packet is transmitted using the first transmission path. When that portion is determined to be unavailable, the packet, with a number of hops corresponding to the unavailable portion attached, is transmitted using the second transmission path.
A spiking neural network with a three-dimensional network on-chip.