JP6284177B2

JP6284177B2 - Error resilience router, IC using the same, and error resilience router control method

Info

Publication number: JP6284177B2
Application number: JP2013262523A
Authority: JP
Inventors: Abderazek Ben Abdallah (アブダラアブデラゼクベン)
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2013-12-19
Filing date: 2013-12-19
Publication date: 2018-02-28
Anticipated expiration: 2033-12-19
Also published as: JP2015119387A

Description

The present invention relates to an error-tolerant router based on on-demand bypass links for a network-on-chip in a multi-core chip, a two- or three-dimensional IC having the router, and a method for controlling the error-tolerant router.

In a chip multiprocessor (CMP) architecture used in two- or three-dimensional ICs, a NoC (network-on-chip) connects processor nodes and their private caches to shared caches and memory controllers. The NoC also carries other control traffic, such as interrupt requests.

In such an architecture, a router sits at each node and connects its core to neighboring cores through pipelined links. The unit of communication between nodes is the packet, which is divided into flits (flow control units) whose width equals the link width.

A three-dimensional IC (3D-IC) is formed by stacking several wafers, which shortens wire length and delay. In a 3D-IC, multiple cores are arranged on each wafer, and a router is provided for each core.

The routers interconnect the cores through an on-chip network on the 3D-IC called a three-dimensional network-on-chip (3D-NoC) system.

Although 3D-NoC network topologies have been studied in recent years, little research has addressed error tolerance in 3D-NoC systems. Existing studies can be classified according to the target system, the type of error, and how errors are handled (for example, by routing algorithms or by architectural solutions).

Some of the proposed error-tolerant 3D-NoC solutions focus on permanent errors in through-silicon vias (TSVs) in order to improve manufacturing yield. For example, some are based on spare TSV pad insertion (Non-Patent Document 1) or on serialization (Non-Patent Document 2).

Other solutions focus only on link errors and apply error-tolerant routing algorithms. Most existing solutions (Non-Patent Documents 3-6) rely on routing tables that store information about surrounding faulty links, and they avoid deadlock by using virtual channels (VCs) (Non-Patent Document 7). Others avoid VCs and instead use turn-model routing to avoid deadlock (Non-Patent Document 8).

US Patent Application Publication No. 2012/0155482

Non-Patent Document 1: I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, "A low-overhead fault tolerance scheme for TSV-based 3D network-on-chip links," in ICCAD '08: Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, pp. 598-602, 2008.
Non-Patent Document 2: S. Pasricha, "Exploring serial vertical interconnects for 3D ICs," in DAC '09: Proceedings of the 46th Annual Design Automation Conference, pp. 581-586, New York, NY, USA: ACM, 2009.
Non-Patent Document 3: C. Feng, M. Zhang, J. Li, J. Jiang, Z. Lu, and A. Jantsch, "A Low-overhead Fault-Aware Deflection Routing Algorithm for 3D Network-on-Chip," IEEE Computer Society Annual Symposium on VLSI, pp. 19-24, July 2011.
Non-Patent Document 4: Y. Li, S. Peng, and W. Chu, "Adaptive box-based efficient fault-tolerant routing in 3D torus," pp. 71-77, 2005.
Non-Patent Document 5: C. Feng, Z. Lu, A. Jantsch, J. Li, and M. Zhang, "A Reconfigurable Fault-tolerant Deflection Routing Algorithm Based on Reinforcement Learning for Network-on-Chip," The 3rd International Workshop on Network on Chip Architectures, pp. 11-16, December 2010.
Non-Patent Document 6: D. Xiang, Y. Zhang, and Y. Pan, "Practical Deadlock-Free Fault-Tolerant Routing Based on the Planar Network Fault Model," IEEE Transactions on Computers, 58(5): 620-633, May 2009.
Non-Patent Document 7: W. J. Dally, "Virtual-channel flow control," IEEE Transactions on Parallel and Distributed Systems, 3(2): 194-205, March 1992.
Non-Patent Document 8: S. Pasricha and Y. Zou, "A Low Overhead Fault Tolerant Routing Scheme for 3D Networks-on-Chip," The 12th International Symposium on Quality Electronic Design, pp. 1-8, March 2011.
Non-Patent Document 9: T. Lehtonen, P. Liljeberg, and J. Plosila, "Online Reconfigurable Self-timed links for Fault Tolerant NoC," VLSI Design, vol. 2007, pp. 1-13, 2007.
Non-Patent Document 10: A. DeOrio et al., "A Reliable Routing Architecture and Algorithm for NoCs," IEEE Transactions on CAD of Integrated Circuits and Systems, pp. 726-739, May 2012.
Non-Patent Document 11: A. Ben Ahmed and A. Ben Abdallah, "Architecture and Design of High-throughput, Low-latency, and Fault-Tolerant Routing Algorithm for 3D-Network-on-Chip (3D-NoC)," The Journal of Supercomputing, DOI: 10.1007/s11227-013-0940-9.
Non-Patent Document 12: A. Ben Ahmed and A. Ben Abdallah, "Fault-tolerant Routing Algorithm with Deadlock Recovery Support for 3D-NoC Architectures," The 7th IEEE International Symposium on Embedded Multicore SoCs, September 2013.
Non-Patent Document 13: A. Ben Abdallah, Multicore Systems-on-Chip: Practical Hardware/Software Design, 2nd ed., Springer, 2013, ISBN-13: 978-9491216916.

As explained above, despite the various existing solutions that address error tolerance in 3D-NoC systems, several problems remain, as described below.

First, some solutions focus only on permanent errors and do not consider other error types. However, as described in Non-Patent Document 9, most errors (80%) are transient, and the remainder are mainly permanent and intermittent errors.

Second, many solutions focus on link errors. However, according to Non-Patent Document 10, the input buffers and the crossbar occupy a large share of the router area in a 3D-NoC system, reaching 80% and 10%, respectively, while each of the remaining components accounts for no more than 3% of the total router area. Errors in the input buffers and crossbar therefore deserve greater attention, and a reliable solution must be able to handle them.

Third, previously proposed work relies on VCs to avoid deadlock. VCs guarantee that the system is deadlock-free, but they depend on arbitration, which is costly in hardware complexity and also adds latency. Routing algorithms based on the turn model do not require VCs, but their non-minimal nature incurs extra clock cycles that reduce system throughput.

In summary, conventional systems have the following problems.

First, conventional systems focus only on permanent errors and do not consider other error types, even though most errors in a NoC are transient and the remainder are mainly permanent and intermittent.

Second, most conventional solutions focus on link errors. However, in 2D and 3D-NoC systems the input buffers and crossbar occupy a large area, so errors in the input buffers and crossbar deserve greater attention.

Third, deadlock is avoided by using VCs. VCs guarantee deadlock freedom, but they are expensive in terms of hardware complexity.

Accordingly, an object of the present invention is to solve the above problems by providing an error-tolerant router that handles all three error types (transient, intermittent and permanent), and an IC using the error-tolerant router. The proposed solution targets different types of errors in the links, input buffers and crossbar circuits of 2D or 3D-ICs that use the proposed error-tolerant router.

In the present invention, the random access buffer (RAB) technique is based on a smart controller (RAB cntrl) that first detects the flit responsible for a deadlock in the buffer. The detection mechanism relies on a timer: if a request being processed is not granted within a certain time period, the timer raises a flag and the request is discarded.
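The timer-based detection can be sketched as a per-request watchdog counter. This is a minimal Python model, not the patented circuit: `PendingRequest`, `tick` and the `timeout` parameter are hypothetical names, and the text does not specify the timeout length.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    """Switch request of the flit currently selected in an input buffer."""
    slot: int        # buffer slot holding the requesting flit
    waited: int = 0  # consecutive cycles without a grant


def tick(req: PendingRequest, granted: bool, timeout: int) -> bool:
    """Advance the watchdog by one cycle and return the deadlock flag.

    The flag goes high once the request has waited `timeout` cycles
    without a grant; the caller then discards the request and lets the
    RAB pick another flit.
    """
    if granted:
        req.waited = 0
        return False
    req.waited += 1
    return req.waited >= timeout


# Usage: a request that is never granted is flagged on the 4th cycle.
req = PendingRequest(slot=2)
flags = [tick(req, granted=False, timeout=4) for _ in range(4)]
```

Here `flags` ends up as `[False, False, False, True]`; a real controller would count in hardware and compare against a fixed threshold.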

The RAB then frees some slots in the buffer and breaks the dependency by seeking another flit whose request can be granted. The RAB handles one request at a time in each input buffer, so, unlike with VCs, no arbitration is needed. Moreover, as the number of VCs grows, the complexity of VC allocation grows too, and an efficient scheduling method becomes important. With the RAB, in contrast, buffer depth does not degrade performance; a deeper buffer increases the probability of finding an unblocked flit, resulting in faster deadlock recovery.
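The search for an unblocked flit can be illustrated by scanning the other occupied slots for one whose request could be granted. In this hypothetical sketch each slot holds only the output port its flit requests, `can_grant` stands in for the switch allocator, and packet-level ordering constraints are ignored. A deeper buffer simply offers more slots to scan, which is the intuition behind the statement that depth helps recovery.

```python
from typing import Callable, Optional, Sequence


def find_unblocked(slots: Sequence[Optional[str]],
                   blocked_slot: int,
                   can_grant: Callable[[str], bool]) -> Optional[int]:
    """Return the index of another occupied slot whose request can be granted.

    `slots[i]` is the output port requested by the flit in slot i
    (None means the slot is empty). The scan starts just after the
    blocked slot and wraps around once; None means every flit is blocked.
    """
    n = len(slots)
    for step in range(1, n):
        i = (blocked_slot + step) % n
        port = slots[i]
        if port is not None and can_grant(port):
            return i
    return None


# Usage: slot 0 waits on the congested east port; slot 2 wants north.
busy = {"east"}
candidate = find_unblocked(["east", "east", "north", None], 0,
                           lambda p: p not in busy)
```

With these inputs `candidate` is `2`: slot 1 also targets the busy east port, so the north-bound flit in slot 2 is served first.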

A first technical aspect according to the present invention is an error-tolerant router provided for each of a plurality of cores in a 3D-IC having stacked wafers, the router comprising:
a plurality of input port units (1-7), each configured to receive, in an input buffer (11), input flits arriving from the direction of the corresponding core;
a crossbar (13) that connects the outputs of the plurality of input port units (1-7) to predetermined output destinations; and
an error control module (20) having a prediction table (10) that stores traffic information on the input buffers (11);
wherein each of the plurality of input port units (1-7) comprises:
the input buffer (11);
an error detection circuit (91) that detects a slot error in the input buffer (11); and
a random access buffer controller (90);
and wherein, when the error detection circuit (91) detects a slot error in the input buffer (11),
the random access buffer controller (90)
checks whether the other remaining slots are occupied and, if they are, sends a flag to the error control module (20), and
the best input buffer (11) among the input port units (1-7) is selected based on the traffic information stored in the prediction table (10) and notified to the random access buffer controller (90), and control is performed to switch to the selected input buffer (11) and store subsequent input flits there.

A second technical aspect according to the present invention is a 3D-IC formed of stacked wafers, each wafer having a plurality of cores and a router corresponding to each of the plurality of cores, wherein
each router comprises:
a plurality of input port units (1-7), each configured to receive, in an input buffer (11), input flits arriving from the direction of the corresponding core;
a crossbar (13) that connects the outputs of the plurality of input port units (1-7) to predetermined output destinations; and
an error control module (20) having a prediction table (10) that stores traffic information on the input buffers (11);
wherein each of the plurality of input port units (1-7) comprises:
the input buffer (11);
an error detection circuit (91) that detects a slot error in the input buffer (11); and
a random access buffer controller (90);
and wherein, when the error detection circuit (91) detects a slot error in the input buffer (11),
the random access buffer controller (90)
checks whether the other remaining slots are occupied and, if they are, sends a flag to the error control module (20), and
the best input buffer (11) among the input port units (1-7) is selected based on the traffic information stored in the prediction table (10) and notified to the random access buffer controller (90), and control is performed to switch to the selected input buffer (11) and store subsequent input flits there.

A third technical aspect according to the present invention is a method for controlling an error-tolerant router provided for each of a plurality of cores on the wafers forming a 3D-IC, the method comprising:
as a first processing stage:
detecting, by a random access buffer controller (90), deadlock and error states of the input buffer (11) of each input port unit (1-7);
receiving traffic information from probes attached to the input buffers (11) of the input port units (1-7);
obtaining, from an input flit, identification information for the next port; and
sending a switch request signal to a switch allocator (16);
as a second processing stage:
computing, by a routing module, the next port for the next node based on a signal received from the error control module (20);
scheduling, by the switch allocator (16), among requests from different ports; and
sending, from the switch allocator (16) to the crossbar (13), a request signal and a control signal that grant use of the requested output port and carry information about the scheduling result; and
as a third processing stage:
checking the flits from the input buffer (11) and sending them to the appropriate adjacent node.

Most conventional 3D-NoC systems rely on virtual channels (VCs) to avoid deadlock. VCs guarantee that the system is deadlock-free. However, these benefits come at the cost of substantial additional hardware as well as implementation complexity. Furthermore, VCs require arbitration to handle multiple requests, which adds an extra pipeline stage (usually called the VC allocator) and therefore additional clock cycles. This latency has a significant impact on overall system throughput.
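The cost of an extra VC-allocation stage can be estimated with a standard zero-load latency model. This is an illustrative sketch that assumes a 3-stage pipeline for the proposed router (matching the three processing stages of the control method) and a 4-stage pipeline for a VC-based router, a 1-cycle link per hop, one flit per cycle and no contention; none of these figures are measurements from this document.

```python
def zero_load_latency(hops: int, stages: int, flits: int, link_cycles: int = 1) -> int:
    """Cycles until the tail flit of a packet reaches its destination.

    Assumes every router is a `stages`-deep pipeline that accepts one flit
    per cycle, each link takes `link_cycles` cycles, and there is no
    contention. The head flit pays `hops * (stages + link_cycles)` cycles;
    the remaining `flits - 1` flits follow one per cycle.
    """
    return hops * (stages + link_cycles) + (flits - 1)


# A 4-flit packet crossing 4 routers.
rab_router = zero_load_latency(hops=4, stages=3, flits=4)
vc_router = zero_load_latency(hops=4, stages=4, flits=4)
```

Under these assumptions the 3-stage router delivers the packet in 19 cycles versus 23 for the 4-stage one, and the gap grows by one cycle per hop.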

In the present invention, the random access buffer (RAB) technique is based on a smart controller (RAB cntrl) that first detects which flit is causing a deadlock in the buffer. The detection mechanism is based on a timer: if a request being processed is not granted within a certain time period, a flag is raised and the request is discarded.

The RAB then frees some slots in the buffer and breaks the dependency by seeking another flit whose request can be granted. The RAB processes one request at a time in each input buffer, so, unlike with VCs, no arbitration is needed. Moreover, as the number of VCs increases, the complexity of the VC allocator also increases, and applying an efficient scheduling method becomes essential. With the RAB, in contrast, buffer depth does not degrade performance; a deeper buffer increases the probability of finding an unblocked flit, resulting in faster deadlock recovery.

FIG. 1 is a block diagram of the proposed 3D error-tolerant OASIS-NoC router architecture according to the present invention.
FIG. 2 is a block diagram of the random access buffer 9.
FIG. 3 is a schematic diagram explaining the prediction table 10.
FIG. 4 shows the counter values when the system starts.
FIG. 5 shows the number of input flits output after the first 100 cycles.
FIG. 6 shows the calculation of the average flit arrival.
FIG. 7 shows the determination of the best candidate buffer.
FIG. 8 shows the transmission of the latest best candidate buffer.
FIG. 9 shows the look-ahead fault-tolerant routing algorithm.
FIG. 10 shows the bypass-link-on-demand mechanism.
FIG. 11 is a timing chart of the operation of the 3D error-tolerant NoC (3D-FTNoC) according to the present invention.
FIG. 12 is a timing chart of the most commonly used conventional router.

Embodiments of the present invention will be described with reference to the accompanying drawings. The embodiments are provided for a better understanding of the invention, and the scope of protection of the invention is not limited to them.

The description mainly concerns a 3D-IC formed by stacking a plurality of small wafers. However, the present invention is also applicable to a 2D-IC having a plurality of cores formed on a single wafer and a network connecting those cores.

[3D Error-Tolerant Router NoC Architecture]
FIG. 1 shows a block diagram of the proposed 3D error-tolerant OASIS-NoC router architecture according to the present invention.

In FIG. 1, the NoC router has a plurality of input port units: local input port 1, north input port 2, east input port 3, south input port 4, west input port 5, up input port 6 and down input port 7, each of which receives input from the corresponding direction of the cores formed on the stacked wafers.

Each of the input port units 1-7 has the look-ahead fault-tolerant (LAFT) routing algorithm 8, previously proposed in Non-Patent Document 11, which handles link errors. In addition to the LAFT routing algorithm 8, the present invention uses two main components to further support error tolerance.

Specifically, each of the input port units 1-7 is provided with a random access buffer (RAB) mechanism 9 that handles errors in the input buffer 11, working together with a prediction table (PT) 10, which is an element of the traffic prediction unit located in the error control module 20. In addition, a bypass link on demand (BLOD) 12 is provided to handle errors in the crossbar 13.

In addition to the RAB controller 90, the random access buffer (RAB) mechanism 9 has a deadlock management function and an error management function.

The RAB controller 90 of the RAB mechanism 9 regularly monitors the corresponding input buffer 11 for error conditions. When it detects a faulty slot, it checks whether the remaining slots are occupied.

If the remaining slots are occupied, the RAB controller (cntrl) 90 sends a single-bit "Buffer-flg" (buffer flag) signal to the prediction table 10. The prediction table 10, which already holds information about the traffic load of each of the other input buffers, returns a "Best-buff" (best buffer) signal to the RAB controller 90. The RAB controller 90 then redirects incoming flits (the link-width units into which packets are divided) to the newly selected input buffer 11.

When the previously detected error in the input buffer 11 disappears, or when the input buffer 11 is almost empty and can accept other input flits, the "Buffer-flg" signal returns to 0, notifying the prediction table 10 that other input buffer resources are no longer needed. The previously faulty buffer then resumes receiving and storing new input flits.
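The Buffer-flg / Best-buff handshake described above can be modeled as a small per-port state machine. This is a minimal Python sketch assuming the controller re-evaluates once per cycle; `RabController`, `update` and the `almost_empty` threshold are hypothetical, since the text only says the buffer must be "almost empty".

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class RabController:
    """Per-input-port model of the Buffer-flg / Best-buff handshake."""
    port: str
    depth: int
    faulty: Set[int] = field(default_factory=set)    # slots flagged by the detector
    occupied: Set[int] = field(default_factory=set)  # slots currently holding flits
    buffer_flg: bool = False                          # single-bit flag sent to the PT
    target: Optional[str] = None                      # buffer receiving redirected flits

    def free_good_slots(self) -> int:
        return self.depth - len(self.faulty | self.occupied)

    def update(self, best_buff: str, almost_empty: int = 1) -> Optional[str]:
        """Re-evaluate the flag for this cycle.

        Returns the port whose buffer should store the next incoming flit,
        or None when this port's own buffer should be used.
        """
        if self.faulty and self.free_good_slots() == 0:
            # A faulty slot exists and no usable slot is free: raise
            # Buffer-flg and adopt the Best-buff answer from the PT.
            self.buffer_flg = True
            self.target = best_buff
        elif self.buffer_flg and (not self.faulty or len(self.occupied) <= almost_empty):
            # Error cleared or buffer almost empty: drop the flag and
            # resume storing flits locally.
            self.buffer_flg = False
            self.target = None
        return self.target
```

For example, a north controller with depth 4, faulty slot 1 and slots 0, 2 and 3 occupied returns `"east"` when the PT offers the east buffer, and returns `None` again once its occupancy falls to the threshold.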

[Random Access Buffer Controller (RAB buffer cntrl) 90]
FIG. 2 shows a block diagram of the proposed random access buffer mechanism 9. The random access buffer mechanism 9 was previously proposed in Non-Patent Document 12 as a basic, low-overhead solution for guaranteeing deadlock freedom.

In the present invention, an extended random access buffer controller 90 (hereinafter, the buffer controller) is provided so that the random access buffer mechanism (RAB mechanism) 9 can detect and recover from transient, intermittent and permanent errors in the input buffer 11.

The random access buffer mechanism 9 is divided into two main phases that run in parallel. The first is local processing, in which errors are detected and recovered from. The second is global (router-wide) processing, in which the RAB mechanism 9 is supported by the prediction table (PT) 10 to further improve system performance, as described later.

To detect errors, the error detection circuit 91 shown in FIG. 2 checks the error status of the buffer slots of the input buffer 11. When the error detection circuit 91 detects an error in one of the slots, it sends a "faulty-slot" signal to the RAB module 92 in the buffer controller (Buffer_Contrl) 90, so that the flagged slot is taken into account when the FIFO 93 in the buffer controller 90 assigns the write address "Wr_adr" and the read address "Read_adr".
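Skipping flagged slots when generating `Wr_adr` and `Read_adr` can be sketched as a circular FIFO whose pointers step over faulty slots. This is a hypothetical Python model, not the RTL of FIFO 93; it assumes a slot is flagged only while it holds no flit, so writes and reads skip exactly the same slots and flit order is preserved.

```python
from typing import List, Optional


class FaultAwareFifo:
    """Circular FIFO whose write/read pointers skip slots flagged faulty."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.data: List[Optional[object]] = [None] * depth
        self.faulty = [False] * depth
        self.wr_adr = 0
        self.rd_adr = 0
        self.count = 0

    def mark_faulty(self, slot: int) -> None:
        """Handle a faulty-slot signal (assumed to arrive for an empty slot)."""
        self.faulty[slot] = True

    def capacity(self) -> int:
        return self.faulty.count(False)

    def _next_good(self, adr: int) -> int:
        """First non-faulty address at or after `adr`, wrapping around."""
        for step in range(self.depth):
            cand = (adr + step) % self.depth
            if not self.faulty[cand]:
                return cand
        raise RuntimeError("all slots are faulty")

    def push(self, flit: object) -> bool:
        if self.count >= self.capacity():
            return False  # full: the RAB controller would now raise Buffer-flg
        self.wr_adr = self._next_good(self.wr_adr)
        self.data[self.wr_adr] = flit
        self.wr_adr = (self.wr_adr + 1) % self.depth
        self.count += 1
        return True

    def pop(self) -> Optional[object]:
        if self.count == 0:
            return None
        self.rd_adr = self._next_good(self.rd_adr)
        flit = self.data[self.rd_adr]
        self.data[self.rd_adr] = None
        self.rd_adr = (self.rd_adr + 1) % self.depth
        self.count -= 1
        return flit
```

With depth 4 and slot 1 flagged, three flits fit, they occupy slots 0, 2 and 3, and they are read back in the order they were written.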

To keep a record of faulty slots, the RAB module 92 includes a status register consisting of an array of n 2-bit items (where n is the buffer depth).

An item value of "00" in the status register indicates that the corresponding flit is not deadlocked and the buffer slot is error-free. A value of "01" indicates that the buffer slot is error-free but the request of the flit being handled is deadlocked. Finally, when an error is detected in the corresponding buffer slot, the status register element is updated to "11". The slot can then neither be used nor store incoming flits (this avoids the additional latency of probing broken slots).
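The 2-bit codes for the n buffer slots can be packed into a single integer, two bits per slot. The following Python sketch (all names hypothetical) implements the three codes given in the text; the code `10` is not defined in the text and is deliberately left unmapped, so decoding it raises `ValueError`.

```python
from enum import IntEnum


class SlotStatus(IntEnum):
    OK = 0b00          # flit not deadlocked, slot error-free
    DEADLOCKED = 0b01  # slot error-free, but its flit's request is deadlocked
    FAULTY = 0b11      # slot faulty: never written to or read from


def set_status(reg: int, slot: int, status: SlotStatus) -> int:
    """Return `reg` with the 2-bit field of `slot` replaced by `status`."""
    shift = 2 * slot
    return (reg & ~(0b11 << shift)) | (int(status) << shift)


def get_status(reg: int, slot: int) -> SlotStatus:
    """Decode the 2-bit field of `slot` from the packed register value."""
    return SlotStatus((reg >> (2 * slot)) & 0b11)


# Usage: slot 1 becomes faulty, the flit in slot 3 is deadlocked.
reg = set_status(0, 1, SlotStatus.FAULTY)
reg = set_status(reg, 3, SlotStatus.DEADLOCKED)
```

After these two updates `reg` equals `0b01_00_11_00`, with slot 0 in the least significant bits.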

[Prediction Table (PT) 10]
FIG. 3 shows a simple example illustrating the use of the prediction table 10 in the traffic prediction unit 100. A case that often arises is that a given input buffer 11 has several faulty slots and its remaining slots are occupied by other flits, while other input buffers of the same router are empty. To improve performance and make full use of the router resources, the present invention adopts a technique in which input buffer resources are shared among all input ports 1-7. This means that when an input buffer 11 can no longer accept flits (because it has faulty slots and its remaining slots are occupied), incoming flits are redirected to another adjacent empty input buffer so that they can be forwarded quickly to their destination.

To achieve this, it is essential to know which of the available empty input buffers is the best candidate to receive the redirected flits. In the present invention, as shown in FIG. 3, the prediction table (PT) 10 collects traffic-load information (signal 50) for each input port over a specific time period. This information is received from the monitoring probes (Prb0-Prb6) 14 attached to the input buffers of input ports 1-7 in the router. The collected traffic snapshots are used to calculate the average flit arrival at each input buffer, from which the prediction table 10 determines the best input port to handle the flits of the faulty input buffer without causing congestion.
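The best-candidate selection can be sketched as follows, assuming each probe reports how many flits arrived at its input buffer during a fixed window (100 cycles here, echoing the window in the figure list but otherwise an assumption) and that the "best" buffer is the one with the lowest average arrival rate. `PredictionTable` and its methods are hypothetical names, not the patented hardware.

```python
from typing import Dict, List, Optional

# Input ports 1-7 of FIG. 1, monitored by probes Prb0-Prb6.
PORTS = ["local", "north", "east", "south", "west", "up", "down"]


class PredictionTable:
    """Keeps per-port flit-arrival snapshots and picks the least-loaded buffer."""

    def __init__(self, window: int = 100) -> None:
        self.window = window  # cycles covered by one snapshot (assumed)
        self.history: Dict[str, List[int]] = {p: [] for p in PORTS}

    def record_snapshot(self, arrivals: Dict[str, int]) -> None:
        """Store the flit count each probe reported for the last window."""
        for port in PORTS:
            self.history[port].append(arrivals.get(port, 0))

    def average_arrival(self, port: str) -> float:
        """Average flits per cycle observed at `port` over all snapshots."""
        counts = self.history[port]
        if not counts:
            return 0.0
        return sum(counts) / (len(counts) * self.window)

    def best_buffer(self, exclude: Optional[str] = None) -> str:
        """Least-loaded input buffer other than the faulty one (ties: port order)."""
        candidates = [p for p in PORTS if p != exclude]
        return min(candidates, key=self.average_arrival)


pt = PredictionTable(window=100)
pt.record_snapshot({"local": 40, "north": 90, "east": 5, "south": 30,
                    "west": 20, "up": 10, "down": 15})
pt.record_snapshot({"local": 50, "north": 80, "east": 7, "south": 25,
                    "west": 30, "up": 12, "down": 9})
```

With these made-up counts, `pt.best_buffer(exclude="north")` picks `"east"` (0.06 flits/cycle), mirroring the north-to-east example in the text.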

Once the best input buffer has been selected (in this example, the buffer of east input port 3), the prediction table 10 waits for a flag 51 issued by the RAB controllers (RAB cntrl 0-6) 90. The buffer-fault flag shown in FIG. 2 indicates that a buffer (here, the input buffer 11 of north input port 2) is faulty and, at the same time, full. The prediction table 10 then returns a signal to the input buffer of north input port 2 that designates a new input buffer (for example, that of east input port 3). As indicated by the dashed arrow 52, this new buffer accepts the incoming flits until some slots are freed and the input buffer of north input port 2 can again receive incoming flits.
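The redirect decision described above can be illustrated with a small behavioural model. It is a minimal sketch under stated assumptions, not the patented circuit. An input buffer is modelled only by its depth, its set of faulty slot indices and its current flits. The names `InputBuffer` and `accept_flit` and the `best_candidate` argument (standing in for the prediction table's answer) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InputBuffer:
    """Hypothetical model of one input buffer with per-slot fault bits."""
    name: str
    depth: int
    faulty_slots: set = field(default_factory=set)
    flits: list = field(default_factory=list)

    def usable_slots(self) -> int:
        return self.depth - len(self.faulty_slots)

    def is_full(self) -> bool:
        return len(self.flits) >= self.usable_slots()

    def faulty_and_full(self) -> bool:
        # Condition under which the RAB controller raises its flag (flag 51).
        return bool(self.faulty_slots) and self.is_full()


def accept_flit(home: InputBuffer, buffers: Dict[str, InputBuffer],
                best_candidate: str, flit) -> Optional[str]:
    """Store a flit in its home buffer, or redirect it to the buffer named by
    the prediction table. Returns the buffer used, or None if the flit must
    wait (back-pressure)."""
    if not home.is_full():
        home.flits.append(flit)
        return home.name
    if not home.faulty_and_full():
        return None  # healthy buffer that is simply full: ordinary back-pressure
    target = buffers[best_candidate]
    if target.is_full():
        return None  # the candidate is full as well; the upstream router stalls
    target.flits.append(flit)
    return target.name
```

For example, consider a four-slot north buffer with two faulty slots that already holds two flits. It sends its next flit to `"East"`. As soon as one of its own flits leaves, it accepts flits again.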

Suppose that some flits of a given packet are stored in a buffer that is then declared faulty and full. Under the RAB mechanism 9, the remaining flits of this packet could be stored in a different buffer. Because the packet's flits would then be split across buffers, they might not arrive in order.

Out-of-order arrival would force one of two remedies: a reordering process at the destination node, or extra delay added to some flits so that they arrive together with the remaining ones. The present invention adopts a simpler solution: flit redirection is started only when the incoming flit is a header flit. The remaining flits of that packet are then stored in the same buffer as the header flit. With this additional restriction, the flits of a packet arrive in order without any additional hardware or power overhead and without performance degradation.
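The header-only redirection rule can be expressed as a small sketch. The function `assign_buffers` and the callback `choose_buffer` are hypothetical names. The callback stands in for "the home buffer, unless it is faulty and full, otherwise the prediction table's candidate". The sketch assumes that each flit is tagged as `head`, `body` or `tail`, and that a packet's flits arrive contiguously on one input port.

```python
from typing import Callable, Dict, List, Tuple

Flit = Tuple[str, str]  # (packet_id, kind) with kind in {"head", "body", "tail"}


def assign_buffers(flits: List[Flit],
                   choose_buffer: Callable[[], str]) -> List[Tuple[str, str, str]]:
    """Choose a buffer only when a header flit arrives. Body and tail flits
    follow their header, so each packet stays in a single FIFO buffer and is
    therefore delivered in order."""
    current: Dict[str, str] = {}  # packet_id -> buffer chosen at its header
    placement = []
    for pid, kind in flits:
        if kind == "head":
            current[pid] = choose_buffer()  # the only point where redirection may start
        buf = current[pid]
        placement.append((pid, kind, buf))
        if kind == "tail":
            del current[pid]  # packet finished; release the binding
    return placement
```

Because the decision is taken once per packet, a redirect that becomes necessary in the middle of a packet takes effect only from the next header onward.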

FIGS. 3A to 3E show the operation of the traffic prediction unit 100 in detail.

In FIGS. 3A-3E, the seven probes (Prb0-Prb6) correspond to the local (Local), north (Nth), east (Est), south (Sth), west (Wst), up (Up), and down (Dwn) probes, respectively. Each probe has a counter that, driven by the queue signal, is incremented each time a new flit arrives at its input buffer 11. The counter value is held in an 8-bit register, which is large enough for the number of flits arriving in 100 cycles.
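A cycle-level sketch of one probe shows why 8 bits suffice for a 100-cycle window. It assumes that at most one flit arrives per input port per cycle, which is an assumption of this sketch and is not stated in the text. Under that assumption the count never exceeds 100, below the 8-bit limit of 255. The class `Probe` and the constant names are hypothetical.

```python
from typing import Optional

WINDOW_CYCLES = 100  # sampling window given in the text
REG_MAX = 0xFF       # 8-bit register


class Probe:
    """Hypothetical cycle-level model of one monitoring probe (Prb0-Prb6)."""

    def __init__(self) -> None:
        self.count = 0
        self.cycle = 0

    def tick(self, flit_arrived: bool) -> Optional[int]:
        """Advance one clock cycle. At the end of each 100-cycle window,
        return the window's flit count and reset; otherwise return None."""
        if flit_arrived:
            # With at most one arrival per cycle the count stays <= 100,
            # so saturating at 255 is only a safeguard.
            self.count = min(self.count + 1, REG_MAX)
        self.cycle += 1
        if self.cycle == WINDOW_CYCLES:
            value, self.count, self.cycle = self.count, 0, 0
            return value
        return None
```

A flit arriving every other cycle makes the probe report 50 at the end of each window.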

The traffic prediction unit 100 includes a prediction table 10 with seven 8-bit entries. Each entry is assigned to one input port and stores that port's flit arrival rate.

The traffic prediction unit 100 operates according to the following steps.

1) Every 100 cycles (a window chosen to keep the register size small), each probe outputs its value and sends it to the prediction table 10.

2) The traffic prediction unit 100 calculates the average number of flits per 100 cycles (flits/100 cycles) from the value received from the probe and the value already stored in the prediction table 10.

3) The prediction table 10 is updated with the new average.

4) The traffic prediction unit 100 compares the averages of the seven input ports and selects the smallest one as the best candidate.

5) When the RAB raises a flag (the Buffer_flg signal) indicating that a full buffer is faulty, the traffic prediction unit 100 sends it the best candidate it has calculated, to be used as an escape channel.
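Steps 1 to 5 can be sketched as a small table object. The text does not specify the averaging formula. This sketch assumes a running mean of the stored value and the new sample, computed with integer division (a right shift in hardware). It also assumes that the flagged port itself is excluded when answering a Buffer_flg request. The class `PredictionTable` and its method names are hypothetical.

```python
from typing import Dict, Optional

PORTS = ["Local", "North", "East", "South", "West", "Up", "Down"]


class PredictionTable:
    """Hypothetical model of prediction table 10: one 8-bit entry per input port."""

    def __init__(self) -> None:
        self.avg: Dict[str, int] = {p: 0 for p in PORTS}  # initialized to zero
        self.has_history = False
        self.best: Optional[str] = None

    def update(self, samples: Dict[str, int]) -> None:
        """Steps 2-4: merge one 100-cycle probe snapshot and pick the minimum."""
        for p in PORTS:
            if self.has_history:
                # Assumed rule: mean of stored value and new sample.
                self.avg[p] = (self.avg[p] + samples[p]) // 2
            else:
                self.avg[p] = samples[p]  # first window: nothing to average yet
        self.has_history = True
        self.best = min(PORTS, key=lambda p: self.avg[p])

    def on_buffer_flag(self, flagged: str) -> str:
        """Step 5: answer a Buffer_flg with the least-loaded other port."""
        return min((p for p in PORTS if p != flagged), key=lambda p: self.avg[p])
```

Ties are broken by port order in `PORTS`, which is an arbitrary choice made for this sketch.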

The following example walks through these steps in the traffic prediction unit 100.

In FIG. 3A, when the system starts, the probe counters and the entries of the prediction table 10 are initialized to zero.

In FIG. 3B, after the first 100 cycles, each probe outputs its count of incoming flits and sends it to the traffic prediction unit 100. Because no average can be calculated yet, these values are stored directly in the prediction table 10. At the same time, the probe counters are reset to zero. West, whose flit average of 10 is the minimum, is then selected as the best candidate buffer.

In FIG. 3C, at the end of every subsequent 100 cycles, each probe outputs its value and sends it to the traffic prediction unit 100. For each input port, the traffic prediction unit 100 reads the incoming value and the corresponding entry in the prediction table 10, and calculates the average flit arrival.

Then, in FIG. 3D, the probe counters are reset to zero again, and the entries of the prediction table 10 are updated with the calculated averages. The minimum value in the prediction table 10 is found, and this determines the best candidate buffer (here Local, whose average of 35 is the smallest).

In FIG. 3E, the RAB module 9 sends the buffer flag (Buffer_flg) signal to notify that buffer 11 has no free slots and is faulty (see FIG. 1). On receiving this signal, the traffic prediction unit 100 sends the latest best candidate buffer to the flagged buffer, to be used as the alternative input buffer for incoming flits.

[Look-Ahead Fault-Tolerant Router (“LAFT”)]
To retain the advantages of look-ahead routing, the previously proposed Look-Ahead Fault-Tolerant (LAFT) routing [Non-Patent Document 11] is used. LAFT makes the routing decision for the next node while taking link status into account, and selects the best minimal path, as shown in Algorithm 1 of FIG. 4.

Each input port at which LAFT is executed reads the fault information received from the fault control module (FCM) 20 (the faulty links in FIG. 1).

The first phase of the algorithm calculates the next-node address. A node that needs to send a flit to a given final destination has up to three possible minimal directions, one along each of the X, Y, and Z dimensions.

In the second phase, LAFT calculates these three directions by simultaneously comparing the X, Y, and Z coordinates of the current node and the destination node. In parallel with this calculation, the fault control module 20 reads the next-port identifier from the flit and sends the corresponding fault information to the relevant input port. By the end of the second phase, LAFT therefore knows both the fault status of the next node and the possible directions for a minimal route.
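The first two LAFT phases reduce to simple coordinate arithmetic. The sketch below assumes a 3D mesh with East/West along ±x, North/South along ±y and Up/Down along ±z; this axis convention is not given in the text. It also assumes that the flit carries the next-port name computed by the upstream router. The names `next_node`, `minimal_directions` and `OFFSETS` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# Assumed axis convention: East/West = +x/-x, North/South = +y/-y, Up/Down = +z/-z.
OFFSETS = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}
AXES = [("East", "West"), ("North", "South"), ("Up", "Down")]


def next_node(cur: Coord, next_port: str) -> Coord:
    """Phase 1: address of the neighbour reached through the flit's next port."""
    dx, dy, dz = OFFSETS[next_port]
    return (cur[0] + dx, cur[1] + dy, cur[2] + dz)


def minimal_directions(node: Coord, dest: Coord) -> List[str]:
    """Phase 2: compare X, Y, Z coordinates; at most one productive direction per axis."""
    dirs = []
    for axis, (pos, neg) in enumerate(AXES):
        if dest[axis] > node[axis]:
            dirs.append(pos)
        elif dest[axis] < node[axis]:
            dirs.append(neg)
    return dirs  # an empty list means the flit has reached its destination node
```

For example, a flit at (1, 1, 0) whose next port is East reaches (2, 1, 0). From there, toward destination (3, 0, 1), the minimal directions are East, South and Up.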

In the next phase, the route is selected. For this decision, the present invention applies the following set of prioritized conditions a, b, and c. Together they guarantee fault tolerance and high performance whether or not faults are present.

a. Directions that guarantee a minimal path are given the highest priority in route selection.

b. Next, the direction that offers the greatest expected path diversity is selected.

c. Congestion status is given the lowest priority.

Based on these priorities, LAFT reads the fault status of the next node received from the fault control module 20 and checks how many non-faulty minimal directions are available. When all three possible directions are minimal and offer the same diversity, the route is selected according to the congestion of each output port.
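The prioritized selection (a: minimal and non-faulty, b: path diversity, c: congestion) can be sketched as follows. The text does not define "diversity" precisely. As an illustrative assumption, this sketch measures it as the number of minimal directions still available one hop later. It returns `None` when no minimal non-faulty direction exists, because non-minimal fallback is outside its scope. The axis convention and all names (`select_route`, `step`, `minimal_directions`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[int, int, int]
OFFSETS = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}
AXES = [("East", "West"), ("North", "South"), ("Up", "Down")]


def step(node: Coord, d: str) -> Coord:
    dx, dy, dz = OFFSETS[d]
    return (node[0] + dx, node[1] + dy, node[2] + dz)


def minimal_directions(node: Coord, dest: Coord) -> List[str]:
    dirs = []
    for axis, (pos, neg) in enumerate(AXES):
        if dest[axis] > node[axis]:
            dirs.append(pos)
        elif dest[axis] < node[axis]:
            dirs.append(neg)
    return dirs


def select_route(node: Coord, dest: Coord, faulty: Set[str],
                 congestion: Dict[str, int]) -> Optional[str]:
    """Apply priorities a-c at the next node `node`."""
    # (a) keep only minimal directions whose output link is not faulty
    candidates = [d for d in minimal_directions(node, dest) if d not in faulty]
    if not candidates:
        return None  # non-minimal fallback is not modelled here

    # (b) assumed diversity metric: minimal directions left after the hop
    def diversity(d: str) -> int:
        return len(minimal_directions(step(node, d), dest))

    best = max(diversity(d) for d in candidates)
    finalists = [d for d in candidates if diversity(d) == best]
    # (c) congestion only breaks the remaining ties
    return min(finalists, key=lambda d: congestion.get(d, 0))
```

From (0, 0, 0) to (2, 1, 0), East is preferred because it keeps two productive directions open. If East is faulty, North is chosen instead.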

[On-Demand Bypass Links]
FIG. 5 shows the on-demand bypass-link mechanism, which provides additional escape channels as the number of faults in the baseline 7×7 crossbar 13 increases.

FIG. 5 shows an example configuration with two bypass links 131 and 132. The crossbar sub-controller (cntrl unit) 17 in FIG. 5 checks the status of the crossbar links. When a fault is detected in one or more links, it sends a flag (Faulty_Cross) to the fault control module 20, which then disables the faulty crossbar links and enables an appropriate number of bypass links.

The simplest approach is to provide a dedicated bypass link for every crossbar link. This guarantees both fault tolerance and performance, because input-port requests never have to share a bypass link, even if every link of the baseline 7×7 crossbar 13 is faulty. However, this amounts to duplicating the entire crossbar and therefore incurs additional area and power consumption. Moreover, when the fault rate is low, two or three bypass links are enough to serve the requests destined for faulty crossbar links.

Accordingly, the present invention takes an incremental approach based on an analysis of the benchmarks in use and the expected fault rates. The number of bypass links is increased step by step until performance becomes constant or almost unchanged.
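The incremental sizing procedure can be written as a simple stopping rule. `measure(n)` is a hypothetical callback, for instance a simulation of the chosen benchmarks at the expected fault rate that returns throughput with `n` bypass links. The 1% relative threshold used for "almost unchanged" is an assumption of this sketch.

```python
from typing import Callable


def size_bypass_links(measure: Callable[[int], float], max_links: int,
                      rel_tol: float = 0.01) -> int:
    """Add one bypass link at a time and stop once the performance gain from
    the next link falls to rel_tol (relative) or below."""
    n = 0
    perf = measure(0)
    while n < max_links:
        nxt = measure(n + 1)
        if nxt - perf <= rel_tol * perf:
            break  # performance has (almost) stopped changing
        n, perf = n + 1, nxt
    return n
```

With made-up throughputs of 0.50, 0.70, 0.78 and 0.785 for 0 to 3 links, the loop stops at two links. The third link adds less than 1%.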

The number of bypass links is critical and should be kept as small as possible to reduce area and power consumption. For this reason, unused crossbar links that already exist in the system are reused.

These links are mainly located at the edges of the network, where a router has no neighboring node in some directions, so the corresponding crossbar links are never used. This optimization yields significant area and power savings while maintaining maximum performance.
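Which crossbar links are idle follows directly from a router's position in the mesh. The sketch below lists the output ports that have no neighbor for a router in an X×Y×Z mesh; crossbar links driving these outputs are candidates for reuse as bypass links. The axis convention (East = +x, North = +y, Up = +z) and the function name `unused_ports` are assumptions.

```python
from typing import List, Tuple


def unused_ports(node: Tuple[int, int, int], dims: Tuple[int, int, int]) -> List[str]:
    """Output ports without a neighbouring router for `node` in an X*Y*Z mesh.
    Assumed axis convention: East=+x, North=+y, Up=+z."""
    (x, y, z), (nx, ny, nz) = node, dims
    unused = []
    if x == nx - 1:
        unused.append("East")
    if x == 0:
        unused.append("West")
    if y == ny - 1:
        unused.append("North")
    if y == 0:
        unused.append("South")
    if z == nz - 1:
        unused.append("Up")
    if z == 0:
        unused.append("Down")
    return unused
```

A corner router of a 4×4×4 mesh has three idle outputs, while an interior router has none. This is why the savings come mainly from routers at the network edge.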

[Fault Control Module (FCM) 20]
The fault control module (FCM) 20 is one of the main elements of the present invention, because it manages recovery from all types of faults in the three main components: the inter-router links, the input buffers, and the crossbar. For the inter-router links, the link status of each router and the fault status of its neighboring routers are stored in a small array named “link status”.

This information is continuously sent to the LAFT routing module 8 and is used when selecting the “next port” for the next node. For the input ports, the prediction table 10 handles the allocation of alternative buffer resources when needed, as described above. However, when all buffer resources are faulty and can no longer accept flits, the fault control module 20 receives a “buffer fault” signal (shown in FIG. 1) from the RAB controller 90.

On receiving this “buffer fault” signal, the fault control module 20 disables the entire input port to reduce dynamic power. At the same time, it flags the link connected to the faulty buffer and updates the “link status array”. This information is continuously sent to the fault control modules 20 of all neighboring nodes.
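The buffer-fault handling of the FCM can be modelled in a few lines. This is a behavioural sketch only. It assumes that neighbors exchange a copy of the whole link-status array whenever it changes, and it represents "disabling" a port by removing it from a set of enabled ports. The class `FaultControlModule` and its method names are hypothetical.

```python
from typing import Dict, Optional


class FaultControlModule:
    """Hypothetical sketch of the FCM's input-buffer fault handling."""

    def __init__(self, node_id: str,
                 neighbors: Dict[str, Optional["FaultControlModule"]]) -> None:
        self.node_id = node_id
        self.neighbors = neighbors                             # port -> neighbouring FCM (None at a mesh edge)
        self.link_status = {p: True for p in neighbors}        # "link status" array, True = healthy
        self.neighbor_status: Dict[str, Dict[str, bool]] = {}  # arrays reported by neighbours
        self.enabled_ports = set(neighbors)

    def on_buffer_fault(self, port: str) -> None:
        """Handle the "buffer fault" signal from the RAB controller of `port`."""
        self.enabled_ports.discard(port)  # disable the whole input port (saves dynamic power)
        self.link_status[port] = False    # flag the link feeding the faulty buffer
        for nb in self.neighbors.values():
            if nb is not None:
                nb.receive_status(self.node_id, dict(self.link_status))

    def receive_status(self, sender: str, status: Dict[str, bool]) -> None:
        self.neighbor_status[sender] = status
```

After `a.on_buffer_fault("East")`, router A stops using its east input port, and its neighbor B records that A's east link is faulty.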

Finally, to handle faults in the crossbar 13, the fault control module 20 has three main tasks, described in detail below.

1) Update the “Crss_link” status register 22 according to the information received from the control (cntrl) unit 17 (shown in FIG. 1).

2) Enable or disable the appropriate number of “bypass links” 131, 132 according to the number of detected faults.

3) Notify “Sw_req_cntrl” 15 of faulty crossbar links, so that it can modify requests before they are sent to the switch allocator 16.

As shown in FIG. 1, the control unit 17 sits between the baseline 7×7 crossbar 13 and the additional “bypass links” 12. Its first role is to detect faults in the crossbar 13 and keep the fault control module 20 informed of their status. The fault control module 20 monitors this incoming fault information and stores it in register 21, named the “cross-link” status register (FIG. 1).

When a fault is detected, the fault control module 20 sends three signals simultaneously. Two go to the control unit 17: one enables a bypass link and the other disables the faulty crossbar link. The third goes to the “switch request control Sw_req_cntrl” unit 15. It prevents flits from requesting the faulty crossbar link and has them request permission to use one of the “Bypass_links” instead.

As the number of faults increases, the fault control module 20 distributes the different requests fairly among the available bypass links. The control unit 17, in turn, enables and disables bypass links according to the signals from the fault control module 20, using clock gating to reduce power. While the crossbar 13 is healthy, the bypass links are kept asleep to reduce dynamic power. When a fault is detected, the control unit 17 wakes up the appropriate number of bypass links according to the information from the fault control module 20. The control unit 17 consists of multiplexers and demultiplexers. The demultiplexers redirect incoming flits to the activated “bypass links” in place of the faulty crossbar links. The multiplexers switch between the baseline crossbar 13 and the “bypass links” to pass the flits on to the next neighboring node.
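Fair sharing of a small number of bypass links can be illustrated with a rotating-priority (round-robin) arbiter. The text does not name the fairness scheme, so round-robin is an illustrative choice here. Output-port contention, which the switch allocator resolves, is deliberately not modelled. `BypassArbiter`, `mark_faulty`, `active_bypass` and `route` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple, Union

Link = Tuple[int, int]  # (input port, output port) of the 7x7 crossbar


class BypassArbiter:
    """Hypothetical sketch: round-robin sharing of a few bypass links among
    requests whose crossbar link is faulty."""

    def __init__(self, num_bypass: int, num_inputs: int = 7) -> None:
        self.num_bypass = num_bypass
        self.num_inputs = num_inputs
        self.faulty: Set[Link] = set()
        self.rr = 0  # rotating priority pointer over input ports

    def mark_faulty(self, link: Link) -> None:
        self.faulty.add(link)

    def active_bypass(self) -> int:
        # Bypass links woken from clock-gated sleep; the rest stay gated.
        return min(len(self.faulty), self.num_bypass)

    def route(self, requests: List[Link]) -> Dict[int, Union[str, int]]:
        """Map each requesting input port to "xbar" or to a bypass-link index.
        Ports missing from the result must retry in the next cycle."""
        grants: Dict[int, Union[str, int]] = {}
        redirected = []
        for req in requests:
            if req in self.faulty:
                redirected.append(req)  # demultiplexed toward a bypass link
            else:
                grants[req[0]] = "xbar"
        redirected.sort(key=lambda r: (r[0] - self.rr) % self.num_inputs)
        for idx, (inp, _out) in enumerate(redirected[: self.active_bypass()]):
            grants[inp] = idx
        self.rr = (self.rr + 1) % self.num_inputs  # rotate priority for fairness
        return grants
```

Suppose three faulty links compete for two bypass links. The input port that loses in one cycle moves up in priority over the following cycles, so no input port is starved.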

FIG. 6 illustrates the operation of the 3D fault-tolerant NoC (3D-FTNoC) according to the present invention. As shown in the figure, the 3D-FTNoC has three main pipeline stages, I, II, and III. The first pipeline stage I is the buffer-writing stage, in which three main elements work in parallel as follows.

1) The RAB controller 90 of the RAB mechanism 9 monitors deadlock and the fault status of the input buffer 11, as described above.

2) The prediction table 10 receives traffic information from the probes (Prb0-Prb6) 14 attached to the input buffers 11 of the input ports 1-7.

3) The switch request controller (Sw-reqcntrl) 15 reads the “Next-port” identifier from the incoming flit and issues a “switch request: sw-req” signal to the switch allocator 16.

The second pipeline stage II is the routing computation / switch allocation stage, in which two main processes run simultaneously. The first is the computation of the “next port” for the next node. The LAFT routing module 8 performs it according to the signal received from the “link status” register 21 in the fault control module (FCM) 20. The second is scheduling among the requests from the different input ports. The switch allocator 16 manages it, taking the various “switch request” signals as inputs. As outputs, the switch allocator 16 issues a “switch grant” signal that permits use of the requested output port. It also sends a “cross control Crss_contrl” signal carrying the scheduling result to the crossbar 13.

The parallel execution of stages I and II is made possible by the look-ahead routing on which the system of the present invention relies. With this kind of routing, the dependency between stages I and II, which conventional systems usually execute in series, can be removed.

Finally, in the third pipeline stage III, the crossbar traversal stage, flits are read from the input buffers 11 of the input ports 1-7 and sent to the appropriate neighboring nodes. As FIG. 6 shows, the fault control module 20 performs different tasks across all three pipeline stages I, II, and III. This is because its main function is to control and establish communication among the different elements of the router.

To clarify the novelty of the architecture of the present invention, FIG. 7 shows, for comparison, a timing chart of a conventional router of the kind used in the most popular systems.

A first observation about this conventional system is that it lacks fault-tolerance support, since it has no element for either fault detection or recovery. Furthermore, as FIG. 7 shows, it has five pipeline stages I-V instead of the three stages of the present invention shown in FIG. 6.

The first stage is a buffer-writing stage I whose function is basically the same as in the system of the present invention. However, instead of a single buffer 11, it uses virtual channels (VCs) to satisfy the deadlock-avoidance requirement. The number of VCs depends on the system architecture and the routing algorithm, and generally ranges from two to four.

The next stage II is routing computation. Most systems employ static XYZ routing or a routing algorithm based on it. Because there is no look-ahead routing, the next two stages cannot operate until they receive the result from the XYZ routing module.

The third stage III is virtual-channel allocation. Using VCs requires arbitration, which adds an extra clock cycle, and consequently all pipeline stages become involved in arbitration among the different VCs. Compared with FIG. 7, the system of the present invention eliminates this stage thanks to the RAB module used for deadlock avoidance.

Once a VC has been granted, the flit moves to the switch allocation stage IV, which is the same as in the present invention shown in FIG. 6. The only difference is that this stage runs separately from the routing computation stage II, rather than in parallel with it as in the system of the present invention. Finally, in the last stage V (crossbar traversal), the crossbar 13 forwards the flit to the neighboring node according to the arbitration result received from the switch allocator 16.
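The effect of the shorter pipeline on latency can be estimated with a back-of-the-envelope formula. The sketch assumes no contention, one cycle per pipeline stage, and a one-cycle link between routers. These figures are assumptions of the sketch, not values from the text. Under that model each hop costs `router_stages + link_cycles` cycles, and the tail flit follows the header by `packet_flits - 1` cycles.

```python
def zero_load_latency(hops: int, router_stages: int, link_cycles: int = 1,
                      packet_flits: int = 1) -> int:
    """Contention-free latency in cycles for a packet crossing `hops` routers,
    assuming one cycle per pipeline stage and `link_cycles` per link."""
    return hops * (router_stages + link_cycles) + (packet_flits - 1)


# Three-stage pipeline (FIG. 6) versus five-stage conventional pipeline (FIG. 7)
for stages in (3, 5):
    print(stages, zero_load_latency(hops=6, router_stages=stages, packet_flits=4))
```

For a 4-flit packet over 6 hops, the loop prints `3 27` and then `5 39`. Under this simple model, the two stages saved per hop account for the entire difference.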

Claims

A fault-tolerant router provided for each of a plurality of cores included in an IC, comprising:
a plurality of input port sections (1-7), each configured to receive, in an input buffer (11), input flits arriving from the direction of the corresponding core;
a crossbar (13) that connects the outputs of the plurality of input port sections (1-7) to predetermined output destinations; and
a fault control module (20) having a prediction table (10) that stores traffic information for the input buffers (11);
wherein each of the plurality of input port sections (1-7) includes:
the input buffer (11);
a fault detection circuit (91) that detects a slot fault in the input buffer (11); and
a random access buffer controller (90);
when the fault detection circuit (91) detects a slot fault in the input buffer (11),
the random access buffer controller (90)
checks whether the other remaining slots are occupied and, if they are, sends a flag to the fault control module (20); and
the fault control module (20) selects the best input buffer (11) among the input ports (1-7) based on the traffic information stored in the prediction table (10), notifies the random access buffer controller (90), and performs control to switch to the selected input buffer (11) so that subsequent input flits are stored there.
A fault-tolerant router characterized by the above.

According to claim 1,
the prediction table (10) collects traffic load information from each of the plurality of input port sections (1-7) at a predetermined time period, calculates the average flit arrival, and determines the input port best suited to accept the flits of the faulty input buffer.
A fault-tolerant router characterized by the above.

According to claim 1, further comprising:
a plurality of bypass links (131, 132) for the crossbar (13); and
a crossbar control unit (17) that checks the link status of the crossbar (13);
wherein, when the crossbar control unit (17) detects a fault in one or more links, it sends a flag to the fault control module (20), and
the fault control module (20) disables the faulty links of the crossbar and increases the number of enabled bypass links among the plurality of bypass links (131, 132), which reuse otherwise unused crossbar links, until performance becomes constant or remains almost unchanged.
A fault-tolerant router characterized by the above.

According to any one of claims 1 to 3,
the fault-tolerant router, wherein the IC is a 3D-IC composed of a plurality of stacked wafers, each having a plurality of cores.

A three-dimensional IC having a plurality of cores and a plurality of fault-tolerant routers corresponding to the plurality of cores, wherein
each of the plurality of fault-tolerant routers comprises:
a plurality of input port sections (1-7), each configured to receive, in an input buffer (11), input flits arriving from the direction of the corresponding core;
a crossbar (13) that connects the outputs of the plurality of input port sections (1-7) to predetermined output destinations; and
a fault control module (20) having a prediction table (10) that stores traffic information for the input buffers (11);
each of the plurality of input port sections (1-7) includes:
the input buffer (11);
a fault detection circuit (91) that detects a slot fault in the input buffer (11); and
a random access buffer controller (90);
when the fault detection circuit (91) detects a slot fault in the input buffer (11),
the random access buffer controller (90)
checks whether the other remaining slots are occupied and, if they are, sends a flag to the fault control module (20); and
the fault control module (20) selects the best input buffer (11) among the input ports (1-7) based on the traffic information stored in the prediction table (10), notifies the random access buffer controller (90), and performs control to switch to the selected input buffer (11) so that subsequent input flits are stored there.
A three-dimensional IC characterized by the above.

According to claim 5,
the prediction table (10) collects traffic load information from each of the plurality of input port sections (1-7) at a predetermined time period, calculates the average flit arrival, and determines the input port best suited to handle the flits of the faulty input buffer.
A three-dimensional IC characterized by the above.

According to claim 5, further comprising:
a plurality of bypass links (131, 132) for the crossbar (13); and
a crossbar control unit (17) that checks the link status of the crossbar (13);
wherein, when the crossbar control unit (17) detects a fault in one or more links, it sends a flag to the fault control module (20), and
the fault control module (20) disables the faulty links of the crossbar and enables an appropriate number of the plurality of bypass links (131, 132).
A three-dimensional IC characterized by the above.

According to any one of claims 5 to 7,
the three-dimensional IC, wherein the IC is a 3D-IC composed of a plurality of wafers, each having a plurality of cores.

A method for controlling a fault-tolerant router provided for each of a plurality of cores of an IC, comprising:
as a first processing stage,
detecting, by the random access buffer controller (90), deadlock and the fault status of the input buffer (11) of each input port section (1-7);
receiving traffic information from the probes attached to the input buffers (11) of the input port sections (1-7);
obtaining next-port identification information from the input flit; and
sending a switch request signal to the switch allocator (16);
as a second processing stage,
calculating, by the routing module, the next port for the next node based on the signal received from the fault control module (20);
scheduling, by the switch allocator (16), among requests from different ports; and
issuing, from the switch allocator (16), a grant signal permitting use of the requested output port and a control signal carrying the scheduling result to the crossbar (13);
and, as a third processing stage,
reading the flit from the input buffer (11) and sending it to the appropriate neighboring node; the method for controlling a fault-tolerant router being characterized by the above.

According to claim 9,
the IC is a 3D-IC that includes a plurality of cores and is formed of stacked wafers.
A method for controlling a fault-tolerant router characterized by the above.