JP2009516478A

JP2009516478A - Method and system for reducing interconnect latency

Info

Publication number: JP2009516478A
Application number: JP2008541283A
Authority: JP
Inventors: Abel, François; Benner, Alan F.; Grzybowski, Richard R.; Hemenway, Brewster R., Jr.; Iliadis, Ilias; Krishnamurthy, Rajaram; Luijten, Ronald P.; Minkenberg, Cyriel
Original assignee: Corning Inc
Current assignee: Corning Inc
Priority date: 2005-11-14
Filing date: 2006-11-14
Publication date: 2009-04-16
Anticipated expiration: 2026-11-14
Also published as: CN101341698B; CN101341698A; US20070110087A1; JP4796149B2; EP1949622B1; KR20080077189A; US8135024B2; DE602006013180D1; WO2007120209A2; EP1949622A2; WO2007120209A3

Abstract

Methods and systems for reducing arbitration latency use speculative transmission (STX), without prior arbitration, in combination with planned arbitration of the routing structure. Packets are sent from a source location to the routing structure via planned arbitration and also via speculative arbitration, and outputs of the routing structure that have not been reserved in advance are assigned to the speculatively transmitted packets so that no collisions occur.

Description

Priority claim

This application claims priority to U.S. Provisional Patent Application No. 60/736,779, entitled "METHOD AND SYSTEM TO REDUCE INTERCONNECT LATENCY," filed on November 14, 2005.

The present invention relates to the field of packet switching, and more particularly to packet-switching architectures with input queuing that are especially applicable to computer interconnection networks.

This invention was made with U.S. Government support under Contract No. W-7405-ENG48 awarded by DOE/NNSA. The U.S. Government has certain rights in this invention.

Advances in transmission technology and in the parallel processing of communications and computation continue to raise the limits on the bandwidth available for transferring information. For example, wavelength-division multiplexing (WDM) and dense WDM (DWDM) greatly increase the available bandwidth by multiplexing many channels onto a single fiber. Individual channels operate at optical carrier (OC-x) rates such as OC-48 (2.5 Gb/s), OC-192 (10 Gb/s), or OC-768 (40 Gb/s). With the latest DWDM technology, a single fiber can carry more than 5 terabits per second.

At the same time, the gap is widening between the ever-higher speeds these advances provide and the speeds at which available switches can switch. Optical switches offer theoretical advantages such as routing through free space, minimal signal attenuation over long distances, and the elimination of optical-to-electrical and electrical-to-optical conversions, but current all-optical switches are either relatively slow or very expensive. Furthermore, optical storage of information is very cumbersome and often impractical. Until these shortcomings of optical switching are overcome, electrical switches will continue to play a major role in packet switching.

Backplane switches, or more generally routing structures, are commonly used to interconnect boards. In networking systems these boards are called line cards; in computing and storage they are often called adapters or blades. A wide range of systems use a backplane to connect boards, including telecommunications switches, multiservice provisioning platforms, add/drop multiplexers, digital cross-connects, storage switches, routers, large enterprise-scale switches, embedded platforms, multiprocessor systems, and blade servers.

When information is transmitted from a source to a destination through an interconnection system, it is often first divided into a number of data packets. Typically, each data packet includes a header, a payload, and a trailer, and may be further divided into smaller units. Data packets are switched through the routing structure concurrently with other data packets originating from other sources. Many current packet-switching systems, including those used to interconnect parallel computers, Internet routers, S(t)AN networks, asynchronous transfer mode (ATM) networks, and especially optical networks, use an input-queued arrangement comprising: queues at each line card sorted by output (an arrangement often called virtual output queuing (VOQ), which eliminates the head-of-line blocking inherent in FIFO queues); a crossbar routing structure; and a centralized scheduler (e.g., an arbiter or arbitration unit) that allocates switching resources and arbitrates among the queues.
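To make the VOQ arrangement concrete, the following minimal Python sketch models the queue set at one input, assuming packets are opaque objects and outputs are numbered 0 to N-1. The class and method names are hypothetical and are not taken from the patent.

```python
from collections import deque


class VirtualOutputQueues:
    """One FIFO per output of an N x N routing structure, held at one input.

    Hypothetical illustration; the names are not from the patent.
    """

    def __init__(self, n_outputs: int) -> None:
        self.queues = [deque() for _ in range(n_outputs)]

    def enqueue(self, packet, output: int) -> None:
        # Sort the arriving packet by its destination output.
        self.queues[output].append(packet)

    def heads(self) -> dict:
        # Every non-empty queue exposes its own head-of-line packet, so a
        # packet waiting for a busy output never blocks packets bound elsewhere.
        return {out: q[0] for out, q in enumerate(self.queues) if q}

    def dequeue(self, output: int):
        # Release one packet toward the output it was granted.
        return self.queues[output].popleft()
```

With a single FIFO, a packet for a busy output would hold up every packet behind it; here `heads()` can offer one candidate per output to the arbiter in every time slot.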

FIG. 1 shows a conventional switching arrangement using a VOQ architecture. In the arrangement of FIG. 1, data packets (e.g., cells, frames, or datagrams) are received from each of the N data links 2a_1 to 2a_N of the individual line cards 102. Each data packet is sorted, via multiplexer 105a, into one of the N buffers of one of the N buffer groups 121 according to its destination output 3_1 to 3_N of routing structure 106 (shown as an N×N crossbar). That is, each input line card 102 maintains a separate queue for each output 3_1 to 3_N, resulting in N^2 VOQs on the input side of routing structure 106. An arbiter 107 is provided to manage contention among data packets for the same output of routing structure 106 and to match inputs to outputs. Arbiter 107 communicates with each line card along control paths 108 and 109 and provides a switching configuration to routing structure 106 along control path 112. Arbiter 107 is typically located physically near routing structure 106.

Arbiter 107 performs processing that includes assigning input and output ports to the packets waiting in the buffers of buffer groups 121 so that no conflicts occur. This processing comprises allocation and arbitration. Allocation determines a matching between inputs 2b_1 to 2b_N and outputs 3_1 to 3_N of routing structure 106 such that at most one packet is selected from each buffer group 121 and at most one packet is sent to each output resource. Arbitration resolves multiple requests for each individual output resource 3_1 to 3_N by assigning that output to one of a group of requesters. In the conventional arrangement of FIG. 1, arbiter 107 receives switch access requests from line cards 102 on control path 108. Arbiter 107 computes a matching based on the received requests and a suitable matching algorithm and, for each of a number of time slots, determines which of the inputs 2b_1 to 2b_N is permitted to forward a data packet to which output. Each line card 102 that wins access (i.e., is granted access) is sent, along control path 109, a control message informing it that it is permitted to transmit a unit of data, such as a packet, to the specified output in a particular time slot or switching cycle. During that time slot, arbiter 107 sends the computed switching configuration to routing structure 106, and each winning line card 102 releases one data unit from the appropriate queue in its buffer group 121 via demultiplexer 105b and transmits it to routing structure 106 along the corresponding input 2b_1 to 2b_N. Each data packet is then transmitted through routing structure 106, along the path configured by arbiter 107, to the requested one of outputs 3_1 to 3_N.
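The allocation constraint described above (at most one output per input and at most one input per output) can be illustrated with a simple greedy maximal matching. This is a swapped-in example for illustration only, not the matching algorithm of arbiter 107; practical arbiters typically use iterative schemes such as iSLIP. The request format (a map from each input to the set of outputs whose VOQs are non-empty) is an assumption.

```python
from typing import Dict, Set


def greedy_maximal_match(requests: Dict[int, Set[int]]) -> Dict[int, int]:
    """Return {input: output} with no input or output used twice.

    requests maps each input to the outputs it has packets for.
    Inputs and outputs are scanned in ascending order (an assumed,
    unfair priority; real arbiters rotate priorities to avoid starvation).
    """
    match: Dict[int, int] = {}
    taken_outputs: Set[int] = set()
    for inp in sorted(requests):
        for out in sorted(requests[inp]):
            if out not in taken_outputs:
                match[inp] = out          # grant this input one output
                taken_outputs.add(out)    # the output is now reserved
                break
    return match
```

For requests `{0: {1, 2}, 1: {1}, 2: {0, 1}}`, input 0 gets output 1, input 1 loses because output 1 is already taken, and input 2 gets output 0.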

As the figure shows, such an input-queued system has two basic communication paths: a control path, which carries control information from the line cards to the arbiter (e.g., requests) and back to the line cards (e.g., grants); and a data path, which carries data packets from the input line cards through the crossbar to the output line cards.

Although the conventional packet-switching arrangement shown in FIG. 1 depicts routing structure 106 with only unidirectional communication paths, the general concept also covers bidirectional data and control paths. For example, each of the data links 2b_1 to 2b_N and the individual outputs 3_1 to 3_N shown in FIG. 1 may be represented as a bidirectional link, in which case the line card associated with each buffer group 121 includes both ingress and egress buffers. Line card 102 can then be regarded as both a source location and a destination location for data packets transmitted through the routing structure. Similarly, the request and grant paths, as well as links 2a_1 to 2a_N, may also be represented as bidirectional links.

As capacity has increased, so has the physical size of packet switches. At the same time, while line rates have risen, packet sizes have remained roughly constant, so the duration of a single packet or cell, T = L/B (where L is the packet length in bits and B is the link rate in bits per second), has become shorter. Together these trends imply that the round trip (RT) within the switch, measured in packet times, grows considerably. In an input-queued switch with centralized arbitration this effect is doubled, because the minimum transit latency consists of two components: (1) the latency of submitting a request to the arbiter and waiting for the corresponding grant to arrive (including the time of flight to and from the arbiter and the time needed for arbitration); and (2) the latency of serialization/deserialization (SerDes), transmission, and time of flight for sending the packet through the switch. Roughly speaking, these latencies total at least 2·RT packet times, twice that of a comparable switch (excluding those having a buffered routing structure).
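The arithmetic behind T = L/B and the 2·RT minimum can be checked with a short Python sketch. The 64-byte cell size, 40 Gb/s link rate, and 500 ns round trip are assumed example values, not figures from the patent.

```python
def packet_time_ns(packet_bits: int, link_bps: float) -> float:
    """Packet (cell) duration T = L / B, in nanoseconds."""
    return packet_bits / link_bps * 1e9


def rt_in_packet_times(rt_ns: float, packet_bits: int, link_bps: float) -> float:
    """Express a round-trip time as a number of packet times."""
    return rt_ns / packet_time_ns(packet_bits, link_bps)


# Assumed example values: 64-byte cells, 40 Gb/s links, 500 ns round trip.
T = packet_time_ns(64 * 8, 40e9)               # about 12.8 ns per cell
rt = rt_in_packet_times(500.0, 64 * 8, 40e9)   # about 39 packet times
min_latency = 2 * rt                           # request/grant RT + data RT
```

Doubling B while keeping L fixed halves T and therefore doubles the RT measured in packet times, which is the trend the paragraph above describes.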

Because these latencies have become a problem only recently, they have so far received little attention. The preferred practical solution has been to place boards such as line cards with input queues (typically organized as VOQs) physically close to the routing structure (e.g., a switch core including the crossbar and arbiter). However, current packaging and power constraints make it impossible to place a large number of line cards near the switch core. As a result, in conventional arrangements the ever-growing demand for bandwidth cannot be met simply by increasing the number of line cards attached to the routing structure.

Patent Document 1 attempts to increase total system bandwidth by increasing the number of line cards, which is made possible by physically separating the line cards from the routing structure. Most of the buffering and processing is performed on the physically remote line cards. FIG. 2A shows a system according to this approach.

As shown in FIG. 2, this system includes a switch core 210 and a plurality of line cards 202 located physically apart from switch core 210. Each line card 202 includes an input VOQ buffer group (queues) 221 and an output buffer 222. Switch core 210 includes a plurality of port modules 280 (i.e., "switch ports"), a parallel-slice self-routing crossbar fabric module 206, and a centralized arbiter module 207. Data packets are sent and received along data links 231 between line cards 202 and switch ports 280, and along data links 203 between switch ports 280 and crossbar routing structure 206. In each line card 202, buffer group 221 stores packets traveling on the forward path and output buffer 222 stores packets on the return path. Control messages are sent and received along control paths 232 between line cards 202 and switch ports 280, and along control links 204 between switch ports 280 and arbiter module 207. Arbiter 207 determines the appropriate configuration for each time slot and supplies it to routing structure 206 along configuration link 212. To minimize the RT between VOQs 221 and arbiter 207, a small buffer 281 with VOQs is placed near switch core 210 for each input port of switch core 210. This is achieved using a line-card-to-switch (LCS) protocol that enables lossless communication between line cards 202 and switch ports 280.

The main drawback of the approach of Patent Document 1 is that both line cards 202 and switch ports 280 contain buffers, even though switch ports 280 need only a small amount of buffering, i.e., enough packets to cover one RT. These buffered switch ports increase cost, card space, power, and latency (e.g., through additional SerDes and buffering). They also duplicate functionality that already exists on line cards 202.
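The amount of buffering "enough packets to cover one RT" is the number of packets a link can carry during one round trip, i.e., the bandwidth-delay product divided by the packet length. The sketch below computes it; the example values (40 Gb/s, 500 ns, 64-byte cells) are assumptions for illustration.

```python
import math


def rt_buffer_packets(link_bps: float, rt_ns: float, packet_bits: int) -> int:
    """Packets needed to cover one round trip: ceil(B * RT / L)."""
    bits_in_flight = link_bps * rt_ns / 1e9   # bandwidth-delay product in bits
    return math.ceil(bits_in_flight / packet_bits)


# Assumed example: 40 Gb/s port, 500 ns round trip, 64-byte (512-bit) cells.
per_port = rt_buffer_packets(40e9, 500.0, 512)   # 40 cells per switch port
```

The buffer is small per port, but it must be replicated at every switch port, which is the duplication of line-card functionality criticized above.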

Even with the approach described in Patent Document 1, it is difficult in practice to make the round-trip time between switch ports 280 and arbiter module 207 shorter than one cell time. Furthermore, in the specific case of a switch fabric with optical links from the line cards to the switch core (to cover the long distance between them) and an optical routing structure, switch ports 280 additionally require electrical-to-optical and optical-to-electrical conversion for buffering in electrical/CMOS chips, because optical buffers are currently neither practical nor economically feasible. Adding such conversion circuitry considerably increases the cost and complexity of the system.

Another approach to reducing interconnection-network latency, described in Non-Patent Document 1, involves "speculation with lookahead." As described in Non-Patent Document 1, a router's matching arbiter uses speculation with lookahead to examine the input VOQs deeper than their first element (the head of line, HoL) and pre-allocates some switch resources in the expectation (hope) that those subsequent packets will be granted. This approach attempts to minimize the number of pipeline stages by letting the router perform some matching and setup tasks in parallel. Speculation with lookahead benefits packets that are already enqueued in the input VOQs and packets whose transmission requests have already been issued but cannot be served immediately by the arbiter; it does not, however, speed up the transmission of packets whose requests have not yet reached the arbiter, or of packets that have just arrived at the input VOQ.

Moreover, speculation with lookahead mainly addresses latency within the arbiter algorithm and generally does not address the usually larger transmission latency from the transmitter to the switch fabric. Prior to the present application, the concepts of double and triple speculative execution (see, e.g., Non-Patent Document 1, p. 317), which rely on internal switch speedup and light switch load to allocate more switch resources speculatively, have failed in most applications. Many conventional strictly non-blocking switch fabrics are internally partitioned into a number of successive switching stages. With double and triple speculative execution, these stages are set up (allocated) incrementally for the speculative load. These approaches usually succeed in speculatively allowing transmission through the entire multistage fabric only when the speedup is extreme or the load is light. As load increases, speculative allocation degrades performance, because it wastefully reserves resources that would be better allocated to properly arbitrated requests.

Furthermore, all of the systems described above still suffer from the initial RT latency: a line card must wait for a grant to arrive after submitting a request for output resources.

Thus, there remains a need in the art for a more efficient, less complex, and lower-cost way to reduce the latency associated with the routing structure of an interconnection system.

Patent Document 1: U.S. Patent No. 6,647,019
Non-Patent Document 1: W. J. Dally et al., "Principles and Practices of Interconnection Networks," Morgan Kaufmann, 2004, pp. 316-318

Accordingly, it is an object of the present invention to provide a method and apparatus that substantially obviate one or more of the shortcomings and problems arising from the limitations and disadvantages of the related art.

The present invention includes systems and methods for transmitting information units that reduce the overall latency caused by planned arbitration in systems using a routing structure. The method is particularly suited to systems including a switching structure in which data packets are sent from multiple sources to multiple inputs of a routing structure, with or without planned arbitration, and the input packets are switched to designated outputs of the routing structure.

In one aspect of the invention, a method includes transmitting at least one data packet according to the result of prior arbitration (i.e., planned arbitration) when such a result exists, and selecting a data packet for speculative transmission when no planned-arbitration result exists. The method also includes simultaneously issuing a speculative request containing the output identifier of the selected data packet and sending the selected packet to the routing structure.
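The per-slot decision at a line card can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the VOQs are a list of deques indexed by output, the STX selection policy (lowest-numbered non-empty VOQ) is hypothetical, and a speculatively sent packet is assumed to stay queued until its delivery is confirmed, since a speculative transmission can fail.

```python
from collections import deque
from typing import List, Optional, Tuple


def line_card_slot(voqs: List[deque], grant: Optional[int]) -> Optional[Tuple[str, int, object]]:
    """Return (kind, output, packet) to send this slot, or None if idle.

    grant is the output assigned by planned arbitration for this slot,
    or None when no planned-arbitration result exists.
    """
    if grant is not None and voqs[grant]:
        # A planned-arbitration result exists: send exactly what was granted.
        return ("scheduled", grant, voqs[grant].popleft())
    # No planned result: select a packet for speculative transmission
    # (assumed policy: lowest-numbered non-empty VOQ).
    for out, q in enumerate(voqs):
        if q:
            # The speculative request carrying `out` and the packet itself are
            # issued in the same slot; the packet stays queued (assumption)
            # so it can be retransmitted if the speculative attempt fails.
            return ("speculative", out, q[0])
    return None
```

Because the request and the packet leave together, a successful speculative attempt does not wait for a grant at all.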

In another aspect of the invention, a system for sending data packets from multiple inputs to multiple outputs of a routing structure, with or without planned (prior) arbitration, includes: a mechanism that transmits at least one data packet according to the result of planned arbitration when such a result exists; and a mechanism that, when no planned-arbitration result exists, selects a data packet for speculative transmission and simultaneously issues a speculative request containing the output identifier of the selected data packet and sends the selected packet to the routing structure.

Another aspect of the invention includes a system for managing the flow of information units among multiple locations. The system includes a plurality of inputs that receive a plurality of requests for access to a switching structure that transmits information units from multiple source locations to multiple destination locations. The access requests include requests for access permission through planned arbitration and requests for speculative access without permission through planned arbitration. Each request also includes an indication of the destination location to which the associated information unit is to be transferred. The system includes an arbiter that determines a conflict-free assignment of switching-structure resources to a group or subgroup of requests for access permission through planned arbitration. The system also includes a speculative arbiter that receives requests for speculative access and, when the group for which the assignment has been determined is the next to be applied to the switching structure, grants or denies the received speculative requests based on that assignment.
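A minimal sketch of the speculative arbiter's grant/deny step, assuming the planned matching and the speculative requests are both given as `{input: output}` maps. An output already reserved by planned arbitration is never given away, and at most one speculative request wins each free output; the lowest-input-wins tie-break is an assumed policy, not one specified in the text.

```python
from typing import Dict


def arbitrate_speculative(scheduled: Dict[int, int],
                          spec_requests: Dict[int, int]) -> Dict[int, bool]:
    """Grant (True) or deny (False) each speculative request.

    scheduled: conflict-free {input: output} from planned arbitration,
               i.e., the configuration about to be applied to the fabric.
    spec_requests: {input: output} speculative requests, one per input.
    """
    busy_inputs = set(scheduled)
    busy_outputs = set(scheduled.values())
    decisions: Dict[int, bool] = {}
    for inp in sorted(spec_requests):       # assumed tie-break: lowest input wins
        out = spec_requests[inp]
        ok = inp not in busy_inputs and out not in busy_outputs
        if ok:
            busy_outputs.add(out)           # one speculative winner per free output
        decisions[inp] = ok
    return decisions
```

For example, with `scheduled = {0: 2}` and speculative requests `{1: 2, 3: 1, 4: 1}`, input 1 is denied (output 2 is reserved), input 3 is granted output 1, and input 4 is denied because it collides with input 3. Nothing is reserved speculatively ahead of time; the decision only fills outputs left unused by the planned configuration.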

Yet another aspect of the invention makes it easier to speed up the transmission of data packets that have just arrived at the input queue, without requiring pre-allocation or reservation of switching resources that would be wasted if the grant were ultimately denied.

Another aspect of the invention reduces average latency by reducing the number of round trips between a packet's source and the routing structure. For example, a successful speculative transmission of a packet saves at least one of the round trips (i.e., from sending a request to receiving a grant) needed to send the packet through the routing structure.
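The saving of one round trip per successful speculative transmission can be expressed as a simple expected-latency model. This is a simplified model introduced here for illustration, not one given in the text: it assumes a successful STX packet needs only the data round trip (RT), a failed one falls back to the request/grant round trip plus the data round trip (2·RT), and it ignores queuing and retransmission overhead.

```python
def mean_min_latency(rt: float, p_success: float) -> float:
    """Expected minimum latency under a simplified STX model (assumption).

    Success with probability p_success costs RT; failure costs 2 * RT.
    Queuing and retransmission delays are ignored.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be in [0, 1]")
    return p_success * rt + (1.0 - p_success) * 2.0 * rt
```

Under this model the latency is (2 - p)·RT, so it falls from 2·RT toward RT as the speculative success rate p approaches 1.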

Additional aspects and advantages of the invention will be set forth in the description that follows; in part they will be apparent from the description or may be learned by practice of the invention. The aspects and advantages of the invention will be realized and attained by the systems and methods particularly pointed out in the written description and claims, as well as in the appended drawings.

It is to be understood that both the foregoing general description and the following detailed description are exemplary only and do not restrict the invention as claimed.

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain its principles.

These and other aspects of the invention are described in more detail below with reference to the examples illustrated in the accompanying drawings.

As noted above regarding the related art, centralized arbitration achieves high maximum throughput but incurs an arbitration-latency penalty, especially at low to moderate utilization. The present invention addresses this latency with a novel type of speculative execution called speculative transmission (STX). The STX concept described herein is related to the basic concept behind ALOHA and Ethernet, in which packets are transmitted over shared media without prior arbitration (i.e., in anticipation of the transmission succeeding). The present invention, however, applies this concept to switched media in order to eliminate arbitration latency. Operating a packet switch with speculative transmission alone would cause frequent packet collisions and therefore catastrophic performance degradation at moderate to high utilization. The present invention therefore combines speculative transmission with arbitration to obtain the advantages of both: high maximum throughput, and low latency at low to moderate utilization.
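Why speculative transmission alone breaks down at higher utilization can be seen from a standard ALOHA-style collision model. The model below is an assumption for illustration: each of N inputs sends one packet per slot with probability `load` to a uniformly random output, there is no arbitration at all, and colliding packets are lost.

```python
import math


def pure_stx_success(n: int, load: float) -> float:
    """Fraction of speculative packets that avoid an output collision.

    Assumed model: N inputs, each sending with probability `load` to a
    uniformly random output; collisions destroy the packets involved.
    """
    # Expected packets delivered per output per slot: P(at least one arrives).
    delivered = 1.0 - (1.0 - load / n) ** n
    # Offered load per output is `load`, so success = delivered / offered.
    return delivered / load if load > 0 else 1.0
```

For large N the success fraction approaches (1 - e^(-load)) / load: roughly 95% at 10% load but only about 63% at full load, so the collision losses grow exactly where throughput matters most. Combining STX with planned arbitration keeps the low-load benefit while arbitration carries the high-load traffic.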

The present invention reduces arbitration latency by using STX, without prior arbitration, in combination with arbitration of the routing structure, and without implementing an additional buffer stage near the routing structure (fabric). Furthermore, the invention permits very deep pipelining of speculative processing, so that round trips (much) longer than the packet duration can be handled effectively. By substantially reducing or eliminating the average latency caused by arbitration in the switching structure, the invention can reduce overall latency by 50% or more.

In contrast to prior-art speculative execution schemes, which pre-allocate switch resources that are often wasted, the efficient, non-blocking, single-stage transmission through the routing structure described herein requires no such pre-allocation. In other words, because fabric resources are never actually set up speculatively, the invention avoids wasting switch resources by implementing a single-stage selector in the switching structure.

The present invention addresses issues related to the STX selection policy, collisions, retransmission, out-of-order delivery, resequencing, duplicate delivery, and the sizing of the retransmission and resequencing windows. It should be understood, however, that other issues and particularities relating to the switching fabric in any specific application of the invention will be readily apparent to those skilled in the art.

FIG. 3 shows an exemplary system according to the present invention. As shown in FIG. 3, the system comprises N bidirectional, full-duplex input/output data links 301 connected to N line cards 302. Data packets are sent and received over the data links 301. Each packet carries payload information and a header that includes information indicating the packet's requested destination.

Each line card 302 is connected to the switch core 310 through a bidirectional, full-duplex intra-fabric data link 303. The switch core 310 contains a routing fabric 306, illustrated as a crossbar with N input ports and N output ports. Each line card 302 is also connected to a centralized allocation/arbitration unit 307 through a dedicated bidirectional control link 304. Control messages, including for example requests and grants, are exchanged between the line card 302 and the arbiter 307 over this link. The arbiter 307 is connected to the crossbar 306 through a configuration link 312. Each line card 302 further includes an input buffer unit 321 and an output buffer unit 322. The structure and function of the arbiter 307, the input buffer unit 321, and the output buffer unit 322 are described in more detail below with reference to FIGS. 4 to 7.

The present invention may be used with other types of routing fabrics, but it is particularly well suited to crossbar-based switches. For the purpose of illustrating a preferred example, the system of FIG. 3 is therefore described in the context of a switch core containing a crossbar-based switch. Those skilled in the art will recognize, however, that the invention is applicable to many switching-fabric architectures.

A crossbar switch has the property that, at any given time, each input can be connected to only one output and each output to only one input. In other words, the inputs and outputs of the switch are matched one-to-one. To achieve good latency and throughput, this matching is generally computed by a centralized allocation/arbitration unit, which also resolves contention among inputs for the same output resource of the switch. In the exemplary system of FIG. 3, the arbiter 307 performs these allocation and arbitration functions. The arbiter 307 is shown in FIG. 3 as a single unit. It should be understood, however, that the allocation and arbitration functions may be carried out by two or more hardware devices and/or by one or more hardware devices operating in conjunction with program instructions.

The arbiter 307 receives output-resource requests from the line cards 302. Each request contains an output port identifier indicating that the requesting line card 302 wishes to transmit a packet to that output port. From the requests received, the arbiter 307 computes a suitable one-to-one matching between input ports and output ports, which is equivalent to a bipartite graph matching problem. It then returns the corresponding grants to the line cards 302. A grant contains an output port identifier meaning that the line card 302 receiving it may transmit a packet to that output port. Optionally, if no output resource is available for a given request, the arbiter 307 may return a response indicating that the request was rejected.

When a line card 302 receives a grant, it takes one packet from the corresponding VOQ of its input buffer unit 321 and sends it over the data link 303 to the crossbar 306. The crossbar 306 routes incoming packets onto the output-side data links 303 according to the configuration (i.e., the matching) computed by the arbiter 307 and applied through the configuration link 312.
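To make the matching step concrete, the following is a minimal Python sketch of one iteration of i-SLIP, one of the known algorithms the arbiter may use. The sketch rests on several assumptions. Requests are modeled as a dict from input index to the set of requested outputs, and `grant_ptr` and `accept_ptr` are hypothetical names for the per-output and per-input round-robin pointers. This is an illustration of the request/grant/accept idea, not the arbiter's actual logic. A practical arbiter would typically run several iterations per time slot.

```python
from typing import Dict, List, Set


def islip_iteration(requests: Dict[int, Set[int]], n: int,
                    grant_ptr: List[int], accept_ptr: List[int]) -> Dict[int, int]:
    """Run one i-SLIP iteration on an n x n crossbar.

    requests maps each input to the set of outputs it has packets for.
    grant_ptr[o] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns a one-to-one matching {input: output}.
    """
    # Grant phase: each output grants the requesting input closest to its pointer.
    grants: Dict[int, List[int]] = {}
    for out in range(n):
        requesters = [i for i in range(n) if out in requests.get(i, set())]
        if requesters:
            chosen = min(requesters, key=lambda i: (i - grant_ptr[out]) % n)
            grants.setdefault(chosen, []).append(out)

    # Accept phase: each input accepts the granting output closest to its pointer,
    # so every input and every output appears in at most one pair.
    matching: Dict[int, int] = {}
    for inp, outs in grants.items():
        out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
        matching[inp] = out
        # Pointers advance one position past the accepted pair only.
        accept_ptr[inp] = (out + 1) % n
        grant_ptr[out] = (inp + 1) % n
    return matching


grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
reqs = {0: {0, 1}, 1: {0}, 2: {2}}
print(islip_iteration(reqs, 3, grant_ptr, accept_ptr))  # {0: 0, 2: 2}
print(islip_iteration(reqs, 3, grant_ptr, accept_ptr))  # {1: 0, 0: 1, 2: 2}
```

In the second call the updated pointers let input 1 win output 0, so all three inputs are matched. This pointer desynchronization is the property that makes i-SLIP attractive.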

FIG. 4 shows the input buffer unit 321 in more detail. The input buffer unit 321 contains:

- an input demultiplexer 410;
- a plurality of virtual output queues (VOQs) 411, one per output;
- a plurality of retransmission (RTX) queues 412, one per output;
- a plurality of demultiplexers 413, one per VOQ 411 (in practice these have a replication function in addition to their routing function);
- a plurality of multiplexers 414, one per VOQ 411;
- an output multiplexer 415; and
- a control unit 416.

動作においては、パケットは、入力バッファ部３２１に接続された入力データリンク３０１上で到着し、デマルチプレクサ４１０は、着信したパケットを、それぞれの宛先（出力ポート）に従って、対応するＶＯＱ４１１にルーティングする。パケットの到着は、制御部４１６にも伝えられる。制御部４１６は、アービター部３０７に要求を送信し、アービター部３０７から制御チャネル３０４を介して許可を受信する。高信頼性配信方式のためにシーケンス番号が必要な場合には、制御部４１６は、同じ出力を宛先とする後続のパケットに後続のシーケンス番号を割り当てる。計画的調停要求及び投機的要求、並びに許可及び応答は全て、出力識別子を含む。投機的要求、許可及び応答は、更に、シーケンス番号を含み得る。例えば、特定のタイプの高信頼性配信（ＲＤ）方法（その一部については後述する）は、計画的調停の結果として伝送されたデータパケット及び投機的に伝送されたデータパケットの各々がシーケンス番号を有することを必要とし得る。 In operation, a packet arrives on the input data link 301 connected to the input buffer unit 321, and the demultiplexer 410 routes the received packet to the corresponding VOQ 411 according to each destination (output port). The arrival of the packet is also transmitted to the control unit 416. The control unit 416 transmits a request to the arbiter unit 307 and receives permission from the arbiter unit 307 via the control channel 304. When a sequence number is required for the reliable delivery method, the control unit 416 assigns a subsequent sequence number to subsequent packets destined for the same output. Planned mediation requests and speculative requests, as well as grants and responses, all include an output identifier. The speculative request, grant, and response may further include a sequence number. For example, certain types of reliable delivery (RD) methods (some of which will be described later) have a sequence number for each of a data packet transmitted as a result of planned arbitration and a speculatively transmitted data packet. It may be necessary to have

The system has two transmission modes:

- arbitrated transmission, i.e., transmission resulting from planned arbitration; and
- speculative transmission (STX).

The planned-arbitration transmission mode is known in the prior art; see, for example, C. Minkenberg et al., IP.com, June 3, 2004, article no. IPCOM000028815D, the entire contents of which are incorporated herein by reference. The present invention concerns the speculative transmission mode and the interaction between the two modes. This is achieved by distinguishing two kinds of message pairs. Planned arbitration requests receive corresponding grants. Speculative requests receive corresponding positive and negative acknowledgments.

Returning to FIG. 4, in a preferred embodiment a line card 302 is eligible to perform a speculative transmission only if it has not received a grant for a planned-arbitration transmission in the given time slot. When no grant has been received, the control unit 416 selects a non-empty VOQ 411 according to a given policy, and one packet is taken from that VOQ for speculative transmission. The policy may be, for example, random, round-robin, oldest cell first (OCF), or youngest cell first (YCF). In a preferred embodiment, the policy is chosen to maximize the latency gain. Policies that favor more recent arrivals, such as youngest cell first, tend to do so.
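The selection policies listed above can be compared side by side in a short Python sketch. It rests on assumptions that are not from the text. Each VOQ is modeled as a deque of arrival timestamps with the oldest at the head. The policy names are plain strings chosen for illustration. The packet actually sent is assumed to be the head of the chosen VOQ, so per-VOQ FIFO order is preserved.

```python
import random
from collections import deque
from typing import Deque, List, Optional


def select_voq_for_stx(voqs: List[Deque[int]], policy: str, rr_ptr: int = 0,
                       rng: Optional[random.Random] = None) -> Optional[int]:
    """Pick the VOQ index to serve speculatively, or None if all VOQs are empty.

    Each VOQ holds arrival timestamps, oldest at the head. The packet sent is
    assumed to be the head of the chosen VOQ, preserving per-VOQ FIFO order.
    """
    nonempty = [i for i, q in enumerate(voqs) if q]
    if not nonempty:
        return None
    if policy == "random":
        return (rng or random.Random()).choice(nonempty)
    if policy == "round_robin":
        n = len(voqs)
        return min(nonempty, key=lambda i: (i - rr_ptr) % n)
    if policy == "ocf":  # oldest cell first: VOQ whose head arrived earliest
        return min(nonempty, key=lambda i: voqs[i][0])
    if policy == "ycf":  # youngest cell first: VOQ holding the latest arrival
        return max(nonempty, key=lambda i: voqs[i][-1])
    raise ValueError(f"unknown policy: {policy}")
```

With `voqs = [deque([5, 9]), deque(), deque([2, 3])]`, OCF picks VOQ 2 (head timestamp 2) and YCF picks VOQ 0 (latest arrival 9). Under this model, YCF targets the VOQ where a packet has just arrived and would otherwise wait a full request-grant round trip.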

The selected packet is stored in the retransmission queue 412 via the demultiplexer 413. It is also transmitted over the data link 303 to the crossbar 306 via the demultiplexer 413 and the multiplexers 414 and 415. At the same time, the control unit 416 sends a corresponding speculative transmission request, containing the output identifier, to the arbiter 307 over the control link 304. The packet is kept in the retransmission queue 412 so that it can be retransmitted if the speculative transmission fails. A line card 302 may issue a planned arbitration request for a grant and a speculative request simultaneously. In a preferred embodiment, packets stored in a retransmission queue 412 are not eligible for speculative transmission, because they have already been transmitted speculatively once. There is little or no benefit in retransmitting a failed STX packet speculatively, since its potential latency advantage has most likely already been lost.
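The speculative send path can be sketched in Python as three steps: copy the packet into the RTX queue, put it on the data link, and issue a speculative request. In this minimal model, `LineCardTx`, `link`, and `stx_requests` are hypothetical names, and plain Python lists stand in for data link 303 and control link 304.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class Packet:
    dest: int
    seq: int


@dataclass
class LineCardTx:
    """Hypothetical per-line-card transmit state: one VOQ and one RTX queue per output."""
    voqs: List[Deque[Packet]]
    rtx: List[Deque[Packet]]
    link: List[Packet] = field(default_factory=list)                   # data link 303
    stx_requests: List[Tuple[int, int]] = field(default_factory=list)  # control link 304

    def speculative_send(self, out: int) -> Optional[Packet]:
        # Only VOQ packets are eligible; RTX entries were already sent speculatively.
        if not self.voqs[out]:
            return None
        pkt = self.voqs[out].popleft()
        self.rtx[out].append(pkt)                       # keep a copy for retransmission
        self.link.append(pkt)                           # send toward the crossbar
        self.stx_requests.append((pkt.dest, pkt.seq))   # matching speculative request
        return pkt
```

Taking packets only from `voqs` and never from `rtx` enforces the rule that a failed STX packet is not sent speculatively a second time. It then waits in the RTX queue for a grant.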

If a line card 302 receives a planned-arbitration grant for a given output in a given time slot, it is eligible to send a packet to that output in the current time slot. The control unit handles the grant as follows:

1. It first checks the occupancy of the corresponding retransmission queue 412. If that queue is not empty, one packet is taken from it and retransmitted over the data link 303 to the crossbar 306 via the multiplexers 414 and 415.
2. If the retransmission queue 412 is empty, the control unit checks the occupancy of the corresponding VOQ 411. If the VOQ 411 is not empty, one packet is taken from it and transmitted over the data link 303 to the crossbar 306 via the demultiplexer 413 and the multiplexers 414 and 415.

Thus, when a grant arrives, packets waiting in the RTX queue 412 normally take precedence over packets in the corresponding VOQ 411, which ensures that failed STX packets reach their destination quickly. The priority may, however, also depend on other factors, such as the nature of the data waiting in or arriving at the queue 411 (for example, latency-sensitive traffic such as speech or video).

Because of the speculative mode of operation, a grant may arrive for an empty VOQ 411; such a grant is considered wasted. In a preferred embodiment, a line card 302 is eligible to send a speculative packet in any time slot in which it receives a wasted grant.
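The default grant-handling priority can be expressed as a small Python function: RTX queue first, then VOQ, otherwise a wasted grant. This is a minimal sketch with a hypothetical name, `on_grant`. It does not model the optional override for latency-sensitive traffic mentioned above.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


def on_grant(rtx: Deque[Any], voq: Deque[Any]) -> Tuple[Optional[Any], str]:
    """Choose the packet to send on a planned-arbitration grant for one output.

    Returns (packet, source); source is "rtx", "voq", or "wasted".
    """
    if rtx:
        return rtx.popleft(), "rtx"  # failed STX packets go first
    if voq:
        return voq.popleft(), "voq"
    # Nothing to send for this output: the grant is wasted, and the line card
    # may use the slot for a speculative packet to some other output instead.
    return None, "wasted"
```

Removing the RTX packet on a granted retransmission is safe in this model because a granted transmission has a reserved crossbar path and cannot collide.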

In the exemplary embodiments described below, a line card 302 that receives an acknowledgment first checks whether the acknowledged packet is present. It looks for a packet carrying the acknowledgment's sequence number in the retransmission queue 412 identified by the acknowledgment's output identifier. If the packet is not present, the acknowledgment is ignored. If it is present, the action taken depends on the particular application of the invention, for example on the reliable delivery scheme integrated with it (see the section "Reliable delivery"):

- With selective retry (SR), the acknowledged packet (if present) is removed from whatever position it occupies in the retransmission queue 412.
- With Go-Back-N (GBN), the acknowledged packet is removed only if it is at the head of the retransmission queue.
- With stop-and-wait (S&W), the acknowledged packet is removed only if it is at the head of the VOQ.
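The three acknowledgment rules can be contrasted in one Python function. This is a minimal sketch. The scheme labels "SR", "GBN", and "SW" and the function name `handle_ack` are hypothetical. The extra timing condition that Go-Back-N applies before accepting an ACK is deliberately left out here.

```python
from collections import deque
from typing import Deque, NamedTuple


class Packet(NamedTuple):
    dest: int
    seq: int


def handle_ack(scheme: str, rtx: Deque[Packet], voq: Deque[Packet], seq: int) -> bool:
    """Apply an acknowledgment for sequence number seq; return True if a packet was removed.

    rtx and voq are the queues for the output named in the acknowledgment.
    """
    if scheme == "SR":   # selective retry: remove from any position
        for i, pkt in enumerate(rtx):
            if pkt.seq == seq:
                del rtx[i]
                return True
        return False
    if scheme == "GBN":  # Go-Back-N: only the head of the RTX queue
        if rtx and rtx[0].seq == seq:
            rtx.popleft()
            return True
        return False
    if scheme == "SW":   # stop-and-wait: the packet waits at the head of the VOQ
        if voq and voq[0].seq == seq:
            voq.popleft()
            return True
        return False
    raise ValueError(f"unknown scheme: {scheme}")
```

An acknowledgment that matches nothing returns `False` and is simply ignored, as the text specifies.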

Without resequencing at the output
FIG. 5 shows an example of an output buffer unit 322a of a line card 302 that performs no resequencing. The buffer unit 322a can be used with RD schemes that do not require resequencing, such as stop-and-wait and Go-Back-N (described below). The output buffer unit 322a contains an enqueue unit 521, an output queue 526, a dequeue unit 527, and a control unit 528.

In operation, packets arrive at the output buffer unit 322a over the data link 303. The control unit 528 examines each packet's header and decides whether the enqueue unit 521 should store the packet in the output queue 526 or drop it. When the stop-and-wait or Go-Back-N RD scheme is used, packets are accepted only in the correct order. Note, however, that duplicate delivery can occur even with stop-and-wait, and that both duplicate and out-of-order delivery can occur with GBN. The control unit 528 therefore maintains the next expected sequence number for each input. For each received packet, it checks whether the packet's sequence number equals the next expected sequence number for the corresponding input (e.g., the sequence number of the most recent successfully received packet plus one). If it does, the packet is added to the output queue 526. If not, the packet is dropped, either because it is a duplicate (e.g., its sequence number is too low) or because it is out of order (e.g., its sequence number is too high).
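The enqueue-or-drop decision described above is sketched below as a Python class, assuming integer sequence numbers that start at 0 and increase by one per packet. The class and method names are hypothetical.

```python
from typing import Any, List, Tuple


class InOrderFilter:
    """Hypothetical model of control unit 528 deciding enqueue vs. drop."""

    def __init__(self, num_inputs: int) -> None:
        self.expected = [0] * num_inputs     # next expected sequence number per input
        self.output_queue: List[Tuple[int, int, Any]] = []

    def receive(self, src: int, seq: int, payload: Any = None) -> str:
        if seq == self.expected[src]:
            self.output_queue.append((src, seq, payload))
            self.expected[src] += 1
            return "accepted"
        # Anything else is dropped: too low is a duplicate, too high is out of order.
        return "duplicate" if seq < self.expected[src] else "out_of_order"
```

For the arrival sequence 0, 0, 2, 1 from one input, the results are accepted, duplicate, out_of_order, accepted. The dropped packet 2 must be sent again by the input, which is why this filter suits S&W and GBN but not SR.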

When the output queue 526 contains a packet, the dequeue unit 527 removes it and transmits it on the external data link 301.

With resequencing at the output
FIG. 6a shows another exemplary embodiment, the output buffer unit 322b of a line card 302. This embodiment should be used with RD schemes that require resequencing, such as selective retry (SR). The output buffer unit 322b contains a demultiplexer 621, a demultiplexer 622, a plurality of resequencing (RSQ) queues 623 (one per input), a multiplexer 624, a multiplexer 625, an output queue 626, a dequeue unit 627, and a control unit 628.

In operation, the output buffer unit 322b receives packets on the data link 303. The control unit 628 checks whether each arriving packet is in order, i.e., whether its sequence number equals the next expected sequence number for the corresponding input. If so, the packet is routed through the demultiplexer 621 and the multiplexer 625 directly into the output queue 626, and the next expected sequence number is incremented.

It should be understood that the value of the next expected number depends on the sequencing scheme chosen. In one exemplary embodiment described below, for example, packets are numbered with integers that increase by one for each successive packet in the expected order. Resequencing can, of course, be performed with many other sequencing schemes. Examples include schemes in which the numbers decrease along the sequence and/or schemes in which the sequence number advances by a value other than one.

If a packet arrives out of order (e.g., its sequence number is greater than the next expected sequence number for the corresponding input), it is routed through the demultiplexers 621 and 622 to the resequencing queue 623 for that input. The resequencing queues 623 thus hold all packets that arrive out of order. These packets can proceed to the output queue 626 only after all preceding packets have been received correctly. If the packet is a duplicate (e.g., its sequence number is less than the next expected sequence number for the corresponding input), it is dropped. A duplicate of a packet already held in a resequencing queue is not dropped under this policy; it simply overwrites the identical copy already stored.

Whenever an in-order packet arrives, the control unit 628 checks whether this arrival allows packets to be released from the corresponding resequencing queue 623. If so, one or more such packets are taken from the resequencing queue 623, routed through the multiplexers 624 and 625, and added to the output queue 626. The next expected sequence number for that input is then updated.

FIG. 6b shows an example of resequencing in the output buffer unit 322b, for the input associated with a particular data link 301 and input line card 302:

1. All packets up to and including sequence number 3 have been received correctly.
2. Packets 5, 6, and 8 have also been received correctly, but out of order, so they are held in the resequencing queue 623 for that input.
3. Packet 4 arrives next. Because it carries the next expected sequence number, it is routed immediately to the output queue 626.
4. Packets 5 and 6, but not packet 8, can now be released from the resequencing queue 623 and added to the output queue 626.

Once this is done, the next expected sequence number for that input is 7.
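The resequencing logic, and the FIG. 6b scenario in particular, can be replayed with a small Python model. The model makes two assumptions. Sequence numbers start at 0 and increase by one. Each per-input resequencing queue is a dict keyed by sequence number, which makes the overwrite-on-duplicate rule automatic. `Resequencer` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


class Resequencer:
    """Hypothetical model of resequencing queues 623 and output queue 626."""

    def __init__(self, num_inputs: int) -> None:
        self.expected = [0] * num_inputs
        self.rsq: List[Dict[int, Any]] = [{} for _ in range(num_inputs)]
        self.output_queue: List[Tuple[int, int, Any]] = []

    def receive(self, src: int, seq: int, payload: Any = None) -> str:
        if seq < self.expected[src]:
            return "dropped"                 # duplicate of a delivered packet
        if seq > self.expected[src]:
            self.rsq[src][seq] = payload     # park it; a duplicate just overwrites
            return "parked"
        self.output_queue.append((src, seq, payload))
        self.expected[src] += 1
        # Release any parked packets that are now in sequence.
        while self.expected[src] in self.rsq[src]:
            nxt = self.expected[src]
            self.output_queue.append((src, nxt, self.rsq[src].pop(nxt)))
            self.expected[src] += 1
        return "delivered"


# The FIG. 6b scenario for one input (index 0), numbering from 0.
r = Resequencer(1)
for s in (0, 1, 2, 3, 5, 6, 8, 4):
    r.receive(0, s)
print([seq for _, seq, _ in r.output_queue])  # [0, 1, 2, 3, 4, 5, 6]
print(r.expected[0], sorted(r.rsq[0]))        # 7 [8]
```

As in the figure, the arrival of packet 4 releases 5 and 6, while 8 stays parked until 7 arrives.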

If the output queue 626 contains a packet, the dequeue unit 627 removes it and transmits it on the external data link 301.

FIG. 7 schematically shows an exemplary arbiter unit 307. The arbiter unit 307 contains:

- a plurality of control message receivers 761, one per input control link 304a;
- a matching unit 762;
- a plurality of matching delay lines 763, one per input;
- a speculative request arbitration unit 764; and
- a plurality of control message transmitters 765, one per output control link 304b.

As shown in FIG. 7, the arbiter unit 307 receives control messages over the control links 304a, and the control message receivers 761 decode them. Decoded planned arbitration requests are forwarded to the matching unit 762. Decoded speculative requests are forwarded to the speculative request arbitration unit 764. The matching unit 762 computes a one-to-one matching between inputs and outputs using a suitable known matching algorithm. Examples are PIM, i-SLIP (see N. McKeown, "Scheduling Algorithms for Input-Queued Switches," Ph.D. thesis, University of California at Berkeley, 1995), and the dual round-robin matching (DRRM) algorithm. Such algorithms are well known to those skilled in the art, and a detailed description is omitted for brevity.

The newly computed matching is sent to the control message transmitters 765 and to the matching delay lines 763. The delay lines 763 delay the matching by a predetermined time so that the crossbar configuration is synchronized with the arrival of the corresponding packets. In practice this delay equals the round-trip time. The round-trip time is the time for a grant to travel to the line card, plus the time for the packet to travel to the crossbar, minus the time needed to send the configuration information to the routing fabric and configure the fabric accordingly. The packet travel time includes serialization and deserialization delays, transmission delays, and time of flight.
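The round-trip definition above and the delay line it sizes can be sketched in Python. The sketch assumes that all latencies are whole time slots. `delay_slots` and `MatchingDelayLine` are hypothetical names. The delay line is modeled as a fixed-length FIFO that is pre-filled with empty entries.

```python
from collections import deque
from typing import Any, Deque, Optional


def delay_slots(grant_to_linecard: int, packet_to_crossbar: int,
                crossbar_config: int) -> int:
    """Round-trip delay, in time slots, applied by matching delay line 763."""
    d = grant_to_linecard + packet_to_crossbar - crossbar_config
    if d < 0:
        raise ValueError("fabric configuration takes longer than the grant and packet paths")
    return d


class MatchingDelayLine:
    """Fixed-length FIFO: a matching entered now is applied `delay` slots later."""

    def __init__(self, delay: int) -> None:
        self.pipe: Deque[Optional[Any]] = deque([None] * delay)

    def tick(self, matching: Any) -> Optional[Any]:
        self.pipe.append(matching)
        return self.pipe.popleft()   # None while the pipeline is still filling
```

For example, with a 2-slot grant path, a 3-slot packet path, and a 1-slot configuration time, `delay_slots(2, 3, 1)` gives 4. A matching computed in slot 0 is then applied to the crossbar in slot 4, when the corresponding packet arrives.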

The speculative request arbitration unit 764 receives two inputs: the current matching of planned arbitration requests that is about to be applied to the crossbar, and the current speculative requests. It rejects every speculative request for an output that is already matched in the current matching. In other words, planned arbitration transmissions always take precedence over speculative ones, which ensures high maximum throughput. For each unmatched output that has received at least one speculative request, the unit 764 selects one request to grant according to some policy (e.g., round-robin) and rejects all others. For each successful speculative transmission request, the unit 764 sends an acknowledgment (ACK) to the control message transmitter 765 for the input that sent the request. Optionally, it also sends a negative acknowledgment (NAK) for each rejected speculative transmission request. Successful speculative requests are added to the matching, and the result is applied to the crossbar through the crossbar configuration link 312.
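The speculative arbitration step described above can be sketched in Python. Requests for outputs that are already reserved get a NAK, each free output grants one speculative requester in round-robin order, and winners are merged into the matching. This is a minimal model under stated assumptions. The function name and data layout are hypothetical. Each input is assumed to issue at most one speculative request per slot and not to be part of the planned matching in that slot.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (input, output)


def arbitrate_speculative(matching: Dict[int, int], spec_requests: List[Pair],
                          n_inputs: int, rr_ptr: List[int]
                          ) -> Tuple[Dict[int, int], List[Pair], List[Pair]]:
    """Resolve speculative requests against the planned matching for one slot.

    matching: {input: output} from planned arbitration, about to be applied.
    spec_requests: (input, output) pairs, at most one per input (assumed).
    rr_ptr: per-output round-robin pointers over inputs, updated in place.
    Returns (final matching, ACKed pairs, NAKed pairs).
    """
    final = dict(matching)
    reserved = set(matching.values())
    acks: List[Pair] = []
    naks: List[Pair] = []
    contenders: Dict[int, List[int]] = {}
    for inp, out in spec_requests:
        if out in reserved:
            naks.append((inp, out))   # planned transmissions always win
        else:
            contenders.setdefault(out, []).append(inp)
    for out, inps in contenders.items():
        # Round-robin among the speculative requesters of a free output.
        winner = min(inps, key=lambda i: (i - rr_ptr[out]) % n_inputs)
        rr_ptr[out] = (winner + 1) % n_inputs
        acks.append((winner, out))
        final[winner] = out           # added to the crossbar configuration
        naks.extend((i, out) for i in inps if i != winner)
    return final, acks, naks
```

For instance, with planned matching `{0: 0}` and speculative requests `(1, 0)`, `(2, 1)`, `(3, 1)`, input 1 is NAKed because output 0 is reserved. Inputs 2 and 3 contend for output 1, and input 2 wins.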

Each control message transmitter 765 assembles a control message for its line card 302 and sends it to that line card over the control link 304b. A control message may contain new grants resulting from arbitration (if any) and acknowledgments of successful speculative transmissions (if any).

In general, speculative transmission can cause packets to arrive at the output buffer unit 322 out of order, because speculative packets in flight can fall out of sequence. One or more packets may also be lost in collisions. Depending on the particular RD scheme, the output buffer unit 322 may therefore need to perform sequence checking and/or resequencing.

Reliable delivery
In the STX mode of operation, a packet sent through speculative arbitration may collide with other speculative packets or with packets sent through planned arbitration. When a collision occurs, one or more packets must be dropped, so the switching process becomes lossy even in the absence of errors. Further measures can be taken to ensure reliable, correct, in-order delivery of exactly one copy of each packet. For example, the invention can be integrated with any of many reliable delivery (RD) schemes to increase system reliability. Such schemes generally keep each packet stored at the sender until an acknowledgment (ACK) is received indicating that the receiver has correctly received that packet. Retransmission is generally triggered either implicitly, by a timeout, or explicitly, by a negative acknowledgment (NAK). Depending on the implementation, sequence numbers may be needed to identify individual packets. A resequencing buffer may also, or instead, be needed at the output to restore the correct packet order.

For example, STX arbitration combined with planned arbitration may be integrated with the well-known stop-and-wait (S&W), Go-Back-N (GBN), and selective retry (SR) RD schemes. This enforces reliable, in-order delivery even in the presence of speculative transmission. These exemplary schemes are described below.

Stop-and-wait
With stop-and-wait (S&W), each input may have at most one unacknowledged packet outstanding per output at any given time. The next packet for an input-output pair can be sent only when a grant for that output arrives at the input or when the corresponding acknowledgment arrives. Packets, speculative transmission requests, and acknowledgments therefore carry an indicator (e.g., a 1-bit sequence number) that allows duplicate deliveries to be detected. For example, the output buffer unit can check the sequence number (in-order sequence numbers must alternate). It enqueues the packet if it is in order and drops it otherwise. FIG. 5, for example, shows an output buffer unit 322a implemented without resequencing.
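The 1-bit stop-and-wait rule can be sketched in Python as an alternating-bit exchange between one input and one output. Both classes are hypothetical names for illustration. The sender models only the "one outstanding packet" limit and release on a matching ACK; the release-on-grant path described above is not modeled.

```python
from typing import Any, Optional, Tuple


class StopAndWaitSender:
    """Hypothetical per input-output pair state: at most one unacknowledged packet."""

    def __init__(self) -> None:
        self.bit = 0
        self.outstanding: Optional[Tuple[int, Any]] = None

    def send(self, payload: Any) -> Tuple[int, Any]:
        if self.outstanding is not None:
            raise RuntimeError("one packet already outstanding")
        self.outstanding = (self.bit, payload)
        self.bit ^= 1                      # the next packet uses the other bit value
        return self.outstanding

    def on_ack(self, bit: int) -> None:
        if self.outstanding is not None and self.outstanding[0] == bit:
            self.outstanding = None        # the next packet may now be sent


class StopAndWaitReceiver:
    """Hypothetical per-input duplicate check using the 1-bit sequence number."""

    def __init__(self) -> None:
        self.expected_bit = 0

    def receive(self, bit: int) -> bool:
        """Return True to enqueue the packet, False to drop it as a duplicate."""
        if bit == self.expected_bit:
            self.expected_bit ^= 1         # in-order numbers alternate 0, 1, 0, ...
            return True
        return False
```

A receiver that sees bits 0, 0, 1, 0, 0 accepts the first, third, and fourth packets and drops the two repeats.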

S&W is simple to implement and has low overhead. For example, because a packet can remain at the head of its VOQ until it is acknowledged or granted, a separate RTX queue need not be implemented physically. However, because speculative transmissions cannot be pipelined, the expected latency improvement is small when the round-trip time exceeds one packet time. This can be particularly disadvantageous when traffic is bursty.

Go-Back-N
With go-back-N (GBN), at any given time each input is allowed up to a predetermined maximum number of unacknowledged packets per output. This number depends on the maximum allowed length of the retransmission queue 412. With GBN, the output buffer unit can be implemented without resequencing, for example as the output buffer unit 322a of FIG. 5. However, because packets are accepted only in the correct order, the sequence check described above is required; that is, all out-of-order packets are dropped. Likewise, acknowledgments are accepted only in order. For example, when an ACK arrives, the input buffer unit 321 checks whether it corresponds to the head-of-line packet of the corresponding retransmission queue 412. If the sequence numbers match, and the most recent preceding successful transmission for the same VOQ 411 occurred at least one round trip (RT) earlier, the packet has been delivered successfully, so it is dequeued and discarded. If the sequence numbers do not match, the ACK is ignored. The additional condition on the most recent successful transmission (which may be a packet transmission resulting from either planned or speculative arbitration) is needed to handle a possible scenario in which even a successful speculative transmission arrives out of order, because an earlier speculative transmission failed, and is therefore later dropped at the output buffer 322a. In that case the ACK would be a false positive.

Consequently, if the first of a (pipelined) sequence of STX requests fails, a gap appears in the corresponding ACK sequence, and all subsequent ACKs must be ignored until the failed packet has been delivered successfully. As a result, when an ACK is missing, all packets in the RTX queue must be retransmitted, hence the name go-back-N. A new STX request can be issued only when the RTX queue is not full.
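
The GBN rules above (a FIFO RTX queue bounded in length, ACKs accepted only for the head-of-line packet, in-order-only acceptance at the output, and retransmission of the whole queue after a gap) can be sketched in Python. The class names are hypothetical, sequence numbers are unbounded integers rather than a wrapping field, and the sketch deliberately omits the "at least one round trip since the last successful transmission" condition on ACKs.

```python
from collections import deque


class GoBackNSender:
    """Input-side RTX queue for one VOQ under GBN (simplified model)."""

    def __init__(self, window):
        self.window = window   # maximum RTX queue length
        self.rtx = deque()     # (seq, payload) in transmission order (FIFO)
        self.next_seq = 0

    def can_speculate(self):
        # A new STX request may be issued only while the RTX queue is not full.
        return len(self.rtx) < self.window

    def send(self, payload):
        if not self.can_speculate():
            raise RuntimeError("RTX queue full")
        pkt = (self.next_seq, payload)
        self.next_seq += 1
        self.rtx.append(pkt)
        return pkt

    def on_ack(self, seq):
        # ACKs are accepted only in order: they must match the head of the RTX queue.
        if self.rtx and self.rtx[0][0] == seq:
            self.rtx.popleft()
            return True
        return False

    def go_back(self):
        # After a missing ACK, every packet still in the RTX queue is resent.
        return list(self.rtx)


class GoBackNReceiver:
    """Output-side in-order check: out-of-order packets are dropped."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, seq, payload):
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
            return True   # an ACK for `seq` would be returned to the input
        return False
```

In the usage scenario below the first speculative packet is lost in a collision, so the next two are dropped at the output and the whole RTX queue is replayed.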

GBN is more complex to implement than S&W. This complexity stems in part from the first-in, first-out (FIFO) organization of the RTX queue. Additional overhead may include longer sequence numbers for packets, STX requests, and acknowledgments. The input buffer may also be implemented differently (e.g., with a shift-register RTX queue and an additional NAK queue).

On the other hand, in systems with a long round trip, GBN may be more suitable than S&W. For example, GBN allows many packets to be transmitted speculatively back to back, which the S&W scheme does not.

Selective Retry
With selective retry (SR), at any given time each input is allowed up to a predetermined maximum number of unacknowledged packets per output. This number depends on the maximum allowed lengths of the retransmission queue 412 and of the resequencing queue 623. The output buffer unit can be implemented, for example, as the output buffer unit 322b shown in FIGS. 6a and 6b, which, as described above, can accept out-of-order packets and resequence them. The input buffer unit 321 likewise accepts acknowledgments in any order. Consequently, only the failed STX packets are retransmitted, hence the name selective retry, in contrast to GBN, which retransmits the entire RTX queue.
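
A minimal Python sketch of SR, assuming one RTX store per VOQ at the input and one resequencing queue per input at the output. The class names, the `capacity` window test, and the choice to re-acknowledge duplicates are illustrative assumptions rather than details taken from the description.

```python
class SelectiveRetrySender:
    """Input-side RTX store for one VOQ; ACKs may arrive in any order."""

    def __init__(self):
        self.rtx = {}        # seq -> payload; random access, not FIFO
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.rtx[seq] = payload
        return seq, payload

    def on_ack(self, seq):
        # Remove the acknowledged packet from wherever it sits in the queue.
        self.rtx.pop(seq, None)

    def to_retry(self, failed):
        # Only the failed STX packets are resent, not the whole queue.
        return [(s, self.rtx[s]) for s in sorted(failed) if s in self.rtx]


class ResequencingQueue:
    """Output-side resequencing for one input (simplified model)."""

    def __init__(self, capacity):
        self.capacity = capacity   # packets the RSQ can hold
        self.expected = 0          # next sequence number to release
        self.held = {}             # out-of-order packets waiting for a gap to fill
        self.released = []         # packets passed on in the correct order

    def receive(self, seq, payload):
        """Return True if the packet should be acknowledged."""
        if seq < self.expected or seq in self.held:
            return True            # duplicate of a stored/delivered packet: re-ACK only
        if seq - self.expected >= self.capacity:
            return False           # outside the resequencing window: drop, no ACK
        self.held[seq] = payload
        while self.expected in self.held:
            self.released.append(self.held.pop(self.expected))
            self.expected += 1
        return True
```

Using a dictionary for the RTX store mirrors the random-access requirement discussed below: an ACK can remove any entry, not just the head.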

SR is considerably more complex to implement than GBN because of the added complexity of the multiple resequencing queues in the output buffer unit 322. Moreover, the RTX queue in SR requires a random-access organization, because packets can be removed from any position in the queue, not only from its head.

Sizing of the retransmission and resequencing buffers
In general, to obtain the greatest latency reduction, it is beneficial to size the retransmission buffer so that at least one full round trip's worth of packets can be transmitted speculatively back to back. In this case, speculative transmissions can use the link at 100% of its rate.

To achieve the best results, in a preferred embodiment each resequencing queue (RSQ) is sized to hold at least as many packets as the RTX queue that corresponds to it. For example, the size of RSQ 623 may be determined from the difference between the sequence number of the head-of-line packet and that of the tail packet of the RTX queue 412 corresponding to RSQ 623.
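
The two sizing rules can be expressed as small helper functions. This is a sketch under stated assumptions: round-trip time and packet time are given in the same integer unit (e.g., clock cycles), and sequence numbers are `seq_bits` wide and wrap around. The function names are hypothetical.

```python
def rtx_queue_depth(round_trip: int, packet_time: int) -> int:
    """Packets needed to keep the link busy with STX for one full round trip."""
    return -(-round_trip // packet_time)   # ceiling division on integers


def rsq_depth(head_seq: int, tail_seq: int, seq_bits: int) -> int:
    """RSQ entries needed to cover every packet from the head to the tail of the
    corresponding RTX queue, allowing the sequence numbers to wrap around."""
    return (tail_seq - head_seq) % (1 << seq_bits) + 1
```

For example, a 25-cycle round trip with 4-cycle packets gives an RTX depth of 7, and an RTX queue holding sequence numbers 250 through 4 with 8-bit numbering spans 11 packets.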

Speculation in asymmetric routing structures
The examples of the invention described above generally show a "square" crossbar-like routing structure (i.e., one with equal numbers of inputs and outputs). A board such as a line card is typically connected to one structure input and one structure output, but the invention can also be implemented in a switch structure that has more outputs than inputs. For example, if the number of outputs is K times the number of inputs (K being an integer greater than 1), exactly K routing-structure outputs can be assigned to each output line card. This allows each output line card to receive up to K packets in every time slot.

FIG. 8 shows an exemplary system that includes an asymmetric routing structure. As shown in FIG. 8, the system has a switch core 810; N input data links 803a_1 to 803a_N, each connecting the input buffer 821 of one of the N line cards 802 to the routing structure 806; and, for each output buffer 822, data links 803b_1 to 803b_K between the routing structure 806 and that output buffer. The routing structure 806 of the switch core 810 therefore has a total of N·K output data links 803b_x (x = 1 to K for each line card). Control links similar to those described above exist between the switch core 810 and the line cards 802, but are omitted from the figure for brevity.

The switch core 810 includes the routing structure 806 and an arbiter 807. To provide a suitable matching of input ports to output ports, the arbiter 807 determines a conflict-free allocation of switching-structure resources for a group or subgroup of planned-arbitration and/or speculative-arbitration requests. A configuration link 812 supplies the switching configuration determined by the arbiter 807 to the routing structure 806. As described above, each output 822 is served by K routing-structure outputs, i.e., K times the number of inputs that the line card 802 provides to the structure. For example, in a system with K = 2 asymmetry, each output buffer 822 has one additional data link (an output of the switching structure) over which it can receive packets from any one of the switching-structure inputs. To cope with the larger number of data packets that may arrive in any given time slot or cycle, the queues of the output buffer 822 are sized according to the load and/or other constraints. The output buffer 822 also preferably has K times the write bandwidth.

In the exemplary system of FIG. 8, data packets, which carry payload information and a header that includes information identifying the requested destination of the packet, are sent and received over N bidirectional, full-duplex input/output data links 801. Received data packets are taken from the input buffers 821 and transmitted to the inputs of the routing structure 806 over the data links 803a_1 to 803a_N. Because each line card 802 can receive up to K outputs from the remaining N−1 line cards, the routing structure 806 of this asymmetric example can, for instance, match a packet sent by each of the N line cards 802 through planned and/or speculative arbitration to one of the 803b_x data links feeding the output buffer 822 of any other line card 802. Each line card 802 can thus receive up to K packets from the other N−1 line cards 802, sent by planned arbitration, by speculative arbitration, or by a combination of the two.

This feature can be used to allow each output to accept, in any time slot, multiple planned-arbitration transmissions, multiple speculative transmissions, or a combination of both. In one embodiment of the invention, this feature is exploited to great advantage for the speculative transmission scheme. For example, if there is no planned-arbitration transmission to an output 822 in a given time slot, that output can accept up to K speculative transmissions; if there is one planned-arbitration transmission, it can accept up to K−1 speculative transmissions. In this example, planned-arbitration transmissions still form a one-to-one matching between inputs and outputs, while speculative transmissions can use the bandwidth of all remaining structure outputs. In this way, speculative transmissions can be given a much higher probability of success, and a much larger latency reduction (depending on the load) is achieved.

Existing one-to-one matching algorithms require little or no modification to implement the asymmetric example of the invention. It should be understood, however, that many schemes can be used to match routing-structure inputs to outputs for multiple types of transmitted packets and/or for each of many types of transmitted packets.

To handle up to K speculative requests and acknowledgments per output in a single time slot, the speculative request arbitration unit 764 of FIG. 7 must be modified. First, the maximum number of speculative requests that can be acknowledged must be determined for each output: K−1 for every output that is already matched in the current matching, and K for all other outputs. Next, for each output, the speculative request arbitration unit 764 selects winners from the speculative requests received, up to the maximum number determined, and rejects all others. For fairness, this selection is preferably performed in round-robin fashion, but other methods may be used.
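
The two-step procedure (compute a per-output limit of K−1 or K, then pick winners round-robin) can be sketched as a single Python function. The function name, the request and pointer dictionaries, and the pointer-update rule (move just past the last winner) are assumptions for illustration; the description only states that round-robin selection is preferred.

```python
def grant_speculative(requests, matched, k, pointers, n_inputs):
    """Select speculative winners for one time slot (round-robin sketch).

    requests: dict output -> set of input ids that sent a speculative request
    matched:  set of outputs already used by the planned-arbitration matching
    k:        routing-structure outputs per line card (K)
    pointers: dict output -> round-robin start position, updated in place
    Returns (winners, losers) as dicts output -> list of input ids.
    """
    winners, losers = {}, {}
    for out, reqs in requests.items():
        # At most K-1 speculative packets if a planned packet already uses this output.
        limit = k - 1 if out in matched else k
        start = pointers.get(out, 0)
        # Visit requesting inputs in round-robin order beginning at the pointer.
        order = sorted(reqs, key=lambda i: (i - start) % n_inputs)
        winners[out] = order[:limit]
        losers[out] = order[limit:]
        if winners[out]:
            # Advance the pointer just past the last winner for fairness.
            pointers[out] = (winners[out][-1] + 1) % n_inputs
    return winners, losers
```

With K = 2 and output 0 already matched, output 0 can take only one speculative packet while the unmatched output 1 can take two; on the next slot the pointer lets a different input win output 0.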

The invention is applicable to interconnection networks, such as those for parallel computing applications including high-performance computing systems, clusters, and I/O networks, where latency directly affects overall system computing performance and low latency is desirable or required. The invention can be used with routing structures including electrically controlled electrical switches, electrically controlled optical switches, and optically controlled optical switches.

To facilitate understanding, many aspects of the invention have been described in terms of sequences of actions performed by elements of an interconnection system. It will be recognized that, in each exemplary embodiment, the various actions could be performed by specialized circuits (e.g., discrete logic gates interconnected to perform a specialized function), by program instructions executed by one or more processors, or by a combination of both. Moreover, the invention can additionally be considered to be embodied within any form of computer-readable carrier, such as solid-state memory, a magnetic disk, an optical disk, or a carrier wave (e.g., a radio-frequency, audio-frequency, or optical-frequency carrier wave), containing an appropriate set of computer instructions that would cause a processor to carry out the techniques described herein. Thus, the various aspects of the invention may be embodied in many different forms, and all such forms are intended to be within the scope of the invention.

In the examples above, a board (e.g., a line card) is eligible to transmit speculatively only when it has not received a grant for a given time slot. However, in applications where the effective capacity of the link to the crossbar exceeds that of the external link, so that more than one packet can be sent per time slot, the spare capacity may also be used to make speculative and granted transmissions in the same time slot.
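
The per-slot eligibility rule for one line card can be sketched as a small Python helper. The name `plan_slot`, the list arguments, and the priority ordering of speculative candidates are hypothetical; the sketch only shows how a link capacity of one packet per slot reproduces the basic rule and how extra capacity is filled speculatively.

```python
def plan_slot(granted, spec_candidates, link_capacity):
    """Choose what one line card sends in a time slot (hypothetical helper).

    granted:          packets whose planned-arbitration grant applies to this slot
    spec_candidates:  packets eligible for speculative transmission, in priority order
    link_capacity:    packets the line-card-to-crossbar link can carry per slot
    Returns (granted packets to send, speculative packets to send).
    """
    sent = list(granted[:link_capacity])
    spare = link_capacity - len(sent)
    # With link_capacity == 1, speculation happens only when there is no grant.
    # With a faster link, the remaining capacity is filled speculatively.
    return sent, list(spec_candidates[:spare])
```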

The invention has been described with reference to specific embodiments. However, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the invention is intended to cover such changes and modifications insofar as they come within the scope of the appended claims and their equivalents.

FIG. 1 shows a high-level overview of a conventional system with virtual output buffering.
FIG. 2 shows a conventional, non-optimized system using a line-card-to-switch (LCS) protocol.
FIG. 3 shows an exemplary system according to the invention.
FIG. 4 shows an input buffer unit according to an example of the invention.
FIG. 5 shows an example of the output buffer unit of a line card according to the invention.
FIG. 6a shows another example of the output buffer unit of a line card according to the invention.
FIG. 6b shows a more detailed example of the output buffer shown in FIG. 6a.
FIG. 7 schematically shows an exemplary arbiter unit according to the invention.
FIG. 8 shows another exemplary system according to the invention.

Explanation of symbols

301 Data link
302 Line card
303 Data link
304 Control channel
306 Crossbar (routing structure)
307 Arbiter
310 Switch core
312 Configuration link
321 Input buffer unit
322 Output buffer unit
410 Input demultiplexer
411 Virtual output queue (VOQ)
412 Retransmission (RTX) queue
413 Demultiplexer
414 Multiplexer
415 Output multiplexer
416 Control unit

Claims

A method of transmitting data packets from a plurality of inputs of a routing structure to a plurality of outputs with or without planned arbitration, comprising:
transmitting at least one data packet according to a result of the planned arbitration, if such a result exists; and
if no result of the planned arbitration exists:
selecting a data packet to be transmitted speculatively; and
issuing a speculative request that includes an output identifier of the selected data packet, and simultaneously transmitting the selected data packet to the routing structure.

The method of claim 1, further comprising storing the selected data packet until an acknowledgment or grant for the stored packet is received.

The method of claim 1, wherein the routing structure is not buffered.

The method of claim 1, further comprising:
for each time slot, matching inputs of the routing structure to outputs according to the result of the planned arbitration;
for each output left unmatched in the time slot in which the matched inputs and outputs are applied to the routing structure, if speculative requests have been issued for that output, selecting one speculative request according to a selection policy; and
adding an input-to-output mapping for each winning speculative request to the current result of the planned arbitration applied to the routing structure.

The method of claim 4, further comprising returning an acknowledgment corresponding to each selected winning speculative request.

The method of claim 1, wherein, when a speculatively issued request is granted while other requests are suppressed, the transmission of the grant issued to the routing structure is synchronized with the arrival of the speculatively transmitted data packet, so that the switch state of the routing structure is set at a time that does not prevent the data packet from passing through the routing structure.

A system for managing the flow of information units between a plurality of locations, comprising:
a plurality of inputs for receiving a plurality of requests for access to a switching structure that transfers information units from a plurality of source locations to a plurality of destination locations, the requests including requests for access permission through planned arbitration and requests for speculative access without permission through planned arbitration, each request including an indication of the destination location to which the associated information unit is to be transferred;
an arbiter that determines a conflict-free allocation of switching-structure resources for a group or subgroup of the requests for permission through planned arbitration; and
a speculative request arbiter that receives the requests for speculative access and, when the group for which the allocation was determined is next applied to the switching structure, grants or denies each received request based on the determined allocation.

The system of claim 7, wherein an information unit associated with a granted one of the received speculative access requests is passed through the switching structure to an output not reserved for the group or subgroup.

The system of claim 7, wherein the switching structure does not include buffer circuitry for storing the information units.

The system of claim 7, wherein each of the plurality of inputs of the switching structure is simultaneously connectable to an individual output of the switching structure.