JP2010009628A

JP2010009628A - Server system

Info

Publication number: JP2010009628A
Application number: JP2009237839A
Authority: JP
Inventors: Morihide Nakatani; 守秀中谷; Shisei Fujiwara; 至誠藤原; Toshihiro Ishiki; 敏宏石木; Naoto Sakuma; 直人作間; Junichi Funatsu; 淳一船津; Takeshi Yoshida; 健吉田; Tomonaga Itoi; 朋永糸井
Original assignee: Hitachi Ltd; Hitachi Information and Communication Engineering Ltd
Current assignee: Hitachi Ltd; Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2004-12-09
Filing date: 2009-10-15
Publication date: 2010-01-14
Anticipated expiration: 2025-04-28
Also published as: JP5050028B2; CN101526935A; CN1786936B; CN1786936A

Abstract

PROBLEM TO BE SOLVED: To provide a server system that has, in addition to the scale-out extensibility of a conventional blade server system, scale-up extensibility achieved by SMP (Symmetric Multi-Processing) coupling among a plurality of blade server modules.

SOLUTION: A node controller in each blade server module has an SMP coupling interface and is coupled to the others via a backplane. The links among the blade server modules are laid as equal-length wiring on the backplane, and a loop wiring equal in length to those links is also laid within each blade server module, thereby establishing synchronization. A reference clock distribution unit mounted on the backplane distributes a reference clock to each blade server module, and a clock distribution circuit inside each blade server module switches reference clocks, so that the reference clocks of SMP-coupled blade server modules can be synchronized.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a server system that enables a high-performance scale-up server to be built by tightly coupling a plurality of scale-out server modules. In particular, it relates to a multi-node SMP server apparatus in a symmetric multiprocessor (SMP) system.

Means of expanding the processing capability of conventional server devices fall broadly into two approaches, called “scale-out” and “scale-up”. The scale-out approach, typified by blade server systems, raises overall processing capacity by distributing work across multiple server devices, and is effective when there is a large amount of processing with little mutual dependence. The scale-up approach, typified by SMP (Symmetric Multi-Processing) systems, raises the processing capacity of a single server device by using faster and more numerous processors, larger memory, and so on, and is effective for a single heavily loaded process. Because blade server systems and SMP systems have these different characteristics, it is usual when building a system to choose the appropriate approach according to the application and the nature of the business.

In practice, at an Internet data center (IDC), blade servers suited to scale-out are used as web servers that run large numbers of relatively light tasks such as web front-end processing, while SMP servers suited to scale-up are used as database servers that run memory-hungry tasks such as large-scale databases. At first glance this looks like an efficient case of the right tool for each job, but installing dedicated server devices for each purpose complicates management, and the result cannot necessarily be called efficient in terms of operating cost. Moreover, in a rapidly changing business environment, the first known remedy for a sudden change in system requirements is to add hardware. For a scale-out blade server this means adding blade server modules; for a scale-up SMP server it means adding hardware resources such as processors and memory, or upgrading to higher-performance hardware resources. This too is one of the factors that hinder TCO reduction.

In a multi-node SMP configuration, memory addresses must be transmitted, cache coherency must be maintained, and data must be transferred in blocks the size of a cache line. A processor has a cache memory that holds frequently used data blocks. Typical cache block sizes are 32, 64, or 128 bytes; such a block is called a cache line. When the data a processor needs is not in its cache (a cache miss), the processor requests the data from the other processors. If no processor and no I/O controller holds a modified copy of the requested block, the block is fetched from memory. To obtain permission to modify a block, the processor that fetched it from memory must become the block's owner. When the processor granted modify permission becomes the owner, all other devices invalidate the copies they hold, and the previous owner passes the requested data to the new owner. After that hand-over, if another processor tries to share a read-only copy of the data, the data is supplied by the owning device rather than by memory.

When the owning processor needs free cache space to write new data, it writes the cache block back to memory, and memory becomes the owner again. The process of locating the most recent copy of a cache block is called “cache coherency”. System designers mainly use two methods, broadcast coherency and directory coherency, to keep memory consistent as seen from each processor.
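The ownership rules described above can be illustrated with a small model. This is only a sketch under simplifying assumptions: the class `CacheLine`, its method names, and the `MEMORY` sentinel are hypothetical and do not come from the patent, and real hardware tracks more states than shown here.

```python
from dataclasses import dataclass, field

MEMORY = "memory"  # hypothetical sentinel: memory owns the line when no cache does


@dataclass
class CacheLine:
    """Tracks the owner of one cache line and the holders of read-only copies."""
    owner: str = MEMORY
    sharers: set = field(default_factory=set)

    def read(self, requester: str) -> str:
        """Record a read-only copy and return the device that supplies the data."""
        supplier = self.owner  # the owning device, not memory, supplies the data
        if requester != self.owner:
            self.sharers.add(requester)
        return supplier

    def acquire_for_write(self, requester: str) -> str:
        """Make `requester` the owner and return the previous owner."""
        previous = self.owner
        self.sharers.clear()  # every other device invalidates its copy
        self.owner = requester
        return previous

    def evict(self) -> None:
        """The owner writes the block back; memory becomes the owner again."""
        self.owner = MEMORY
```

For example, after `acquire_for_write("cpu1")`, a later `read("cpu2")` reports `"cpu1"` as the supplier until `evict()` hands ownership back to memory.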

In broadcast coherency, every address is sent to every node. Each device checks (snoops) what state the requested cache line is in within its local cache. Because the system determines the overall snoop result a few cycles after each device has checked its local cache, broadcast coherency keeps latency to a minimum.

In directory coherency, in response to an access request from a processor, the address is sent only to the node (the home node) that manages the address of the cache block in question. The hardware tracks which nodes share or own which cache blocks by means of a directory in memory, special RAM, or a control device. Because the “directory” is embedded in memory, the controller must in principle access memory every time to check the directory information, and the protocol becomes more complex; as a result, latency is longer and varies more.
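The difference between the two methods in terms of who receives an address can be sketched as follows. The function names are hypothetical, and the interleaved home-node mapping is purely an assumption for illustration; the text does not say how home nodes are assigned.

```python
def broadcast_targets(nodes):
    """Broadcast coherency: the address is sent to every node."""
    return list(nodes)


def directory_targets(address, nodes, line_size=64):
    """Directory coherency: the address is sent only to the home node.

    Assumption: cache lines are interleaved across nodes to pick the home
    node. The real mapping is not given in the source.
    """
    home = nodes[(address // line_size) % len(nodes)]
    return [home]
```

With four nodes, a broadcast reaches all four on every request, while the directory lookup reaches one node but then depends on the directory access at that node.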

To realize a multi-node SMP configuration, a crossbar switch is generally used when cache coherency is controlled across multiple nodes. However, requiring transactions to pass through a crossbar switch adds one more device to the path each transaction must traverse, which worsens latency compared with having no crossbar switch. Over the round trip of a request transaction and its response transaction, the difference in latency between using and not using a crossbar switch is considerable.

There are currently multi-node SMP configurations that have no crossbar switch, but they are generally directory-based SMP configurations using directory coherency, and the longer coherency latency is one cause of degraded system performance.

An example of a method for directly interconnecting nodes on a backplane is described in Patent Document 4. It shows how to interconnect nodes directly, but it does not specify how cache coherency is maintained or how transactions are processed.

Patent Document 1: JP 2004-110791 A
Patent Document 2: JP 2004-078930 A
Patent Document 3: JP 2003-216595 A
Patent Document 4: JP 2004-070954 A

An object of the present invention is to provide a server device in which, in addition to functioning as blade server modules (also called nodes), a plurality of blade server modules can be physically SMP-coupled, thereby realizing a multiprocessor server device that achieves scale-out and scale-up at the same time.

A further object of the present invention is to reduce latency in a multi-node SMP configuration, and also to reduce the number of parts in the device, thereby lowering cost, lowering the failure rate, and reducing resources.

The present invention is a server device comprising a plurality of server modules and a management unit that manages the entire device, wherein each server module includes a module management unit that switches the operation mode of that server module, and the module management unit switches, according to configuration information transmitted from the management unit, between operating the server module on its own and operating it in cooperation with other server modules in an SMP configuration.
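The mode switch described above can be pictured with a small model. This is a sketch only: the `Mode` enum, the `ModuleManager` class, and the representation of configuration information as a list of SMP groups are all hypothetical, since the text does not define the format of the configuration information.

```python
from enum import Enum


class Mode(Enum):
    STANDALONE = "standalone"  # module runs as an independent server
    SMP = "smp"                # module cooperates with others in one SMP


class ModuleManager:
    """Hypothetical model of the module management unit in one server module."""

    def __init__(self, module_id):
        self.module_id = module_id
        self.mode = Mode.STANDALONE
        self.partners = frozenset()

    def apply_configuration(self, smp_groups):
        """Switch mode from configuration sent by the management unit.

        `smp_groups` is an assumed format: an iterable of sets of module ids,
        each set forming one SMP.
        """
        for group in smp_groups:
            if self.module_id in group and len(group) > 1:
                self.mode = Mode.SMP
                self.partners = frozenset(group) - {self.module_id}
                return
        self.mode = Mode.STANDALONE
        self.partners = frozenset()
```

A module that appears in no group, or in a group of one, stays in stand-alone mode, which matches the idea that the same hardware serves either role.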

The present invention is further a server device of SMP configuration comprising a plurality of server modules (nodes), which includes a backplane on which the plurality of nodes are mounted and which interconnects the nodes, wherein each node includes a node controller that transmits transactions to and receives transactions from all nodes including itself, and the node controller orders the transactions.

The present invention is further characterized in that the inter-node links are routed with equal-length wiring on the backplane, and each node also has an internal loop wiring equal in length to the inter-node links on the backplane, whereby synchronization is achieved.

The present invention is further a server device comprising a plurality of server modules, a management unit that manages the entire device, and a reference clock distribution unit that distributes a common reference clock to the plurality of server modules, wherein each server module includes: a reference clock generation circuit that generates the module's own reference clock; a clock distributor that switches between the module's own reference clock from the reference clock generation circuit and the common reference clock from the reference clock distribution unit, and distributes one of them within the server module; and a module management unit that instructs the clock distributor to switch the reference clock distributed within the server module according to configuration information transmitted from the management unit.

The present invention is further a server device comprising a plurality of server modules, a management unit that manages the entire device, and a backplane on which the server modules and the management unit are mounted and which enables signals to pass between them, wherein each server module includes: a reference clock generation circuit that outputs a reference clock; a first clock distributor that receives the reference clock signal output by the reference clock generation circuit and outputs it to the second clock distributor of its own server module and, via the backplane, to the second clock distributors of its own and the other server modules; the second clock distributor, which selects one reference clock signal from among the signal output by the first clock distributor of its own server module and the signals from its own and the other server modules received via the backplane, and distributes it within the server module; and a module management unit that instructs the second clock distributor to switch the reference clock distributed within the server module according to configuration information transmitted from the management unit.
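The selection made by the second clock distributor can be sketched as a function from configuration to clock source. The policy shown here is an assumption for illustration, not something the text specifies: a stand-alone module uses its own generator, and every member of an SMP group uses the clock of one agreed module (the lowest id unless stated otherwise). The function and parameter names are hypothetical.

```python
def select_reference_clock(module_id, smp_group, clock_master=None):
    """Return the id of the module whose reference clock this module uses.

    Assumed policy: stand-alone modules keep their own clock; members of an
    SMP group all take the clock of `clock_master` via the backplane.
    """
    if module_id not in smp_group or len(smp_group) <= 1:
        return module_id  # stand-alone: use the local generator
    if clock_master is None:
        clock_master = min(smp_group)  # assumed default choice of master
    if clock_master not in smp_group:
        raise ValueError("clock master must belong to the SMP group")
    return clock_master
```

Because every member of a group resolves to the same source, the modules in one SMP share a reference clock, while modules outside the group remain independent.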

According to the present invention, it is possible to provide a server device and blade server modules that have, in addition to the scale-out extensibility of a conventional blade server system, scale-up extensibility through SMP coupling of a plurality of blade server modules. The servers running applications can therefore flexibly expand or shrink their resources and optimize them as business requirements change after the system is introduced, which in turn lowers operating cost and TCO.

Further, according to the present invention, no crossbar switch is needed for the inter-node link connections in a multi-node multiprocessor server device, so latency in a multi-node SMP configuration is reduced and system performance improves. Eliminating the crossbar switch also reduces the parts count, which lowers the failure rate, lowers cost, and reduces resources.

FIG. 1 is a system configuration diagram of one embodiment of the present invention.
FIG. 2 is a diagram showing the inter-node links of a four-node SMP configuration.
FIG. 3 is a configuration example of each node in a four-node SMP configuration.
FIG. 4 is another configuration example of each node in a four-node SMP configuration.
FIG. 5 is another configuration example of each node in a four-node SMP configuration.
FIG. 6 is a diagram showing the inter-node links of a four-node SMP configuration.
FIG. 7 is another configuration example of each node in a four-node SMP configuration.
FIG. 8 is a diagram explaining the broadcast and the coherency response in a four-node SMP configuration.
FIG. 9 is a diagram explaining an example of transaction overtaking.
FIG. 10 is a diagram explaining synchronization of transaction responses using a waiting circuit.
FIG. 11 is a configuration example of a node controller.
FIG. 12 is a configuration example of a node controller.
FIG. 13 is a processing flow diagram of a node controller.
FIG. 14 is a system configuration diagram for explaining the specific operation of FIG. 1.
FIG. 15 is a system configuration diagram of another embodiment of the present invention.
FIG. 16 is a configuration diagram of a general blade server system.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 16 is a configuration diagram of a general blade server system. It consists of at least two blade server modules 110 (#0 to #n), a service processor unit 111 that manages the entire server device, and a backplane 113 into which these units are mounted and which carries signals between them. Each blade server module 110 can carry one or more CPUs 22, and functions as a single server device by including a node controller 20 that controls the CPUs 22 and memory 23, an I/O circuit 24, and a module management unit 25 that provides power control, configuration management, environmental monitoring, and similar functions within the blade server module 110. Each blade server module 110 also contains a reference clock distribution circuit 121 made up of a reference clock generator 26 and a clock distributor 27, which distributes a reference clock S21 to each synchronously operating LSI in the blade server module 110. As noted above, however, each blade server module 110 is independent as a single server device, so this clock need not be synchronized with the reference clocks in other blade server modules.

FIG. 1 shows an embodiment of a multiprocessor server device according to the present invention. It consists of a plurality of blade server modules 10 (#0 to #n), a service processor unit 11 that manages the entire server device, and a backplane 13 into which these units are mounted and which carries signals between them. The node controller 20 in each blade server module 10 has an SMP coupling interface S20 and can realize a multi-node SMP configuration via the backplane 13.

FIG. 2 shows a configuration example of a multi-node SMP server according to the present invention that uses the broadcast method and has no crossbar switch; the figure takes a four-node case as an example.

In broadcast coherency, as shown in FIG. 8, every address is broadcast to every node, and each node returns a coherency response. A few cycles after checking what state the requested cache line is in within its local cache, each node determines the overall snoop result and returns its coherency response.

As shown in FIG. 9, the time from broadcasting an address to receiving the coherency response differs between nodes that are near the broadcasting node and nodes that are far from it, that is, between nodes reached over short wiring and nodes reached over long wiring, and in that case transactions can overtake one another. Overtaking caused by differing inter-node wiring lengths is normally avoided by installing a crossbar switch; since this method has no crossbar switch, the question is how to order transactions. In FIG. 2, therefore, the inter-node links are routed with equal-length wiring within the backplane 201 so that they take the same number of cycles, and synchronization between nodes is achieved by keeping latency constant.
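The overtaking problem and the effect of equal-length wiring can be shown with a toy timing model. It assumes that each transaction's arrival time is simply its issue cycle plus the latency of the link it travels over; the function name, the link labels, and the cycle counts are illustrative values, not figures from the patent.

```python
def arrival_order(transactions, link_cycles):
    """Return transaction names in the order they arrive.

    `transactions` is a list of (name, issue_cycle, link) tuples and
    `link_cycles` maps each link to its latency in cycles. Ties are broken
    by issue cycle.
    """
    arrivals = [
        (issue + link_cycles[link], issue, name)
        for name, issue, link in transactions
    ]
    return [name for _, _, name in sorted(arrivals)]


# T1 is issued first over a long link, T2 one cycle later over a short link.
transactions = [("T1", 0, "far"), ("T2", 1, "near")]
unequal = arrival_order(transactions, {"far": 5, "near": 2})
equal = arrival_order(transactions, {"far": 4, "near": 4})
```

With unequal latencies T2 arrives before T1, so the issue order is lost; with equal latencies the arrival order matches the issue order, which is the property the equal-length wiring is meant to provide.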

Also, as shown in FIG. 10, the coherency response from the broadcasting node to itself does not need to travel over inter-node wiring, so it incurs no node-to-node transit time and arrives sooner than the coherency responses from the other nodes to which the transaction request was sent, which again causes transactions to overtake one another. To control this overtaking, a waiting circuit is implemented in the node controller.

FIG. 11 shows the configuration of the node controller 1002. As shown in FIG. 13, a transaction issued by the CPU 1101 is passed to the sequencer 1107 by the HOST i/f 1106 (steps 1301, 1302). Next, under control of the sequencer 1107, the transaction is passed to the tag control circuit 1104. The tag control circuit 1104 checks the tag information 1103 (step 1306); if the line is in the Modify or Shared read state in the cache, control returns to the sequencer 1107 (steps 1307, 1303) and the memory 1111 is accessed under control of the memory i/f 1109 (steps 1304, 1305). If the line is not in the Modify or Shared read state in the cache, a transaction is issued from the coherency transmission unit 1105 to the other nodes as a snoop inquiry (steps 1308, 1310). At the same time, the transaction is sent to the waiting circuit 1110 for the coherency response to the own node (step 1309).
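The branch in this flow can be summarized in a few lines. This is a sketch of the decision only: the function name, the action labels, and the lowercase state strings are hypothetical stand-ins for the blocks and states named in the text, and no timing is modelled.

```python
def route_transaction(tag_state):
    """Return the ordered actions for one CPU transaction.

    `tag_state` is the result of the tag lookup; "modify" and "shared_read"
    stand for the Modify and Shared read states named in the text.
    """
    actions = ["host_if_to_sequencer", "tag_lookup"]
    if tag_state in ("modify", "shared_read"):
        # Local path: back to the sequencer, then a memory access.
        actions += ["return_to_sequencer", "memory_access"]
    else:
        # Snoop path: ask the other nodes and queue the own-node response
        # in the waiting circuit.
        actions += ["snoop_other_nodes", "own_node_waiting_circuit"]
    return actions
```

Only the second branch produces inter-node traffic, which is why the waiting circuit matters only for transactions that are broadcast.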

FIG. 12 shows an example in which a waiting circuit 1204 is implemented on the node controller 1201. Coherency responses returned after snoop processing at the other nodes has finished are received by the coherency receiving unit 1205 on the node controller 1201. The coherency responses from the other nodes, which return at the same time because the nodes are synchronized, and the coherency response to the own node, whose timing is aligned by the delay of the waiting circuit 1204, all arrive at the same time. The transaction received by the coherency receiving unit 1105 is passed to the sequencer 1203, and the memory 1111 is accessed under control of the memory i/f 1109.

As in the circuit shown in FIG. 12, when the server starts up and before the CPU starts, the firmware 1206 once measures the response time from each node to which transaction requests are sent and the response time through the waiting circuit in the own node. The firmware 1206 then makes adjustments based on these measured response times. Synchronization between nodes is achieved by three means: making the wiring between nodes equal in length, implementing a waiting circuit (loop wiring) in the node controller, and adjusting response times through the firmware 1206. With the nodes synchronized, the response following determination of the snoop result for a broadcast address takes a constant time, which guarantees that the order in which transactions are selected is always the same. When each node is used as an independently added server, the broadcast shown in FIG. 8 does not occur.
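One way to picture the firmware adjustment is as padding every response path up to the slowest measured path. The text does not say how the adjustment is computed, so this rule, the function name, and the sample cycle counts are assumptions made for illustration.

```python
def compute_padding(measured_cycles):
    """Return extra wait cycles per path so that all paths match the slowest.

    `measured_cycles` maps a path name (own-node loop or a remote node) to
    the response time measured at start-up, in cycles.
    """
    target = max(measured_cycles.values())
    return {path: target - cycles for path, cycles in measured_cycles.items()}


measured = {"self": 6, "node1": 8, "node2": 8, "node3": 7}
padding = compute_padding(measured)
aligned = {path: measured[path] + padding[path] for path in measured}
```

After padding, every path takes the same number of cycles, so responses to one broadcast arrive together regardless of which node answered.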

FIG. 3 shows a configuration example of each node in a four-node multi-node SMP server. The node 301 has an inter-node coupling interface 307 for coupling a plurality of nodes into a single SMP. In addition, a node link controller 303 is implemented on the node controller 302. The node controller 302 is given a node link controller interface 306, and the SMP configuration is realized by connecting the node controllers one-to-one through the node link controller interface 306 of each node. The node link controller interface 306, together with the node link controller 303 inside the node controller, plays the role of a crossbar switch. Local loop wiring from the node link controller 303 back to the node link controller 303, passing through a latch 304, aligns the timing of the own-node response with the responses from the other nodes. Synchronizing all nodes in this way puts them on the same cycle and keeps latency constant, which provides the function of matching transaction response timing. This function guarantees the ordering of transactions.

The transaction transmission and reception functions of the node controller 302 are independent, so the node controller 302 can process transmission and reception of transactions in parallel. The node link controller 303 broadcasts coherent transactions to all nodes in the same order. The node controller 302, which has the node link controller interface 306, forwards the coherent transactions received from each node into the node controller in the same order. The node link controller interface 306 provides inter-node transfer of broadcast transactions, inter-node transfer of coherency response transactions, and inter-node transfer of 1 to 1 transactions. Transactions flowing over a node link are protected by ECC (Error Correction Coding).

Broadcast transactions are classified into request transactions and response transactions. The interior of the node controller 302, which has the node link controller interface 306 and the node link controller 303, is duplexed for request transactions and response transactions, but on the node link, request and response transactions are transferred without distinction. 1 to 1 transactions are classified into address transactions and data transactions. The 1 to 1 transaction path is likewise duplexed for address transactions and data transactions, but on the node link an address transaction and the data contained in the corresponding data transaction are transferred consecutively.

A calculation circuit that counts how many cycles a link transfer took at each node is provided in the node controller. Its result is used to report skew between links to each node's firmware, and the firmware corrects for it so that the cycle counts between nodes are synchronized. Because skew between links is eliminated in addition to the equal-length wiring, the snoop timing for a broadcast address is guaranteed to be constant, the function of matching transaction response timing is realized, and the ordering of transactions is guaranteed. Even with the nodes synchronized, response timing could still shift depending on the state of the queues in the node controller; to prevent this, under firmware control a node with spare time waits for the processing of a node that takes longer to respond, which guarantees that the order in which transactions are selected is always the same.

FIG. 4 shows another configuration example of each node in a four-node configuration. The backplane 201 shown in FIG. 2 connects the nodes with equal-length wiring, and, as indicated by 404 in FIG. 4, the local loop 404 inside the node controller 402 is wired to the same length as the equal-length inter-node wiring in the backplane 201. This realizes a function that aligns timing with the transaction responses from each node, so a snoop-based SMP configuration using the "broadcast (snoopy) coherency" scheme, which requires no crossbar switch, can be realized.
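The effect of matching the local loop delay to the inter-node link delay can be illustrated with a small simulation. It is a sketch under stated assumptions, not the circuit of the description: delays are modeled as whole cycles, ties are broken by source node number, and the function name and cycle values are hypothetical.

```python
def arrival_order(broadcasts, link_cycles, loop_cycles, num_nodes):
    """For each node, return the transaction names in the order they arrive.

    broadcasts: list of (issue_cycle, source_node, name).
    A transaction reaches other nodes after link_cycles and reaches its
    own node, through the local loop, after loop_cycles.
    """
    orders = {}
    for node in range(num_nodes):
        arrivals = []
        for issue, src, name in broadcasts:
            delay = loop_cycles if src == node else link_cycles
            # Ties on the arrival cycle are broken by source node (assumption).
            arrivals.append((issue + delay, src, name))
        arrivals.sort()
        orders[node] = [name for _, _, name in arrivals]
    return orders


# Node 0 broadcasts "A" at cycle 0, node 1 broadcasts "B" at cycle 1.
broadcasts = [(0, 0, "A"), (1, 1, "B")]

# Local loop as long as the inter-node links: every node sees A before B.
matched = arrival_order(broadcasts, link_cycles=4, loop_cycles=4, num_nodes=4)

# No local loop delay: node 1 sees its own B before A arrives from node 0.
unmatched = arrival_order(broadcasts, link_cycles=4, loop_cycles=0, num_nodes=4)
```

In the matched case all four nodes observe the same order without any central arbiter, which is the property that lets the node controllers order transactions without a crossbar switch.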

FIG. 5 shows still another configuration example of each node in a four-node configuration. A snoop-based SMP configuration using the "broadcast (snoopy) coherency" scheme, which requires no crossbar switch, can also be realized by using the backplane 201 shown in FIG. 2, in which the node links are connected by equal-length wiring, together with the node link interface shown in FIG. 5. The node controller 502 shown in FIG. 5 implements an inter-node connection interface 506 consisting of (number of nodes − 1) link ports that transmit data transfer transactions to other nodes and (number of nodes − 1) link ports that receive data transfer transactions from other nodes. In addition, it implements a node link controller interface 505 that has a transmit link port and a receive link port for local-loop data transfer transactions addressed to the node itself, that is, the node issuing the transaction request. Because the transmit and receive functions of the node controller 502 are independent, the node controller can process the transmission and reception of transactions in parallel. As shown in FIG. 4, the path 405 on the node after leaving the node link controller interface 404 is wired to the same length as the inter-node links so that it acts as a delay, which realizes the function of aligning timing with the transaction responses from each node.
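The port counts stated for the node controller 502 can be written out as a short calculation. This is only an arithmetic sketch; the function name and dictionary keys are hypothetical labels, not terms from the description.

```python
def link_ports(num_nodes):
    """Count the link ports of one node controller in the FIG. 5 arrangement."""
    remote = num_nodes - 1  # one port per other node
    return {
        "remote_tx": remote,  # transmit ports to other nodes
        "remote_rx": remote,  # receive ports from other nodes
        "loop_tx": 1,         # transmit port of the local loop
        "loop_rx": 1,         # receive port of the local loop
    }


ports = link_ports(4)
total_tx = ports["remote_tx"] + ports["loop_tx"]
total_rx = ports["remote_rx"] + ports["loop_rx"]
```

For the four-node configuration this gives three remote ports plus one local-loop port in each direction, so each node controller transmits on as many ports as there are nodes.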

FIG. 6 shows still another configuration example of each node in a four-node configuration. In addition to the equal-length wiring between the nodes in the backplane 201 shown in FIG. 2, loop wirings 606 to 609 back to each node are laid in the backplane 601, as shown in FIG. 6, with the same length as the equal-length inter-node links, which aligns the timing of transaction responses. In this case, as shown in FIG. 7, the inter-node connection interface 705 only needs node link ports for the transmitting and receiving sections of data request transactions in the node controller 702, as many as there are nodes. Neither loop wiring through a latch inside the node controller nor loop wiring of the same length as the inter-node wiring needs to be provided within the node.

Further, the present invention is a blade server in which each node of a multi-node multiprocessor server apparatus is a server blade. Because data transfer transactions are ordered inside the node controllers, it adopts an inter-node link connection scheme that requires no external crossbar switch. Expansion is possible either by adding processors to a symmetric multiprocessor configuration or by adding independent servers.

In FIG. 1, achieving higher-performance SMP coupling between the blade server modules 10 requires that the reference clocks of the server modules be synchronized. For this purpose, a reference clock distribution unit 14 that can distribute a reference clock to all the blade server modules 10 is mounted on the backplane 13, the distributed reference clock is routed through the backplane 13 with equal-length wiring, and the reference clock is switched by the clock distributor 27 in the reference clock distribution circuit 21 inside each blade server module 10. This makes it possible to synchronize the reference clocks of all the blade server modules 10.

The clock switching operation described above is explained concretely using the representative system configuration example of FIG. 14. In this example, four blade server modules 10 (#0 to #3) are mounted on the backplane 13. To use #0 and #1 together as one SMP server and #2 and #3 as independent blade servers, the user first sets this system configuration information in the service processor unit 11 via management software. The configuration information is stored in the memory 28 in the service processor unit 11, is not erased even when the power is cut off, and is transmitted from the service processor unit 11 to the module management unit 25 in each blade server module 10 every time the system starts. Each module management unit 25 instructs the reference clock distribution circuit 21 to switch to the reference clock to be used, according to the configuration information received from the service processor unit 11. In the illustrated example, the reference clocks of blade server modules #0 and #1 are switched to the external reference clock, which synchronizes them and forms an SMP server, while the reference clocks of blade server modules #2 and #3 are switched to their internal reference clocks so that each operates as an independent server. Holding the clock-switching configuration information of each blade server module 10 in the memory 28 in the service processor unit 11 has the effect that, even when a blade server module 10 fails and is replaced for maintenance, no procedure is needed to carry over the configuration information. In addition, as shown in the figure, duplicating the service processor unit 11, which centrally manages the system configuration information, can improve the reliability of the whole system.
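The clock selection in the FIG. 14 example can be sketched as a small configuration routine. This is an illustration only: the description does not define a data format for the configuration information, so the representation of SMP groups as sets of module numbers, the enum, and the function name are all assumptions.

```python
from enum import Enum


class ClockSource(Enum):
    EXTERNAL = "external"  # reference clock distributed over the backplane
    INTERNAL = "internal"  # reference clock generated on the module itself


def select_clock_sources(smp_groups, num_modules):
    """Decide which reference clock each module should switch to.

    smp_groups: list of sets of module numbers that form one SMP server.
    Modules in a multi-module group use the external clock; all other
    modules run independently on their internal clock.
    """
    sources = {module: ClockSource.INTERNAL for module in range(num_modules)}
    for group in smp_groups:
        if len(group) > 1:
            for module in group:
                sources[module] = ClockSource.EXTERNAL
    return sources


# Configuration of the example: #0 and #1 form one SMP server,
# #2 and #3 are independent blade servers.
sources = select_clock_sources([{0, 1}], num_modules=4)
```

Because the decision depends only on the stored grouping and the slot number, a replacement module inserted into the same slot receives the same setting at the next start, which matches the maintenance benefit described above.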

FIG. 15 illustrates another embodiment, different from the embodiment described with FIGS. 1 and 14. Except for the functions related to clock distribution, FIG. 15 is the same as the embodiment shown in FIGS. 1 and 14, so illustration and description of the common parts are omitted. In the embodiment described so far, the reference clock distribution unit 14 distributes the reference clock to each blade server module 10, as in FIGS. 1 and 14. In the embodiment of FIG. 15, the function of this reference clock distribution unit is incorporated into the blade server modules 10. First, the output clock signal of the reference clock generator 26 is input to the first clock distributor 30. The outputs of this clock distributor are connected by equal-length wiring, via the backplane 13, to the second clock distributors 29 inside all the blade server modules 10 that can together form an SMP server, including the module itself. The output of the second clock distributor 29 is switched by the module management unit 25. For example, suppose that in the configuration of FIG. 15 the second clock distributor 29 on blade server module #0 selects clock signal S22, the one on blade server module #1 selects clock signal S23, the one on blade server module #2 selects clock signal S24, and the one on blade server module #3 selects clock signal S25. This makes it possible to form one SMP server A 1500 from blade server modules #0 and #1 and one SMP server B 1501 from #2 and #3. Because the reference clocks of the two SMP servers are completely independent in this embodiment, SMP servers built from blade server modules whose reference clock frequency differs between the set #0, #1 and the set #2, #3, for example heterogeneous or next-generation blade server modules, can be mixed in the same server chassis.
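The grouping in the FIG. 15 example can be sketched by collecting the modules that end up on the same reference clock generator. The text does not state which generator drives each of the signals S22 to S25, so the wiring table below is an assumption chosen to reproduce the stated result; the function and variable names are hypothetical.

```python
from collections import defaultdict

# Assumed wiring: the module whose reference clock generator drives each signal.
SIGNAL_SOURCE = {"S22": 0, "S23": 0, "S24": 2, "S25": 2}


def group_by_clock(selection, signal_source):
    """Group modules that share a reference clock generator.

    selection: module number -> clock signal chosen by its second
    clock distributor. Each resulting group can form one SMP server.
    """
    groups = defaultdict(list)
    for module, signal in sorted(selection.items()):
        groups[signal_source[signal]].append(module)
    return sorted(groups.values())


# Selection described for FIG. 15.
selection = {0: "S22", 1: "S23", 2: "S24", 3: "S25"}
servers = group_by_clock(selection, SIGNAL_SOURCE)
```

Under this assumed wiring the result is two groups, modules #0 and #1 and modules #2 and #3, each clocked from its own generator and therefore free to run at a different reference frequency.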

10 Blade server module
11 Service processor unit
13 Backplane
14 Reference clock distribution unit
20 Node controller
21 Reference clock distribution circuit
22 CPU
23 Memory
24 I/O circuit
25 Module management unit
26 Reference clock generation unit
27 Clock distributor
28 Memory in service processor unit
29 Second clock distributor
30 First clock distributor
1500 SMP server A
1501 SMP server B
S20 SMP coupling interface
S21 to S25 Reference clocks
201 Backplane
202 to 205 Nodes (blade server modules)
301 Node (blade server module)
302 Node controller
303 Node link controller
304 Latch
305 Local loop wiring in the node controller
306 Node link controller interface
307 Inter-node connection interface
401 Node (blade server module)
402 Node controller
403 Node link controller
404 Local loop wiring on the node controller
405 Node link controller interface
406 Inter-node connection interface
501 Node (blade server module)
502 Node controller
503 Node link controller
504 Node link controller interface
505 Local loop wiring on the node
506 Inter-node connection interface
601 Backplane with equal-length wiring between node links
602 to 605 Nodes (blade server modules)
606 to 609 Local loop wiring to the own node
701 Node (blade server module)
702 Node controller
703 Node link controller
704 Node link controller interface
705 Inter-node connection interface
801 to 804 Nodes (blade server modules)
1101 CPU
1102 Node controller
1103 Tag
1104 Tag control circuit
1105 Coherency transmission unit
1106 Host I/F
1107 Sequencer
1108 Coherency reception unit
1109 Memory I/F
1110 Waiting circuit
1111 Memory
1201 Node controller
1202 Coherency transmission unit
1203 Sequencer
1204 Waiting circuit
1205 Coherency reception unit
1206 Firmware

Claims

An SMP server device composed of a plurality of server modules (hereinafter referred to as nodes), comprising:
a backplane on which the plurality of nodes are mounted and which interconnects the nodes,
wherein each node comprises:
a node controller having a transmission unit that transmits transactions to all nodes including its own node and a reception unit that receives transactions from all nodes including its own node; and
a waiting circuit that receives a transaction transmitted from the transmission unit of the own node and, after a predetermined time delay, passes the received transaction to the reception unit of the own node,
and wherein the node controller orders the transactions.

The server device according to claim 1, wherein each node controller has paths for transferring transactions to the other node controllers and to its own node controller, and the number of transfer cycles is the same between all nodes including the own node.

The server device according to claim 2, further comprising means for adjusting, within each node, the number of transfer cycles on the transfer paths between the nodes.

The server device according to claim 1, wherein each node controller includes means for adjusting the response time of transactions to other nodes and to the own node.

The server device according to claim 1, wherein the node controller includes the waiting circuit.

The server device according to claim 5, wherein the waiting circuit is a latch.