JP2004054931A

JP2004054931A - System and method for memory migration in distributed memory multiprocessor system

Info

Publication number: JP2004054931A
Application number: JP2003181949A
Authority: JP
Inventors: Debendra Das Sharma; Ashish Gupta; William R. Bryg
Original assignee: Hewlett Packard Development Co LP
Current assignee: Hewlett Packard Development Co LP
Priority date: 2002-07-23
Filing date: 2003-06-26
Publication date: 2004-02-19
Anticipated expiration: 2023-06-26
Also published as: JP4500014B2; US7103728B2; US20040019751A1

Abstract

PROBLEM TO BE SOLVED: To provide a system for memory migration in a distributed memory multiprocessor system.
SOLUTION: The distributed memory multiprocessor system includes a plurality of cells communicatively coupled to each other and collectively including a plurality of processors, caches, main memories and cell controllers. Each of the cells includes at least one of the processors, at least one of the caches, at least one of the main memories, and at least one of the cell controllers. Each of the cells is configured to perform a memory migration function that migrates memory from a first main memory of the main memories to a second main memory in a manner that is invisible to an operating system of the multiprocessor system.
COPYRIGHT: (C)2004, JPO

Description

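[0001]
[Technical Field of the Invention]
The present invention relates generally to computer systems and, more particularly, to memory migration in a distributed memory multiprocessor system.
[0002]
[Prior Art]
Traditionally, main memory was physically located on a central bus. In such a system, a memory request consisting of a complete physical address was forwarded to the memory subsystem, and the data was returned. In a distributed memory system, main memory is physically distributed across many different cells. A cell may consist of a plurality of processors, one or more input/output (I/O) devices, a cell controller, and memory. Each cell holds a different portion of the main memory space. Each processor can access not only local memory but also the memory of other cells via cell communication link circuitry, such as one or more crossbar switches.
[0003]
Caching can ameliorate the performance limitations associated with memory accesses. Caching involves storing a subset of the contents of main memory in a cache memory that is smaller and faster than main memory. Various strategies are used to increase the probability that the cache contents anticipate requests for data. For example, because data near a requested word in the memory address space is relatively likely to be requested close in time to that word, most caches fetch and store multi-word lines. The number of words stored in a single cache line defines the line size of the system. For example, a cache line may be eight words long.
[0004]
A cache typically has far fewer line storage locations than main memory. A "tag" is therefore typically stored with the data at each cache location to uniquely identify the main memory line address that owns the cached data.
[0005]
In both single-processor and multiprocessor systems, there is the challenge of ensuring "coherency" between the cache and main memory. For example, when a processor modifies data stored in a cache, the modification must be reflected in main memory. Typically, there is some latency between the time the data is modified in the cache and the time the modification is reflected in main memory. During this latency, the not-yet-modified data in main memory is invalid. Steps must be taken to ensure that main memory data is not read while it is invalid.
[0006]
The situation is somewhat more complex in a distributed memory multiprocessor system, in which each processor or I/O module has a cache memory, than in a single-processor system with a cache memory. In a multiprocessor system, the current data corresponding to a particular main memory address may be stored in one or more cache memories and/or in main memory. The data in a cache memory may have been operated on by a processor, so that its value differs from the value stored in main memory. A "cache coherency scheme" is therefore implemented to ensure that the current data value for any address is provided regardless of where that value resides.
[0007]
Typically, "permission" is required to modify cached data. That permission is typically granted only if the data is stored in exactly one cache. Data stored in multiple caches is often treated as read-only. Each cache line can include one or more state bits indicating whether permission to modify the data stored in that line is granted. Although the exact nature of the states is system dependent, a "privacy" state bit is typically used to indicate permission to modify. If the privacy bit indicates "private", only one cache holds the data, and the associated processor has permission to modify it. If the privacy bit indicates "public", any number of caches may hold the data, and no processor may modify it.
[0008]
In a multiprocessor system, for a processor that wishes to read or modify data, a determination is typically made as to which caches, if any, hold a copy of the data and whether permission can be granted to modify it. "Snooping" involves examining the contents of multiple caches to make that determination. If the requested data is not found in the local cache, remote caches can be "snooped". A recall can be issued requesting that private data be made public so that another processor can read it, or a recall can be issued to invalidate public data in some caches so that another cache can modify it.
[0009]
With large numbers of processors and caches, exhaustive snooping can degrade performance. For this reason, some distributed memory multiprocessor systems snoop within cells and rely on directory-based cache coherency for inter-cell coherency. A distributed memory multiprocessor system using directory-based cache coherency is described in U.S. Patent No. 6,055,610, filed August 25, 1997, issued April 25, 2000, and entitled "DISTRIBUTED MEMORY MULTIPROCESSOR COMPUTER SYSTEM WITH DIRECTORY BASED CACHE COHERENCY WITH AMBIGUOUS MAPPING OF CACHED DATA TO MAIN-MEMORY LOCATIONS".
[0010]
In a distributed memory system using directory-based cache coherency, the main memory of each cell typically associates a directory entry with each line of memory. Each directory entry typically identifies the cells that cache the line and whether the line of data is public or private. A directory entry may also identify the particular cache or caches within a cell that cache the data, and/or snooping may be used to determine which cache or caches within a cell hold the data. Thus, each cell contains a directory indicating the locations of cached copies of the data stored in its main memory.
[0011]
As an example, in an eight-cell system, each directory entry may be nine bits long. For each of the eight cells, a respective "site" bit indicates whether that cell contains a cached copy of the line. The ninth, "privacy" bit indicates whether the data is held privately or publicly.
[0012]
It is sometimes desirable to move or migrate memory from cell to cell or within a particular cell. For example, memory can be migrated from a defective memory device to a spare memory device. As another example, a board containing one or more memory devices may need to be removed from the system, perhaps because the board contains a defective component, because the board is being replaced by a newer revision, or for some other reason. Before the board is removed, it may be desirable to migrate the memory from the board to another location.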
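[0013]
[Problems to Be Solved by the Invention]
Memory migration typically occurs through operating system intervention, in which the memory is first deallocated and later reallocated to the desired destination. With such prior art memory migration techniques, processes accessing the memory being migrated may stall, or the system may have to wait for those processes to finish before the memory can be migrated. Prior art memory migration techniques therefore affect the operation of software and sometimes render the system unusable for a period of time. Moreover, some pages required by the operating system and firmware cannot be easily migrated using prior art techniques.
[0014]
Memory may also be interleaved, which makes memory migration using conventional techniques even more difficult. De-interleaving memory is not a simple task, and sometimes no de-interleaving solution exists.
[0015]
[Means for Solving the Problems]
One form of the present invention provides a distributed memory multiprocessor system having a plurality of cells that are communicatively coupled to one another and that collectively include a plurality of processors, caches, main memories, and cell controllers. Each of the cells includes at least one of the processors, at least one of the caches, one of the main memories, and one of the cell controllers. Each of the cells is configured to perform a memory migration function that migrates memory from a first one of the main memories to a second one of the main memories in a manner that is invisible to an operating system of the system.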
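[0016]
[Embodiments of the Invention]
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings, which form a part hereof and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and that structural or logical changes may be made without departing from the scope of the present invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
[0017]
FIG. 1 is a block diagram illustrating a distributed memory multiprocessor system 100 configured to migrate memory without operating system intervention, according to one embodiment of the present invention. System 100 has eight cells 102, 104, 106, 108, 110, 112, 114, and 116, which are communicatively coupled via a cell communication link 118. Cell 102 has four memory access devices 120A-120D, four caches 124A-124D, a fast coherency directory or directory cache 126, a cell controller 128, and a main memory 136. Similarly, cell 116 has four memory access devices 164A-164D, four caches 168A-168D, a fast coherency directory 150, a cell controller 152, and a main memory 160.
[0018]
In one embodiment, memory access devices 120A-120B and 164A-164B are processors, and memory access devices 120C-120D and 164C-164D are input/output (I/O) modules. Memory access devices 120A-120D have address registers 122A-122D, respectively. Memory access devices 164A-164D have address registers 166A-166D, respectively. Cell controller 128 has firmware 130, a configuration and status register (CSR) 132, and an ordered access queue (OAQ) 134. Cell controller 152 has firmware 154, a configuration and status register 156, and an ordered access queue 158. In one form of the invention, cells 104, 106, 108, 110, 112, and 114 are substantially the same as cells 102 and 116.
[0019]
Although one embodiment of the present invention is described in the context of a multi-cell system in which each cell includes multiple processors and multiple I/O modules, it will be apparent to those skilled in the art that the memory migration techniques described herein are also applicable to other system configurations. For example, rather than having cells such as those shown in FIG. 1, alternative embodiments of the invention may incorporate other system building blocks, such as single-processor (with cache) building blocks, single-I/O-module (with cache) building blocks, or other building blocks.
[0020]
According to one embodiment, during normal operation, system 100 is configured to access memory in a conventional manner using directory-based cache coherency. For example, when a word is fetched from main memory 136 by processor 120A, the word is stored in a cache line of cache 124A. In addition, the words adjacent to the requested word are also fetched with it and stored in the cache line.
[0021]
A request for data by processor 120A includes a main memory word address that uniquely identifies one of the main memory locations of system 100. Cache 124A converts the main memory word address from processor 120A into a line address by removing a number of the least significant bits of the main memory address. This line address is forwarded to cell controller 128 to locate the requested data. If the request must be forwarded to the owner cell, cell controller 128 decodes a number of the most significant bits of the line address into a cell ID and forwards the remaining bits of the address to the appropriate cell so that the request can be satisfied. In one embodiment, the cell controllers handle all memory transactions within and between cells.
[0022]
The caches (e.g., caches 124A-124D and 168A-168D) use a number of bits of the line address to identify a cache line. The remaining bits of the line address are then compared with the tag stored in the identified cache line. In the case of a "hit" (i.e., the tag matches the remaining bits of the line address), a number of the least significant bits of the line address are used to select one of the words stored in the identified cache line for forwarding to the processor (or I/O module). In the case of a miss (i.e., the tag does not match), the line eventually fetched from main memory overwrites the line of data in the identified cache line, and the tag in the identified cache line is updated. Finally, the requested word is forwarded from the cache line to the processor (or I/O module).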
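[0023]
In one form of the invention, coherency is implemented in system 100 in a conventional manner by including state bits for each cache line in the caches (e.g., caches 124A-124D and 168A-168D), by storing coherency information in a coherency directory in the main memory of each cell (e.g., coherency directories 138 and 162), and by storing coherency information in the fast coherency directory of each cell (e.g., fast coherency directories 126 and 150).
[0024]
In one embodiment, in addition to tag bits and user data bits, each cache line also stores a "validity" state bit and a "privacy" state bit. The validity state bit indicates whether the cache line is valid or invalid. The privacy state bit indicates whether the data stored in the cache line is public or private. A processor can read any valid data in its cache. In one embodiment, however, a processor can modify only data that its cache holds privately. If a processor needs to modify data that it holds publicly, the data is first made private. If a processor needs to modify data that is not in its associated cache, the data is brought into that cache as private. If the data is in use by another cache, it is recalled from that cache before being made private.
[0025]
In one embodiment, snooping is used to locate copies of requested data in the caches associated with other processors (or I/O modules) in the same cell. Thus, if processor 120A requests to modify data that it holds publicly, cell controller 128 uses snooping to recall all copies in local caches 124A-124D. The recall serves to request that any privately held copy be converted to public as soon as possible and that public copies be invalidated. Once there are no outstanding copies of the data, a private copy of the data can be provided to processor 120A, or its public copy can be made private. Processor 120A can then modify its private copy of the data.
[0026]
In one embodiment, inter-cell coherency is directory-based in system 100. If a request cannot be satisfied within a cell, it is forwarded to the cell controller of the cell that owns the requested data. For example, if processor 120A asserts an address within main memory 160, cell 116 owns the requested data. Cell controller 152 is charged with finding copies of the requested data system-wide. The information required for this search is maintained in coherency directory 162, which is stored on a line-by-line basis with the user data in main memory 160. In one form of the invention, each line of main memory stores site bits and state bits. In one embodiment, the site bits indicate, for each cell, whether that cell holds a copy of the line, and snooping is used to identify the particular caches within a cell that hold a copy of the line. In an alternative embodiment, the site bits indicate, for each cache in each cell, whether that cache holds a copy of the line. In one form of the invention, the main directory state bits include a "privacy" state bit and a "sharing" state bit. The privacy main directory state bit indicates whether the data is held publicly or privately. The sharing main directory state bit indicates whether the data is "idle" or cached by multiple caches. If the sharing state bit indicates that the data is idle, the data is either not cached or cached by only a single cache.
[0027]
Cell controller 152 can determine from the state bits of coherency directory 162 which cells of system 100 hold a copy of the requested data and whether the requested data is held privately or publicly. Recalls can be directed to the identified cells accordingly.
[0028]
Fast directory 150 allows predictive recalls to be initiated. In one embodiment, fast directory 150 does not store user data information, but stores a subset of the coherency directory information stored in main directory 162 of main memory 160. Techniques for implementing a fast directory and initiating predictive recalls are known to those skilled in the art.
[0029]
In one form of the invention, in addition to storing coherency information, the coherency directories in the main memories of system 100 (e.g., directories 138 and 162) also store migration status information, which is used to identify the migration state of a line during a memory migration transaction. In one embodiment, the migration status information identifies one of four migration states: (1) Home_Cell_Ownership; (2) Waiting_For_Migration; (3) In_Migration; or (4) Migrated. Each of these migration states is discussed in further detail below with reference to FIG. 2 and FIGS. 3A-3C. Each cell controller (e.g., cell controller 128 of cell 102 and cell controller 152 of cell 116) includes firmware (e.g., firmware 130 of cell controller 128 and firmware 154 of cell controller 152) that causes the cell controller to perform the memory migration functions described herein.
[0030]
In one embodiment, memory migration is performed on a cache line basis. In alternative embodiments, memory sizes other than a single line may be used. The term "new cell" is used to identify the cell to which a line is migrated, and the term "old cell" is used to identify the cell from which the line is migrated. In one embodiment, migration may occur within the same cell, in which case a line is moved from one physical memory location in the cell to another physical memory location in the same cell.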
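[0031]
FIG. 2 is a flowchart illustrating a memory migration process 200 according to one embodiment of the present invention. For memory migration process 200, it is assumed that memory is being migrated from cell 102 to cell 116, so cell 102 is referred to as the "old cell" and cell 116 as the "new cell". In step 202, firmware 130 of cell controller 128 initiates the migration by causing a bit to be written to configuration and status register 132 of old cell 102 and then writing a bit to configuration and status register 156 of new cell 116. The write to configuration and status register 132 of old cell 102 informs that cell that memory is being migrated away from it, and the write to configuration and status register 156 of new cell 116 informs that cell that memory is being migrated to it. At this point, the processor and I/O module address range registers (e.g., registers 122A-122D of old cell 102 and registers 166A-166D of new cell 116) still point to old cell 102 as the owner of the line to be migrated.
[0032]
In step 204, cell controller 152 sets the migration state in directory 162 to "Waiting_For_Migration" for a selected one of the memory lines of main memory 160 (i.e., the "new line"). In step 206, any references to the new line in fast directory 150 are flushed. In one embodiment, any request to a new line in the "Waiting_For_Migration" state is illegal, and appropriate error recovery/logging steps are initiated.
[0033]
In step 208, cell controller 152 of new cell 116 issues a "fetch request" with change-of-ownership intent to old cell 102 for the desired line (i.e., the "old line"). The fetch request indicates to old cell 102 that new cell 116 is requesting to become the home cell for the line. In an alternative embodiment, the fetch request may be initiated by an entity other than new cell 116.
[0034]
In step 210, cell controller 128 places the fetch request in its ordered access queue 134. In step 212, any earlier requests for the old line are processed by cell controller 128 in the normal manner. When the fetch request's turn arrives, cell controller 128 processes it in step 214. In step 216, cell controller 128 recalls the old line from any entities (e.g., processors or I/O modules) that own it.
[0035]
In step 218, cell controller 128 returns a response to the fetch request, transferring the old line data along with home cell ownership through a "Migrate_Idle_Data" transaction. In step 220, cell controller 128 sets the migration state in directory 138 to "In_Migration" for the old line. Any request for the old line at this point is either placed in ordered access queue 134 (if there is space) or "nacked" (i.e., negatively acknowledged). In one embodiment, old cell 102 does not process any requests for the old line while the line is in the "In_Migration" state.
[0036]
In step 222, cell controller 152 receives the "Migrate_Idle_Data" transaction and issues an "Ack" (i.e., acknowledgment) transaction. In step 224, cell controller 152 copies the received line data to the new line, assumes home ownership of the line, and marks the line as "idle" in directory 162. The line is marked "idle" because no other entity holds it; as noted above, the line was recalled from all previous holders in step 216.
[0037]
In step 226, cell controller 128 receives the "Ack" transaction from cell controller 152 and transitions the migration state in directory 138 for the old line to "Migrated". In step 228, any accesses to the old line pending in the ordered access queue, and any new accesses to the old line, are then directed by cell controller 128 to new home cell 116. In an alternative embodiment, cell controller 128 responds to requesters of the old line and asks them to send their requests to new cell 116. In step 230, when old cell 102 has no more pending entries for the old line in ordered access queue 134, cell controller 128 sets a status bit in its configuration and status register 132 to indicate that all pending requests for the old line have been processed. In one embodiment, no new requests for the old line are placed in ordered access queue 134 at this point; all new requests for the old line are forwarded by cell controller 128 to new cell 116.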
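How the old cell treats a request for the old line depends on the line's migration state, as steps 220 through 228 describe. The following Python sketch illustrates that dispatch under stated assumptions: the function names, the string return values, and the explicit queue capacity are hypothetical, and requests in the Home_Cell_Ownership state are shown as processed directly rather than passing through the ordered access queue first.

```python
from collections import deque
from enum import Enum


class MigrationState(Enum):
    HOME_CELL_OWNERSHIP = 1
    WAITING_FOR_MIGRATION = 2
    IN_MIGRATION = 3
    MIGRATED = 4


def handle_request(state, request, queue, queue_capacity):
    """Return the action taken for one request to a line in the given state."""
    if state is MigrationState.HOME_CELL_OWNERSHIP:
        return "process"                      # normal handling by the home cell
    if state is MigrationState.IN_MIGRATION:
        if len(queue) < queue_capacity:       # queue the request if space remains
            queue.append(request)
            return "queued"
        return "nack"                         # otherwise negatively acknowledge
    if state is MigrationState.MIGRATED:
        return "forward_to_new_cell"          # step 228: redirect to the new home
    # Waiting_For_Migration belongs to the new line; requests to it are illegal.
    raise ValueError("illegal request to a line awaiting migration")


def drain_after_ack(queue):
    """Step 228: hand over everything that was queued during In_Migration."""
    forwarded = []
    while queue:
        forwarded.append(queue.popleft())
    return forwarded
```

With a capacity of one, the first request that arrives during In_Migration is queued and the second is nacked; once the line is Migrated, the queued request is forwarded to the new cell.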
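[0038]
In step 232, firmware 130 reads configuration and status register 132 and determines that the status bit has been set (step 230). Firmware 130 then resets the status bit. In step 234, firmware 130 determines whether further lines are to be migrated from old cell 102. If so, steps 202-232 are repeated for each such line. Regardless of whether further lines are migrated, steps 236-240 are performed to complete the migration of the first line.
[0039]
In step 236, firmware 130 modifies the address range registers (e.g., address range registers 122A-122D and 166A-166D) in all memory access devices (e.g., processors or I/O modules) to ensure that all new requests for the old line go to new cell 116. The address register changes do not occur instantaneously. If any requests for the old line are sent to old cell 102 during the change, old cell 102 forwards them to new cell 116. In an alternative embodiment, old cell 102 bounces such requests back to the requester and informs the requester to send them to new cell 116. After the change in step 236, the address range registers of all memory access devices point to new cell 116 as the home cell for the migrated line. In one embodiment, the address range registers may be implemented in the form of a cell map.
[0040]
In step 238, because there may still be some outstanding requests for the old line (e.g., pending requests in an ordered access queue that have not yet been processed), firmware 130 initiates a "plunge" to old cell 102 from all possible requesters to ensure that any earlier requests to old cell 102 have reached old cell 102 (and have been directed to new cell 116). A plunge transaction is sent to old cell 102 from every possible memory access device. Because plunge transactions are queued like other memory request transactions, by the time old cell 102 has received all of the plunge transactions, firmware 130 knows that all requests for the old line have been received (and forwarded to new cell 116).
[0041]
In step 240, firmware 130 waits until configuration and status register 132 indicates that the plunge is complete. Configuration and status register 132 thereby indicates to firmware 130 that there are no more outstanding requests for the old line. In an alternative embodiment, rather than using plunge transactions, firmware 130 waits for a predetermined period of time that is long enough for all outstanding requests for the old line to reach old cell 102 and be forwarded to new cell 116.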
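The plunge works as a barrier because it travels through the same ordered queues as ordinary requests. A minimal Python sketch of that reasoning follows. It assumes one FIFO channel per requester and a round-robin arrival order at the old cell; the `PLUNGE` marker value and the function name are hypothetical and are not taken from the specification.

```python
from collections import deque

PLUNGE = "PLUNGE"  # hypothetical marker value for a plunge transaction


def receive_until_plunged(channels):
    """Drain per-requester FIFO channels until every requester's plunge arrives.

    `channels` maps a requester name to a deque of transactions in the order
    that requester sent them. Because each channel is FIFO, everything a
    requester sent before its plunge is received before the plunge itself.
    """
    received = []
    plunged = set()
    while len(plunged) < len(channels):
        progressed = False
        for requester, fifo in channels.items():
            if requester in plunged or not fifo:
                continue
            transaction = fifo.popleft()
            progressed = True
            if transaction == PLUNGE:
                plunged.add(requester)        # this requester has nothing older in flight
            else:
                received.append((requester, transaction))
        if not progressed:
            raise RuntimeError("a requester never sent its plunge")
    return received
```

Once the function returns, every request issued ahead of a plunge has been received and could be forwarded to the new cell; anything still left in a channel was issued after that requester's plunge.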
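[0042]
As indicated by step 242, after the plunge transactions complete (or the predetermined period elapses), the migration of the line is complete.
[0043]
FIG. 3A is a state diagram illustrating directory state transitions in directory 138 for a memory line being migrated from old cell 102 during memory migration process 200, according to one embodiment of the present invention. As indicated by state S1 in FIG. 3A, the starting migration state in directory 138 for the old line in old cell 102 is "Home_Cell_Ownership", which indicates that old cell 102 is the current home cell for the line to be migrated. The second migration state S2 for the old line is "In_Migration". As described above, the migration state of the old line transitions from "Home_Cell_Ownership" to "In_Migration" during step 220 of process 200. The third migration state S3 for the old line is "Migrated". The migration state of the old line transitions from "In_Migration" to "Migrated" during step 226 of process 200.
[0044]
FIG. 3B is a state diagram illustrating directory state transitions in directory 162 for a memory line in new cell 116 during memory migration process 200, according to one embodiment of the present invention. As indicated by state S4 in FIG. 3B, the migration state in directory 162 for the new line in new cell 116 is "Waiting_For_Migration", which is set during step 204 of process 200. The next migration state S5 for the new line is "Home_Cell_Ownership", which indicates that cell 116 is the new home cell for the migrated line. The migration state of the new line transitions from "Waiting_For_Migration" to "Home_Cell_Ownership" during step 224 of process 200.
[0045]
FIG. 3C is a diagram illustrating the time order of the states shown in FIGS. 3A and 3B. As shown in FIG. 3C, memory migration process 200 begins with the old line in old cell 102 in state S1 (Home_Cell_Ownership). Next, the migration state of the new line in new cell 116 is set to state S4 (Waiting_For_Migration). Later in process 200, the migration state of the old line in old cell 102 transitions to state S2 (In_Migration). Next, the migration state of the new line in new cell 116 transitions to state S5 (Home_Cell_Ownership). Finally, the migration state of the old line in old cell 102 transitions to state S3 (Migrated).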
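The two per-line state machines and their interleaving in time can be sketched in Python as follows. This is an illustrative model only: the transition tables mirror FIGS. 3A and 3B as described, while the function name, the log format, and the reduction of the message exchange to plain assignments are assumptions.

```python
from enum import Enum


class MigrationState(Enum):
    HOME_CELL_OWNERSHIP = "Home_Cell_Ownership"
    WAITING_FOR_MIGRATION = "Waiting_For_Migration"
    IN_MIGRATION = "In_Migration"
    MIGRATED = "Migrated"


# Allowed transitions for the old line (FIG. 3A) and the new line (FIG. 3B).
OLD_LINE = {
    MigrationState.HOME_CELL_OWNERSHIP: MigrationState.IN_MIGRATION,  # S1 -> S2
    MigrationState.IN_MIGRATION: MigrationState.MIGRATED,             # S2 -> S3
}
NEW_LINE = {
    MigrationState.WAITING_FOR_MIGRATION: MigrationState.HOME_CELL_OWNERSHIP,  # S4 -> S5
}


def migrate_line(old_data):
    """Walk one line through the handshake and log (role, state) in time order."""
    old_state = MigrationState.HOME_CELL_OWNERSHIP
    log = [("old", old_state)]                            # S1: starting state
    new_state = MigrationState.WAITING_FOR_MIGRATION      # step 204
    log.append(("new", new_state))                        # S4
    old_state = OLD_LINE[old_state]                       # step 220, data already sent
    log.append(("old", old_state))                        # S2
    new_data = old_data                                   # step 224: copy line data
    new_state = NEW_LINE[new_state]
    log.append(("new", new_state))                        # S5
    old_state = OLD_LINE[old_state]                       # step 226: Ack received
    log.append(("old", old_state))                        # S3
    return new_data, log
```

The returned log lists the states in the order S1, S4, S2, S5, S3, matching the time order given for FIG. 3C.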
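[0046]
In one embodiment, one or more address ranges of memory are migrated during a migration sequence, but the granularity of migration is one memory line. The memory lines within the desired address range may be in various different migration states at any given time. In one form of the invention, the address range registers are not updated to reflect the migration (e.g., step 236 of FIG. 2) until all memory lines within the specified range have been migrated.
[0047]
Embodiments of the present invention provide many advantages over prior art memory migration techniques. One form of the invention provides a system and method for dynamically migrating memory, whether interleaved or not, from one location to another in a distributed memory multiprocessor system using directory-based cache coherency, without requiring operating system involvement. In one embodiment, the migration uses hardware, with some software assistance provided by system firmware or utility firmware. In one form of the invention, software accessing the migrated memory can continue to access memory seamlessly while the migration occurs, so service to the user is not interrupted. In one form of the invention, migration occurs without the involvement of, and without adversely affecting, the operating system and application software. In one embodiment, the migration is "invisible" to the operating system, and neither the operating system nor the application software needs to be informed that the migration is occurring or has occurred.
[0048]
One form of the invention provides a memory migration process that is independent of the processor interface and does not require the processor interface protocol to support any new transactions in order to implement the migration functionality. Any processor or I/O controller design can be used with one embodiment of the memory migration process.
[0049]
In one embodiment, the techniques described herein can be used to migrate memory when additional memory is added to the system, and can be used with conventional error detection and correction schemes to migrate defective memory locations to spare memory locations. When the defective memory is replaced or otherwise repaired, the spare memory locations can be migrated back to the new or repaired memory.
[0050]
Although specific embodiments have been illustrated and described herein for purposes of describing the preferred embodiments of the present invention, those skilled in the art will appreciate that a wide variety of alternative and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Those skilled in the chemical, mechanical, electromechanical, electrical, and computer arts will readily appreciate that the present invention may be implemented in a very wide variety of embodiments. This application is intended to cover any adaptations or variations of the preferred embodiments discussed herein. It is therefore expressly intended that this invention be limited only by the claims and their equivalents.
[Brief Description of the Drawings]
[FIG. 1] A block diagram illustrating a distributed memory multiprocessor system configured to migrate memory, according to one embodiment of the present invention.
[FIG. 2] A flowchart illustrating a memory migration process according to one embodiment of the present invention.
[FIG. 3A] A state diagram illustrating directory state transitions for the "old cell" during a memory migration sequence according to one embodiment of the present invention.
[FIG. 3B] A state diagram illustrating directory state transitions for the "new cell" during a memory migration sequence according to one embodiment of the present invention.
[FIG. 3C] A diagram illustrating the time order of the states shown in FIGS. 3A and 3B.
[0001]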
TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to computer systems. More particularly, it relates to memory migration in a distributed memory multiprocessor system.
[0002]
[Prior art]
Traditionally, main memory was physically located on a central bus. In this type of system, a memory request consisting of a complete physical address was forwarded to the memory subsystem and the data was returned. In a distributed memory system, main memory is physically distributed over many different cells. A cell may consist of multiple processors, one or more input/output (I/O) devices, a cell controller, and memory. Each cell holds a different part of the main memory space. Each processor can access not only local memory but also the memory of other cells via cell communication link circuitry, such as one or more crossbar switches.
[0003]
Caching can ease the performance limitations associated with memory accesses. Caching involves storing a subset of the contents of main memory in a cache memory that is smaller and faster than main memory. Various strategies are used to increase the probability that the cache contents anticipate requests for data. For example, most caches fetch and store multi-word lines, because data near a requested word in the memory address space is relatively likely to be requested close in time to that word. The number of words stored in a single cache line defines the line size of the system. For example, a cache line may be eight words long.
[0004]
Caches typically have far fewer line storage locations than main memory. Typically, a "tag" is stored with the data at each cache location to uniquely identify the main memory line address that owns the cached data.
[0005]
In both single-processor and multi-processor systems, there is the problem of guaranteeing "coherency" between the cache and the main memory. For example, when a processor changes data stored in a cache, the change must be reflected in main memory. Typically, there is some latency between the time the data is changed in the cache and the time the change is reflected in the main memory. During this latency, unmodified data in the main memory is invalid. Steps must be taken to ensure that main memory data is not read while it is invalid.
[0006]
In a distributed memory multiprocessor system where each processor or I/O module has a cache memory, the situation is somewhat more complicated than in a single processor system with a cache memory. In a multiprocessor system, current data corresponding to a particular main memory address may be stored in one or more cache memories and/or main memory. The data in the cache memory may have been manipulated by the processor, resulting in a value that is different from the value stored in main memory. Thus, a "cache coherency scheme" is implemented to ensure that the current data value at any address is provided independent of where the data value resides.
[0007]
Usually, "permission" is required to change cache data. Typically, permission is granted only if the data is stored in exactly one cache. Data stored in multiple caches is often treated as read-only. Each cache line may include one or more status bits that indicate whether permission to change the data stored on that line is granted. The exact nature of the state depends on the system, but typically a "privacy" state bit is used to indicate permission to change. If the privacy bit indicates "private", only one cache holds the data and the associated processor has permission to change the data. If the privacy bit indicates "public," any number of caches can hold the data and no processor can modify the data.
[0008]
In a multiprocessor system, for a processor that wants to read or modify data, a determination is typically made as to which caches, if any, have a copy of the data and whether permission can be granted to modify it. "Snooping" involves examining the contents of multiple caches to make that determination. If the requested data is not found in the local cache, remote caches can be "snooped". A recall may be issued requesting that private data be made public so that another processor can read it, or a recall may be issued to invalidate public data in some caches so that another cache can modify it.
[0009]
For many processors and caches, exhaustive snooping can degrade performance. For this reason, some distributed memory multiprocessor systems snoop within cells and rely on directory-based cache coherency for inter-cell coherency. A distributed memory multiprocessor system using directory-based cache coherency is described in U.S. Patent No. 6,055,610, filed on August 25, 1997, issued on April 25, 2000, and entitled "DISTRIBUTED MEMORY MULTIPROCESSOR COMPUTER SYSTEM WITH DIRECTORY BASED CACHE COHERENCY WITH AMBIGUOUS MAPPING OF CACHED DATA TO MAIN-MEMORY LOCATIONS".
[0010]
In distributed memory systems that use directory-based cache coherency, the main memory of each cell typically associates a directory entry with each line of memory. Each directory entry typically identifies the cells that cache the line and whether the line of data is public or private. The directory entry may also identify the particular cache or caches in a cell that cache the data, and/or snooping may be used to determine which cache or caches in a cell hold the data. Thus, each cell includes a directory that indicates the locations of cached copies of the data stored in its main memory.
[0011]
As an example, in an 8-cell system, each directory entry may be 9 bits long. For each of the eight cells, a respective "site" bit indicates whether each cell contains a cached copy of the line. The ninth “privacy” bit indicates whether the data is kept private or public.
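The nine-bit entry in this example can be sketched in Python as below. The bit layout (site bits in bits 0 to 7, privacy bit in bit 8) and all helper names are assumptions made for illustration, since the text does not fix an encoding. The check that a private line has exactly one holder follows the background rule that private data is held by only one cache.

```python
NUM_CELLS = 8                      # eight-cell system from the example
PRIVACY_BIT = 1 << NUM_CELLS       # assumed position of the ninth bit


def make_entry(cells, private):
    """Pack a collection of cell indices and a privacy flag into a 9-bit integer."""
    entry = 0
    for cell in cells:
        if not 0 <= cell < NUM_CELLS:
            raise ValueError(f"cell index out of range: {cell}")
        entry |= 1 << cell         # set the site bit for this cell
    if private:
        if len(set(cells)) != 1:
            raise ValueError("private data must be held by exactly one cell")
        entry |= PRIVACY_BIT
    return entry


def caching_cells(entry):
    """Return the sorted list of cells whose site bit is set."""
    return [c for c in range(NUM_CELLS) if entry & (1 << c)]


def is_private(entry):
    """Report whether the privacy bit marks the line as privately held."""
    return bool(entry & PRIVACY_BIT)
```

For instance, `make_entry([0, 3], private=False)` sets the site bits for cells 0 and 3 and leaves the privacy bit clear.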
[0012]
It is sometimes desirable to move or migrate memory from cell to cell or within a particular cell. For example, memory can be migrated from a defective memory device to a spare memory device. As another example, a board containing one or more memory devices may need to be removed from the system, possibly because the board contains defective components, because the board is being replaced by a newer revision, or for some other reason. Before removing the board, it may be desirable to migrate the memory from the board to another location.
[0013]
[Problems to be solved by the invention]
Memory migration usually occurs through operating system intervention, where the memory is first deallocated and later reallocated to the desired destination. With such prior art memory migration techniques, processes accessing the memory being migrated may be stalled, or the system may be unable to migrate the memory until those processes terminate. Thus, prior art memory migration techniques affect the operation of software and sometimes render the system unusable for a period of time. Moreover, using the prior art migration techniques, some pages required by the operating system and firmware cannot be easily migrated.
[0014]
Also, memories can be interleaved, making memory migration using conventional techniques more difficult. Deinterleaving memory is not a simple task, and sometimes there is no deinterleaving solution.
[0015]
[Means for Solving the Problems]
One aspect of the present invention provides a distributed memory multiprocessor system having a plurality of cells that are communicably coupled to one another and that collectively include a plurality of processors, caches, main memories, and cell controllers. Each of the cells includes at least one of the processors, at least one of the caches, one of the main memories, and one of the cell controllers. Each of the cells is configured to perform a memory migration function that migrates memory from a first one of the main memories to a second one of the main memories in a manner that is invisible to an operating system of the system.
[0016]
BEST MODE FOR CARRYING OUT THE INVENTION
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the invention. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
[0017]
FIG. 1 is a block diagram illustrating a distributed memory multiprocessor system 100 configured to migrate memory without operating system intervention, according to one embodiment of the invention. System 100 has eight cells 102, 104, 106, 108, 110, 112, 114 and 116, which are communicatively coupled via a cell communication link 118. Cell 102 has four memory access devices 120A-120D, four caches 124A-124D, a fast coherency directory or directory cache 126, a cell controller 128, and a main memory 136. Similarly, cell 116 has four memory access devices 164A-164D, four caches 168A-168D, a fast coherency directory 150, a cell controller 152, and a main memory 160.
[0018]
In one embodiment, memory access devices 120A-120B and 164A-164B are processors, and memory access devices 120C-120D and 164C-164D are input/output (I/O) modules. The memory access devices 120A-120D have address registers 122A-122D, respectively. The memory access devices 164A-164D have address registers 166A-166D, respectively. The cell controller 128 has firmware 130, a configuration and status register (CSR) 132, and an ordered access queue (OAQ) 134. The cell controller 152 has firmware 154, a configuration and status register 156, and an ordered access queue 158. In one form of the invention, cells 104, 106, 108, 110, 112 and 114 are substantially the same as cells 102 and 116.
[0019]
While one embodiment of the present invention is described in the context of a multi-cell system in which each cell includes multiple processors and multiple I/O modules, it will be apparent to those skilled in the art that the memory migration techniques described herein are applicable to other system configurations. For example, rather than having cells like those shown in FIG. 1, alternative embodiments of the invention may incorporate other system building blocks, such as single-processor (with cache) building blocks, single-I/O-module (with cache) building blocks, or other building blocks.
[0020]
According to one embodiment, during normal operation, system 100 is configured to access memory in a conventional manner using directory-based cache coherency. For example, when a word is fetched from main memory 136 by processor 120A, the word is stored in a cache line of cache 124A. In addition, words adjacent to the requested word are also fetched along with the requested word and stored in the cache line.
[0021]
A request for data by processor 120A includes a main memory word address that uniquely identifies one of the main memory locations of system 100. Cache 124A converts the main memory word address from processor 120A to a line address by removing a number of the least significant bits of the main memory address. This line address is forwarded to cell controller 128 to locate the requested data. If the request has to be forwarded to the owner cell, cell controller 128 decodes a number of the most significant bits of the line address into a cell ID and forwards the remaining bits of the address to the appropriate cell so that the request can be satisfied. In one embodiment, the cell controllers handle all memory transactions within and between cells.
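The two address manipulations in this paragraph amount to a pair of bit-field extractions, sketched below in Python. The field widths are assumptions chosen to match the examples elsewhere in the text (eight words per line, eight cells) together with a hypothetical 32-bit word address, and the function names are illustrative. Interleaved memory would need a more involved mapping than this.

```python
WORD_OFFSET_BITS = 3    # assumed: 8 words per line
CELL_ID_BITS = 3        # assumed: 8 cells
LINE_ADDR_BITS = 29     # assumed: 32-bit word address minus the offset bits


def to_line_address(word_address):
    """Drop the least significant bits, as the cache does."""
    return word_address >> WORD_OFFSET_BITS


def route(line_address):
    """Split a line address into (cell_id, address bits sent to that cell)."""
    local_bits = LINE_ADDR_BITS - CELL_ID_BITS
    cell_id = line_address >> local_bits                     # most significant bits
    local_address = line_address & ((1 << local_bits) - 1)   # remaining bits
    return cell_id, local_address
```

A word address whose top three bits are 7 therefore routes to cell 7, and the remaining line-address bits identify the line within that cell's memory.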
[0022]
Caches (e.g., caches 124A-124D and 168A-168D) use a number of bits of the line address to identify a cache line. The remaining bits of the line address are then compared with the tag stored in the identified cache line. In the case of a "hit" (i.e., if the tag matches the remaining bits of the line address), a number of the least significant bits of the line address are used to select one of the words stored in the identified cache line for forwarding to the processor (or I/O module). In the case of a miss (i.e., if the tag does not match), the line that is eventually fetched from main memory overwrites the line of data in the identified cache line, and the tag in the identified cache line is updated. Finally, the requested word is forwarded from the cache line to the processor (or I/O module).
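The hit and miss handling just described corresponds to a direct-mapped lookup, which the following Python sketch illustrates. It is a simplified model under stated assumptions: the class name, the use of the low line-address bits as the index, the 16-line capacity, and passing the word offset as a separate argument are all hypothetical choices, and main memory is modeled as a dictionary from line address to a list of words.

```python
INDEX_BITS = 4          # assumed: 16 cache lines
WORDS_PER_LINE = 8      # assumed line size


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory                       # line address -> list of words
        self.lines = [None] * (1 << INDEX_BITS)    # each slot: (tag, words) or None

    def read(self, line_address, word_offset):
        """Return (word, hit) for one access."""
        index = line_address & ((1 << INDEX_BITS) - 1)   # bits that pick the slot
        tag = line_address >> INDEX_BITS                 # remaining bits
        slot = self.lines[index]
        hit = slot is not None and slot[0] == tag
        if not hit:
            # Miss: the fetched line overwrites the slot and the tag is updated.
            slot = (tag, list(self.memory[line_address]))
            self.lines[index] = slot
        return slot[1][word_offset], hit
```

Two line addresses that share the same index bits evict each other in this model, so alternating between them misses every time.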
[0023]
In one form of the invention, coherency is implemented in system 100 in a conventional manner by including status bits for each cache line in the caches (e.g., caches 124A-124D and 168A-168D), by storing coherency information in a coherency directory in the main memory of each cell (e.g., coherency directories 138 and 162), and by storing coherency information in a fast coherency directory of each cell (e.g., fast coherency directories 126 and 150).
[0024]
In one embodiment, in addition to storing tag bits and user data bits in each cache line, each cache line also stores a "valid" status bit and a "privacy" status bit. The validity status bit indicates whether the cache line is valid or invalid. The privacy status bit indicates whether the data stored in the cache line is public or private. The processor can read any valid data in its cache. However, in one embodiment, the processor can only change data that the cache holds privately. When a processor needs to change data it holds publicly, it is first made private. If the processor needs to change data that is not in its associated cache, the data is placed in that cache as private. If the data is in use by another cache, it is recalled from that cache before being made private.
[0025]
In one embodiment, snooping is used to locate a copy of the requested data in a cache associated with another processor (or I / O module) in the same cell. Thus, when the processor 120A requests to change the data held publicly, the cell controller 128 uses snooping to recall all copies of the local caches 124A-124D. The recall serves to request that any privately held copies be converted to public as quickly as possible and that the public copies be invalidated. Once there are no more unresolved copies of the data, a private copy of the data can be provided to processor 120A, or a public copy of the data can be made private. Processor 120A can then modify that private copy of the data.
[0026]
In one embodiment, inter-cell consistency is directory-based in system 100. If the request cannot be satisfied in the cell, the request is forwarded to the cell controller of the owning cell for the requested data. For example, if processor 120A asserts an address in main memory 160, cell 116 will own the requested data. Cell controller 152 is responsible for finding a copy of the requested data throughout the system. The information necessary for this search is maintained in the consistency directory 162 stored on a line basis together with the user data in the main memory 160. In one aspect of the invention, each line of the main memory stores a site bit and a status bit. In one embodiment, the site bit indicates for each cell whether that cell holds a copy of the line and uses snooping to identify the particular cache in the cell holding the copy of the line. . In an alternative embodiment, the site bit indicates, for each cache in each cell, whether that cache holds a copy of the line. In one form of the invention, the main directory status bits include a "privacy" status bit and a "shared" status bit. The privacy main directory status bit indicates whether the data is kept public or private. The shared main directory status bit indicates whether the data is "idle" or cached by multiple caches. If the shared status bit indicates that the data is idle, the data is not cached or is cached by only a single cache.
[0027]
From the site bits and status bits in the consistency directory 162, the cell controller 152 can determine which cells of the system 100 hold a copy of the requested data and whether the requested data is held privately or publicly. Accordingly, the recall can be directed to the identified cells.
[0028]
The fast directory 150 can initiate a predictive recall. In one embodiment, the fast directory 150 does not store user data, but stores a subset of the consistency directory information stored in the main directory 162 of main memory 160. Those skilled in the art are aware of techniques for implementing a fast directory and initiating a predictive recall.
[0029]
In one form of the invention, in addition to storing the consistency information, the consistency directories in the main memory of system 100 (e.g., directories 138 and 162) also store migration status information used to identify the migration state of a line during a memory migration transaction. In one embodiment, the migration status information identifies one of four migration states: (1) Home_Cell_Ownership, (2) Waiting_For_Migration, (3) In_Migration, or (4) Migrated. Each of these migration states will be discussed in more detail below with reference to FIGS. 2 and 3A-3C. Each cell controller (e.g., cell controller 128 of cell 102 and cell controller 152 of cell 116) has firmware (e.g., firmware 130 of cell controller 128 and firmware 154 of cell controller 152) that causes the cell controller to perform memory migration functions as described herein.
[0030]
In one embodiment, memory migration is performed on a cache line basis. In alternative embodiments, memory sizes other than a single line may be used. The term "new cell" is used to identify the cell to which the line is migrated, and the term "old cell" is used to identify the cell from which the line is migrated. In one embodiment, the migration may occur within a single cell, where the line is moved from one physical location of memory in the cell to another physical location of memory in the same cell.
[0031]
FIG. 2 is a flowchart illustrating a memory migration process 200 according to one embodiment of the present invention. In the memory migration process 200, it is assumed that memory is being migrated from the cell 102 to the cell 116, so the cell 102 is called the "old cell" and the cell 116 is called the "new cell". In step 202, the firmware 130 of the cell controller 128 starts the migration by causing a bit to be written to the configuration status register 132 of the old cell 102 and then a bit to be written to the configuration status register 156 of the new cell 116. The write to the configuration status register 132 informs the old cell 102 that memory is being migrated out of that cell, and the write to the configuration status register 156 informs the new cell 116 that memory is being migrated into that cell. At this point, the processor and I/O module address range registers (e.g., registers 122A-122D of old cell 102 and registers 166A-166D of new cell 116) still point to the old cell 102 as the owner of the line to be migrated.
[0032]
In step 204, the cell controller 152 sets the migration state in the directory 162 to “Waiting_For_Migration” for a selected one of the memory lines of the main memory 160 (i.e., the “new line”). In step 206, any references to the new line in the fast directory 150 are flushed. In one embodiment, any request for a new line in the “Waiting_For_Migration” state is invalid, and the appropriate error recovery/logging steps are invoked.
[0033]
In step 208, the cell controller 152 of the new cell 116 sends a "fetch request" to the old cell 102 with the intention of changing the owner for the desired line (ie, the "old line"). The fetch request indicates to the old cell 102 that the new cell 116 is requesting to be the home cell for the line. In an alternative embodiment, the fetch request may be initiated by an entity other than the new cell 116.
[0034]
At step 210, the cell controller 128 places the fetch request in its ordered access queue 134. At step 212, any prior requests for the old line are processed in the usual manner by the cell controller 128. When the fetch request's turn in the queue is reached, in step 214, the cell controller 128 processes the fetch request. At step 216, cell controller 128 recalls the old line from any entity (e.g., a processor or I/O module) that holds the old line.
[0035]
In step 218, the cell controller 128 returns a response to the fetch request and transfers the old line data together with the home cell ownership through a "Migrate_Idle_Data" transaction. In step 220, the cell controller 128 sets the migration state in the directory 138 for the old line to “In_Migration”. From this point on, any request for the old line is either placed in the ordered access queue 134 (if space is available) or "nacked" (i.e., not acknowledged). In one embodiment, the old cell 102 does not process any requests for the old line while the line is in the "In_Migration" state.
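Steps 210 through 220, as seen from the old cell, can be sketched as follows. This is a simplified model under stated assumptions, not the controller's actual logic: the class `OldCell`, the tuple-shaped requests, and the representation of cached copies as a dictionary of values are all hypothetical.

```python
from collections import deque


class OldCell:
    """Hypothetical model of the old cell's handling of one line (steps 210-220)."""

    def __init__(self, data, holders):
        self.data = data              # line contents in main memory
        self.holders = dict(holders)  # holder name -> cached value (None if unchanged)
        self.state = "Home_Cell_Ownership"
        self.queue = deque()          # stands in for the ordered access queue
        self.log = []

    def enqueue(self, request):
        # Step 210: requests, including the fetch request, are queued in order.
        self.queue.append(request)

    def process_queue(self):
        """Process requests in order; return the reply to the fetch request, if any."""
        while self.queue:
            kind, source = self.queue.popleft()
            if kind != "fetch":
                # Step 212: prior requests are handled in the usual manner.
                self.log.append(f"served {kind} from {source}")
                continue
            # Step 216: recall the line from every entity that holds it.
            for copy in self.holders.values():
                if copy is not None:
                    self.data = copy  # assumed write-back of a modified copy
            self.holders.clear()
            # Step 220: the line is no longer served locally.
            self.state = "In_Migration"
            # Step 218: reply with the data and home ownership.
            return ("Migrate_Idle_Data", self.data)
        return None


cell = OldCell(data="v0", holders={"cpu0": "v1"})
cell.enqueue(("read", "cpu1"))
cell.enqueue(("fetch", "cell116"))
cell.enqueue(("read", "cpu2"))
reply = cell.process_queue()
```

The request that arrived after the fetch request stays in the queue, which matches the embodiment in which requests are queued rather than nacked while the line is in the "In_Migration" state.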
[0036]
At step 222, cell controller 152 receives the "Migrate_Idle_Data" transaction and sends out an "Ack" (i.e., acknowledgment) transaction. At step 224, the cell controller 152 copies the received line data to the new line, assumes home ownership of the line, and marks the line as "idle" in the directory 162. The line is marked as "idle" because no other entity holds this line. As described above, in step 216, the line was recalled from all previous holders.
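The new cell's side of the exchange (steps 204, 222 and 224) can be sketched in the same spirit. The class `NewCell` and its method names are hypothetical, and the sketch collapses steps 222 and 224 into a single call that returns the "Ack" after the copy, which is a simplification of the ordering given in the text.

```python
class NewCell:
    """Hypothetical model of the new cell's handling of one line."""

    def __init__(self):
        self.state = "Waiting_For_Migration"  # set in step 204
        self.data = None
        self.idle = False

    def handle_request(self, request):
        # A request for a line still waiting for migration is invalid.
        if self.state == "Waiting_For_Migration":
            raise RuntimeError("request for a line that is waiting for migration")
        return self.data

    def receive(self, message):
        kind, payload = message
        if kind != "Migrate_Idle_Data":
            raise ValueError(f"unexpected transaction: {kind}")
        self.data = payload  # copy the received line data to the new line
        self.idle = True     # no other entity holds the line after the recall
        self.state = "Home_Cell_Ownership"
        return "Ack"


new_cell = NewCell()
ack = new_cell.receive(("Migrate_Idle_Data", "v1"))
value = new_cell.handle_request("read")
```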
[0037]
In step 226, the cell controller 128 receives the “Ack” transaction from the cell controller 152 and changes the migration state in the directory 138 to “Migrated” for the old line. From then on, in step 228, any access pending in the ordered access queue for the old line, and any new access to the old line, is forwarded by the cell controller 128 to the new home cell 116 for the line. In an alternative embodiment, cell controller 128 responds to requesters for the old line and asks them to send their requests to the new cell 116. In step 230, once the old cell 102 has no more pending entries in the ordered access queue 134 for the old line, cell controller 128 sets a status bit in the configuration status register 132 to indicate that all pending requests for the old line have been processed. In one embodiment, no new requests for the old line are placed in the ordered access queue 134 from this point on. All new requests for the old line are forwarded by the cell controller 128 to the new cell 116.
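How the old cell treats a request in each migration state (steps 220 through 230) can be summarised as a small dispatch routine. The sketch below is illustrative only: the class `OldLine`, the `State` enum, and the boolean standing in for the status bit of the configuration status register are assumed names, and the queueing embodiment is shown rather than the nack embodiment.

```python
from collections import deque
from enum import Enum


class State(Enum):
    HOME_CELL_OWNERSHIP = "Home_Cell_Ownership"
    IN_MIGRATION = "In_Migration"
    MIGRATED = "Migrated"


class OldLine:
    """Hypothetical per-line request dispatch in the old cell."""

    def __init__(self):
        self.state = State.IN_MIGRATION  # state after step 220
        self.pending = deque()           # queued while In_Migration
        self.forwarded = []              # requests sent on to the new cell
        self.all_pending_done = False    # stands in for the status bit of step 230

    def on_request(self, request):
        if self.state is State.HOME_CELL_OWNERSHIP:
            return "served"
        if self.state is State.IN_MIGRATION:
            self.pending.append(request)
            return "queued"
        self.forwarded.append(request)   # Migrated: forward to the new home cell
        return "forwarded"

    def on_ack(self):
        # Step 226: the Ack moves the line to Migrated.
        self.state = State.MIGRATED
        # Step 228: pending accesses are forwarded in their original order.
        while self.pending:
            self.forwarded.append(self.pending.popleft())
        # Step 230: signal that no pending entries remain.
        self.all_pending_done = True


line = OldLine()
first = line.on_request("r1")
line.on_ack()
second = line.on_request("r2")
```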
[0038]
In step 232, the firmware 130 reads the configuration status register 132 and determines that the status bit has been set (in step 230). The firmware 130 then resets the status bit. In step 234, firmware 130 determines whether more lines are to be migrated from old cell 102. If additional lines are to be migrated, steps 202-232 are repeated for each such line. Regardless of whether additional lines are migrated, the execution of steps 236-240 completes the migration of the first line.
[0039]
At step 236, firmware 130 changes the address range registers (e.g., address range registers 122A-122D and 166A-166D) in all memory access devices (e.g., processors or I/O modules) to ensure that all new requests for the old line go to the new cell 116. The changes to the address range registers do not take effect simultaneously. During the change, if any requests for the old line are sent to the old cell 102, these requests are forwarded by the old cell 102 to the new cell 116. In an alternative embodiment, the old cell 102 returns such a request to the requester and informs the requester to send the request to the new cell 116. After the change in step 236, the address range registers of all of the memory access devices point to the new cell 116 as the home cell for the migrated line. In one embodiment, the address range registers may be implemented in the form of a cell map.
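The effect of step 236 can be illustrated with a toy cell map. Everything here is an assumption made for the example: the address ranges, the dictionary layout of the map, and the helper names `home_cell` and `route` are hypothetical, and forwarding is modelled per cell rather than per line for brevity.

```python
def home_cell(cell_map, address):
    """Look up the home cell for an address in a device's address range registers."""
    for (start, end), cell in cell_map.items():
        if start <= address < end:
            return cell
    raise KeyError(f"no home cell for address {address:#x}")


def route(device_map, address, forwards):
    """Return the cell that finally handles a request, allowing one forwarding hop."""
    target = home_cell(device_map, address)
    return forwards.get(target, target)


# A device whose registers have not been updated yet still points at cell 102.
stale_map = {(0x0000, 0x1000): 102, (0x1000, 0x2000): 116}
# A device whose registers were already rewritten in step 236.
updated_map = {(0x0000, 0x1000): 116, (0x1000, 0x2000): 116}
# The old cell forwards requests for the migrated line to the new cell.
forwards = {102: 116}

via_old_cell = route(stale_map, 0x0800, forwards)
direct = route(updated_map, 0x0800, forwards)
```

Both devices end up at cell 116, which is why requests issued while the registers are being changed are still serviced.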
[0040]
At step 238, because there may still be some outstanding requests for the old line (e.g., pending requests in the ordered access queue that have not yet been processed), firmware 130 initiates a "plunge" from all possible requesters to the old cell 102 to ensure that any previous requests directed to the old cell 102 have reached the old cell 102 (and have been forwarded to the new cell 116). A plunge transaction is sent to the old cell 102 from every possible memory access device. Because plunge transactions are queued like any other memory request transaction, by the time the old cell 102 has received all of the plunge transactions, firmware 130 is assured that all requests for the old line have been received (and forwarded to the new cell 116).
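The ordering argument behind the plunge can be checked with a short simulation. The sketch assumes that each requester reaches the old cell over its own in-order channel and that a plunge is the last message on each channel; the function `run_plunge`, the message strings, and the round-robin arrival order are illustrative choices, not details from the specification.

```python
from collections import deque


def run_plunge(channels):
    """Drain per-requester FIFO channels until a plunge has arrived from each one.

    channels maps a requester name to a deque of messages in send order.
    Returns the requests that the old cell forwarded to the new cell.
    """
    if any("plunge" not in channel for channel in channels.values()):
        raise ValueError("every requester must send a plunge")
    forwarded = []
    plunges_seen = set()
    while len(plunges_seen) < len(channels):
        # Round-robin arrival order; any interleaving preserves per-channel order.
        for requester, channel in channels.items():
            if not channel:
                continue
            message = channel.popleft()
            if message == "plunge":
                plunges_seen.add(requester)
            else:
                forwarded.append((requester, message))
    return forwarded


channels = {
    "cpu0": deque(["read A", "plunge"]),
    "cpu1": deque(["write A", "read A", "plunge"]),
    "io0": deque(["plunge"]),
}
forwarded = run_plunge(channels)
```

Because each channel is in order, every request sent before a requester's plunge has been seen by the time that plunge arrives, so no request remains in flight once all plunges are in.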
[0041]
At step 240, firmware 130 waits until configuration status register 132 indicates that the plunge is complete. The configuration status register 132 thereby indicates to the firmware 130 that there are no more outstanding requests for the old line. In an alternative embodiment, rather than using a plunge transaction, firmware 130 waits for a predetermined period of time long enough for all outstanding requests for the old line to reach old cell 102 and be forwarded to new cell 116.
[0042]
As indicated by step 242, after the plunge transaction is completed (or after the predetermined period of time has elapsed), the migration of the line is complete.
[0043]
FIG. 3A is a state diagram illustrating directory state transitions of directory 138 for memory lines migrated from old cell 102 during memory migration process 200, according to one embodiment of the present invention. As shown by state S1 in FIG. 3A, the initial migration state in directory 138 for the old line in old cell 102 is “Home_Cell_Ownership”, which indicates that old cell 102 is the current home cell for the line to be migrated. The second migration state S2 for the old line is “In_Migration”. As described above, the migration state of the old line changes from “Home_Cell_Ownership” to “In_Migration” during step 220 of the process 200. The third migration state S3 for the old line is “Migrated”. The migration state of the old line changes from “In_Migration” to “Migrated” during step 226 of process 200.
[0044]
FIG. 3B is a state diagram illustrating directory state transitions of directory 162 for a memory line in new cell 116 during memory migration process 200, according to one embodiment of the present invention. As shown by state S4 in FIG. 3B, the first migration state in directory 162 for the new line in new cell 116 is “Waiting_For_Migration”, which is set during step 204 of process 200. The next migration state S5 for the new line is "Home_Cell_Ownership", which indicates that cell 116 is the new home cell for the migrated line. The migration state of the new line changes from “Waiting_For_Migration” to “Home_Cell_Ownership” during step 224 of process 200.
[0045]
FIG. 3C is a diagram showing a time sequence of the states shown in FIGS. 3A and 3B. As shown in FIG. 3C, the memory migration process 200 begins with the old line of the old cell 102 in state S1 (Home_Cell_Ownership). Next, during the memory migration process 200, the migration state of the new line of the new cell 116 is set to state S4 (Waiting_For_Migration). Later in the migration process 200, the migration state of the old line in the old cell 102 changes to state S2 (In_Migration). Next, the migration state of the new line in the new cell 116 changes to state S5 (Home_Cell_Ownership). Finally, the migration state of the old line in the old cell 102 changes to state S3 (Migrated).
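The state sequences of FIGS. 3A to 3C can be written down as transition tables and replayed. This is a minimal sketch: the table names, the `replay` helper, and the "old"/"new" labels are hypothetical, and the check covers only the legal transitions within each cell, not timing constraints between the cells.

```python
# Legal per-cell transitions, as described for FIG. 3A (old cell) and FIG. 3B (new cell).
OLD_CELL_TRANSITIONS = {
    "Home_Cell_Ownership": "In_Migration",  # S1 -> S2 (step 220)
    "In_Migration": "Migrated",             # S2 -> S3 (step 226)
}
NEW_CELL_TRANSITIONS = {
    "Waiting_For_Migration": "Home_Cell_Ownership",  # S4 -> S5 (step 224)
}


def replay(events):
    """Apply (cell, new_state) events in order; raise ValueError on an illegal one."""
    state = {"old": "Home_Cell_Ownership", "new": None}
    for cell, new_state in events:
        if cell == "new" and state["new"] is None:
            # The new line first enters S4 in step 204.
            allowed = new_state == "Waiting_For_Migration"
        else:
            table = OLD_CELL_TRANSITIONS if cell == "old" else NEW_CELL_TRANSITIONS
            allowed = table.get(state[cell]) == new_state
        if not allowed:
            raise ValueError(f"illegal transition for {cell} cell: {new_state}")
        state[cell] = new_state
    return state


# The time sequence of FIG. 3C: S4, then S2, then S5, then S3.
FIG_3C = [
    ("new", "Waiting_For_Migration"),
    ("old", "In_Migration"),
    ("new", "Home_Cell_Ownership"),
    ("old", "Migrated"),
]
final = replay(FIG_3C)
```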
[0046]
In one embodiment, during the migration sequence, one or more address ranges of memory are migrated with a granularity of one memory line. At any given time, the memory lines in the desired address range may be in a variety of different migration states. In one form of the invention, the address range registers are not updated to reflect the migration until all memory lines within the specified range have been migrated (e.g., step 236 in FIG. 2).
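The condition for updating the address range registers can be expressed as a check over the per-line migration states. In this sketch the line size of 64 bytes, the example addresses, and the function names are assumptions chosen for illustration.

```python
LINE_SIZE = 64  # bytes per memory line; an assumed value for this example


def lines_in_range(start, end):
    """Line addresses covered by the half-open address range [start, end)."""
    return range(start, end, LINE_SIZE)


def range_fully_migrated(line_states, start, end):
    """True only when every line in the range has reached the Migrated state."""
    return all(
        line_states.get(address) == "Migrated"
        for address in lines_in_range(start, end)
    )


# Lines in one range may be in different migration states at the same time.
line_states = {
    0x1000: "Migrated",
    0x1040: "In_Migration",
    0x1080: "Home_Cell_Ownership",
}
ready_before = range_fully_migrated(line_states, 0x1000, 0x10C0)

line_states[0x1040] = "Migrated"
line_states[0x1080] = "Migrated"
ready_after = range_fully_migrated(line_states, 0x1000, 0x10C0)
```

Only when the check succeeds would firmware go on to rewrite the address range registers for that range.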
[0047]
Embodiments of the present invention provide a number of advantages over prior art memory migration techniques. One aspect of the present invention provides a system and method for dynamically migrating memory from one location to another, whether or not the memory is interleaved, in a distributed memory multiprocessor system that uses directory-based cache coherency, without the need for operating system involvement. In one embodiment, the migration is performed by hardware with the help of some software provided by the system firmware or by utility firmware. In one aspect of the invention, software accessing the memory being migrated can continue to access that memory seamlessly during the migration, so that service to the user is not interrupted. In one aspect of the invention, the migration occurs without involving or adversely affecting the operating system and application software. In one embodiment, the migration is "invisible" to the operating system, and the operating system and application software need not be notified that the migration is occurring or has occurred.
[0048]
One aspect of the present invention provides a memory migration process that is independent of the processor interface and does not require that the processor interface protocol support any new transactions to perform the migration function. Any processor or I / O controller design can be used with one embodiment of the memory migration process.
[0049]
In one embodiment, the techniques described herein may be used to migrate memory as additional memory is added to the system, and may be used in conjunction with traditional error detection and correction schemes to migrate defective memory locations to spare memory locations. When the defective memory is replaced or otherwise repaired, the spare memory locations can be migrated back to the new or repaired memory.
[0050]
While specific embodiments have been illustrated and described herein for purposes of describing the preferred embodiments of the invention, those skilled in the art will appreciate that a wide variety of alternative and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the invention. Those with skill in the chemical, mechanical, electro-mechanical, electrical, and computer arts will readily appreciate that the present invention may be implemented in a wide variety of embodiments. This application is intended to cover any adaptations or variations of the preferred embodiments discussed herein. Accordingly, it is expressly intended that this invention be limited only by the claims and the equivalents thereof.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a distributed memory multiprocessor system configured to migrate memory, according to one embodiment of the invention.
FIG. 2 is a flowchart illustrating a memory migration process according to one embodiment of the present invention.
FIG. 3A is a state diagram illustrating directory state transitions for the “old cell” during a memory migration sequence according to one embodiment of the present invention.
FIG. 3B is a state diagram illustrating directory state transitions for the “new cell” during a memory migration sequence according to one embodiment of the present invention.
FIG. 3C is a diagram showing a time sequence of the states shown in FIGS. 3A and 3B.

Claims

A distributed memory multiprocessor system, comprising:
A plurality of cells communicatively coupled to each other, the plurality of cells collectively including a plurality of processors, caches, main memories, and cell controllers;
Each of the cells including at least one of the processors, at least one of the caches, one of the main memories, and one of the cell controllers;
Each of the cells being configured to perform a memory migration function of migrating memory from a first main memory of the main memories to a second main memory of the main memories in a manner that is invisible to an operating system of the multiprocessor system.

The distributed memory multiprocessor system of claim 1, wherein the distributed memory multiprocessor system is configured to allow access to the memory being migrated during the migration of the memory.

The distributed memory multiprocessor system of claim 1, wherein the migration is performed on a memory line basis.

The distributed memory multiprocessor system of claim 1, wherein each cell includes a cache coherency directory, wherein the cache coherency directory is configured to store migration status information.

The distributed memory multiprocessor system of claim 4, wherein the migration status information indicates a migration state of a memory line during the migration.

The distributed memory multiprocessor system of claim 5, wherein the migration status information identifies at least three migration states, including a first state indicating that the memory line is waiting for the migration, a second state indicating that the memory line is being migrated, and a third state indicating that the memory line has been migrated.

The distributed memory multiprocessor system of claim 1, wherein the distributed memory multiprocessor system is configured to update an address register of the processor after the migration.

A method for migrating memory in a distributed memory multiprocessor system, comprising:
Providing a plurality of cells, each including at least one processor, at least one cache, main memory, a cell controller, and a cache coherency directory;
Initiating a memory migration transaction between a first of said cells and a second of said cells;
Copying data from a first memory portion of the main memory of the first cell to a second memory portion of the main memory of the second cell during the memory migration transaction; and
Storing, in the cache coherency directories of the first cell and the second cell, migration status information indicating a migration state of the first memory portion and the second memory portion during the memory migration transaction.

The method of claim 8, further comprising sending a recall request to a cache holding a copy of the first memory portion.

The method of claim 8, further comprising redirecting memory accesses originally directed to the first memory portion to the second memory portion after the data has been copied from the first memory portion to the second memory portion.