JP4240610B2

JP4240610B2 - Computer system

Info

Publication number: JP4240610B2
Application number: JP33685898A
Authority: JP
Inventors: 秀貴青木; 昌尚伊藤; 由子玉置; 啓明藤井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-11-27
Filing date: 1998-11-27
Publication date: 2009-03-18
Anticipated expiration: 2018-11-27
Also published as: JP2000163316A

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a computer system having a multi-bank main memory device, and more particularly to a computer system that performs main memory skew.
[0002]
[Prior art]
One known organization for the main storage device of a computer system is the interleaved scheme, in which the memory is divided into many banks. FIG. 2 shows the address allocation of the interleaved scheme when the main storage device consists of 16 banks. The 16 banks are numbered 0 through 15, and consecutive addresses are assigned to successive banks in turn. With this scheme, sequential access to consecutive addresses visits a different bank each time, which allows fast main memory access. However, when memory is accessed at certain fixed address intervals (strides), the accesses are known to concentrate on a specific bank or group of banks, which lowers main memory access performance. For example, in the address allocation of FIG. 2, consider access with stride 16 starting from address 0. The addresses accessed are 0, 16, 32, 48, ..., and all of them are assigned to bank 0, so bank conflicts prevent fast main memory access. FIG. 3 shows the relationship between the stride and the distribution of access requests across the banks. The more banks the access requests are spread over, the better bank conflicts are avoided and the higher the main memory throughput.
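The stride effect described above can be reproduced numerically. The sketch assumes the plain interleaved allocation of FIG. 2 (bank = address mod 16) and counts how many distinct banks a strided stream reaches. The function names are illustrative and do not come from the patent.

```python
from math import gcd

NUM_BANKS = 16  # bank count of FIG. 2


def interleaved_bank(adr: int, n: int = NUM_BANKS) -> int:
    """Plain interleaving: consecutive addresses go to consecutive banks."""
    return adr % n


def banks_touched(stride: int, accesses: int = 64, start: int = 0) -> int:
    """Number of distinct banks hit by `accesses` requests at a fixed stride."""
    return len({interleaved_bank(start + i * stride) for i in range(accesses)})


for stride in (1, 2, 4, 8, 16, 17):
    # For plain interleaving the count equals 16 / gcd(stride, 16).
    assert banks_touched(stride) == NUM_BANKS // gcd(stride, NUM_BANKS)
    print(f"stride {stride:2d}: {banks_touched(stride):2d} banks")
```

Stride 16 reaches only bank 0, which matches the example in the text. In general, plain interleaving over 16 banks reaches 16 / gcd(stride, 16) banks.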
[0003]
The address here refers to the number assigned to the memory location of each access unit of the memory. Hereinafter, unless otherwise specified, the address definition in this specification is as described above.
[0004]
Main memory skew is a known technique for alleviating this drop in the access request processing performance of the main storage device. Its mathematical foundations were given in D. J. Kuck, “ILLIAC IV Software and Application Programming”, IEEE Transactions on Computers, Vol. C-17, No. 8, pp. 758-770, August 1968, and in P. Budnik and D. J. Kuck, “The Organization and Use of Parallel Memories”, IEEE Transactions on Computers, pp. 1566-1569, December 1971. The method is not unique and has many variations. Several of them are described in D. T. Harper III and J. R. Jump, “Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme”, IEEE Transactions on Computers, Vol. C-36, No. 12, pp. 1440-1449, December 1987; in the same authors' “Performance Evaluation of Vector Accesses in Parallel Memories Using a Skewed Storage Scheme”, Conference Proceedings of the 13th Annual International Symposium on Computer Architecture, IEEE, pp. 324-328, June 1986; and in U.S. Pat. No. 4,918,600.
[0005]
FIG. 4 shows an example of a computer system that performs main memory skew. This computer system, indicated as a whole by reference numeral 1, has four processors 11 that process data and a main storage device 31 that stores data. They are connected to each other via an interconnection network 21. The main storage device 31 consists of four memory modules 51, each of which has four banks 91, so the main storage device 31 has 16 banks in total. The interconnection network 21 has an address mapping circuit 41 for performing main memory skew. Based on the address information accompanying a main memory access request, the address mapping circuit 41 determines the bank to which the request is sent. The relationship between the bank number BK, the address ADR, and the number of banks N (here N = 16) is given by (Equation 1).
[0006]
(Equation 1)
BK = (ADR + ⌊ADR / N⌋) mod N
Here, ⌊ADR / N⌋ is the integer quotient of ADR divided by N, and mod N denotes the remainder modulo N. FIG. 5 shows the address allocation in the computer system 1 that performs this main memory skew. As FIG. 5 shows, the main memory skew in the computer system of FIG. 4 shifts the assigned bank by one each time the address advances by the number of banks (here, 16). FIG. 6 shows the relationship between the stride and the distribution of access requests across the banks when memory is allocated with this skew.
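Equation 1 can be checked the same way. This minimal sketch assumes N = 16 and reads ÷ as integer division. It compares the number of distinct banks reached with and without the skew. The helper names are illustrative.

```python
from typing import Callable

N = 16  # number of banks in FIG. 4


def plain_bank(adr: int, n: int = N) -> int:
    """Unskewed interleaving (FIG. 2)."""
    return adr % n


def skewed_bank(adr: int, n: int = N) -> int:
    """Equation 1: the starting bank shifts by one every n addresses."""
    return (adr + adr // n) % n


def banks_touched(mapping: Callable[[int], int], stride: int, accesses: int = 256) -> int:
    """Distinct banks reached by `accesses` requests at a fixed stride from address 0."""
    return len({mapping(i * stride) for i in range(accesses)})


for stride in (1, 2, 8, 16, 32):
    print(f"stride {stride:2d}: plain {banks_touched(plain_bank, stride):2d}, "
          f"skewed {banks_touched(skewed_bank, stride):2d}")
```

Within every block of N consecutive addresses, the skewed mapping is still a permutation of the banks, so sequential access keeps its full parallelism. Some strides still reach only 8 of the 16 banks (32, for example). The skew therefore reduces the problematic strides but does not eliminate them.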
[0007]
As the comparison of FIG. 3 and FIG. 6 shows, main memory skew reduces the number of strides for which access requests concentrate on a specific bank or group of banks and degrade performance.
[0008]
[Problems to be solved by the invention]
One way to improve the performance of the main storage device is to replace the current memory modules with memory modules of a different organization that incorporate various speed-up techniques. However, consider the conventional computer system of FIG. 4 when the modules are replaced with ones that have more banks. For some strides, the spread of access requests across the banks does not grow in proportion to the increased bank count. In other words, adding banks does not deliver the expected benefit.
[0009]
For example, suppose the main storage device consists of 32 banks and that, for a certain stride, access requests concentrate on only 8 of them. The number of banks in each memory module may be quadrupled, giving 128 banks in total. Even so, access requests for that stride still concentrate on 8 banks, and as a result performance does not improve.
[0010]
This problem arises because the conventional computer system performs a main memory skew tailored to the original bank configuration (32 banks in the above example). That skew is not suited to the new bank configuration produced by replacing the memory modules (128 banks in the above example).
[0011]
An object of the present invention is to solve the above problem. When the number of banks is increased by replacing memory modules, the invention realizes a main memory skew suited to the new bank configuration and thereby improves the performance of the main storage device.
[0012]
[Means for Solving the Problems]
To solve the above problem, the computer system of the present invention comprises one or more processors that process data, a main storage device that stores data, and an interconnection network that connects the processors and the main storage device to each other. The interconnection network has one or more address mapping circuits. The main storage device has a plurality of memory modules, and each memory module has its own address mapping circuit. Address conversion is performed in two stages: first by the address mapping circuit in the interconnection network, then by the address mapping circuit in the memory module.
[0013]
Furthermore, the address conversion performed by the address mapping circuit in each memory module is designed to suit the bank configuration of that memory module.
[0014]
[Embodiments of the Invention]
FIG. 1 shows an example of a computer system according to the present invention. Four processors 10 are shown, but their number is not essential, and a single processor may be used. The main storage device 30 consists of a plurality of memory modules 50 (four in the figure). The processors 10 are connected to each memory module of the main storage device 30 through an interconnection network 20. The interconnection network 20 includes an address mapping circuit 40, which performs an address conversion that reduces the concentration of access requests on a specific memory module, even during strided access. In addition, each memory module 50 of the main storage device 30 has its own address mapping circuit 60, and the address conversion it performs is designed to suit the bank configuration within that memory module 50. Overall, the address conversion for main memory skew is performed in two stages.
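The two-stage structure can be sketched as follows, under stated assumptions. The network stage uses the form of Equation 1 with N = 4 modules and forwards the remaining address bits. Each module applies its own stage-2 mapping sized to its bank count. The XOR-fold used here is a hypothetical choice for illustration. The point is that replacing the modules changes only the stage-2 objects, while the network stage stays the same.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_MODULES = 4  # four memory modules, as drawn in FIG. 1


def network_stage(adr: int) -> Tuple[int, int]:
    """Stage 1 (interconnect side): pick a module, then drop the module bits."""
    module = (adr + adr // NUM_MODULES) % NUM_MODULES  # form of Equation 1 with N = 4
    local = adr // NUM_MODULES                          # address forwarded to the module
    return module, local


@dataclass
class MemoryModule:
    """Stage 2 (module side): a mapping chosen to fit this module's own bank count."""
    num_banks: int  # assumed to be a power of two

    def bank_for(self, local: int) -> int:
        # Hypothetical mapping: XOR together bank-index-sized chunks of the local address.
        width = self.num_banks.bit_length() - 1
        bank = 0
        while local:
            bank ^= local & (self.num_banks - 1)
            local >>= width
        return bank


def route(adr: int, modules: List[MemoryModule]) -> Tuple[int, int]:
    """Apply both stages and return (module number, bank number within the module)."""
    module, local = network_stage(adr)
    return module, modules[module].bank_for(local)


old = [MemoryModule(num_banks=8) for _ in range(NUM_MODULES)]   # e.g. 2 groups x 4 banks
new = [MemoryModule(num_banks=32) for _ in range(NUM_MODULES)]  # e.g. 8 groups x 4 banks
print(route(172, old), route(172, new))
```

Address 172 goes to module 3 in both configurations. Only the bank chosen inside the module changes, because only the module-side mapping was swapped.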
[0015]
Embodiments of the present invention will be described below with reference to the drawings.
[0016]
FIG. 7 shows the main configuration of a computer system according to an embodiment of the present invention. The computer system of this embodiment, indicated as a whole by reference numeral 2, has four processors 12 that process data and a main storage device 32 that stores data. They are connected to each other via an interconnection network 22. The interconnection network 22 has four network control devices 92, one for each of the four processors 12, and each network control device 92 includes an address mapping circuit 42. The main storage device 32 has four memory modules 52, each of which includes an address mapping circuit 62.
[0017]
Each processor 12 has two access ports to the interconnection network 22, and each network control device 92 has two access ports facing the processors 12. Processors 12 and network control devices 92 correspond one to one (processor #0 with network control device #0, processor #1 with network control device #1, and so on). An access path is established between each of the two access ports of a processor 12 and the corresponding port of its network control device 92, so the connection between them is duplexed. Each access path carries at most one access request per cycle. Each processor 12 can therefore send at most two access requests per cycle to its network control device 92.
[0018]
Each memory module 52 has four access ports, one for each of the four network control devices 92. Each network control device 92 likewise has four access ports, one for each of the four memory modules 52. An access path is established between the corresponding ports of every network control device 92 and every memory module 52, so the four network control devices 92 and the four memory modules 52 are connected all-to-all. Each access path carries at most one access request per cycle.
[0019]
In the computer system 2 of this embodiment, the access requests sent from a processor 12 to the main storage device 32 include fetch requests and store requests, among others. A fetch request loads data from the main storage device into a register (not shown) or cache (not shown) in the processor. A store request writes data from a register or cache in the processor to the main storage device. For simplicity, the following description of this embodiment takes store access as an example. Likewise, the drawings mainly show the parts related to store requests and omit the circuits for other requests.
[0020]
When a processor 12 issues a store access request, the request is sent to the corresponding network control device 92 over the duplexed access path. Each processor 12 can issue up to two store requests per cycle to its network control device 92. On receiving a store request from the processor 12, the network control device 92 forwards it to the memory module 52 to which the address specified by the request belongs. Each network control device 92 can issue up to one store request per cycle to each of the four memory modules 52 independently. Each network control device 92 can therefore issue up to four store requests per cycle to the main storage device 32 as a whole.
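These port limits bound how fast one network control device can forward a strided stream. The following idealized model assumes unlimited buffering in the network control device and no stalls inside the memory modules. It also uses plain interleaving (module = address mod 4) as the unskewed baseline. The function and its parameters are hypothetical, not taken from the patent.

```python
from collections import Counter
from typing import Callable


def forward_rate(stride: int, module_of: Callable[[int], int],
                 issue_rate: int = 2, requests: int = 4096) -> float:
    """Idealized steady-state forwarding rate (requests per cycle) of one network control device.

    Each of the four output ports forwards at most one request per cycle, so the stream needs
    at least as many cycles as its busiest module receives requests. The processor never
    supplies more than `issue_rate` requests per cycle.
    """
    load = Counter(module_of(i * stride) for i in range(requests))
    return float(min(issue_rate, requests / max(load.values())))


def plain_module(adr: int) -> int:
    """Unskewed interleaving across four memory modules (baseline assumption)."""
    return adr % 4


for s in (1, 2, 3, 4, 8):
    print(f"stride {s}: {forward_rate(s, plain_module)} requests/cycle")
```

Under this model, any stride that is a multiple of 4 sends every request to a single module and halves the forwarding rate. That is the kind of loss the network-side address mapping is intended to reduce.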
[0021]
For the memory modules 52 of the computer system 2 shown in FIG. 7, the memory modules 52A of FIG. 8 are used initially. They are later replaced with the memory modules 52B of FIG. 9 to improve performance.
[0022]
The memory module 52A shown in FIG. 8 includes an address mapping circuit 62A and two bank groups 82. Each bank group 82 includes four banks 72 and can receive a maximum of one store request per cycle. The two bank groups 82 in the memory module 52A operate independently and can be accessed in parallel from the network control device 92. This makes it possible to process more data elements than the number of memory modules in parallel.
[0023]
On the other hand, the memory module 52B shown in FIG. 9 includes an address mapping circuit 62B and eight bank groups 82. Each bank group 82 is the same as that shown in FIG. That is, each bank group 82 includes four banks 72 and can receive a maximum of one store request per cycle. The eight bank groups 82 in the memory module 52B operate independently and can be accessed in parallel from the network control device 92.
[0024]
The memory module 52A of FIG. 8 as a whole can accept up to two store requests per cycle, whereas the memory module 52B of FIG. 9 as a whole can accept up to eight. The total number of banks also increases, so bank conflicts become less likely. For these reasons, a main storage device 32 built from memory modules 52B achieves higher processing performance.
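The capacity figures above follow from simple multiplication. The sketch takes the per-group numbers from the text (four banks per bank group, at most one store request per bank group per cycle) and assumes four modules, as in FIG. 7. The `ModuleType` class is an illustrative helper.

```python
from dataclasses import dataclass

NUM_MODULES = 4  # memory modules in FIG. 7


@dataclass(frozen=True)
class ModuleType:
    name: str
    bank_groups: int
    banks_per_group: int = 4             # FIG. 8 and FIG. 9 both use 4 banks per group
    stores_per_group_per_cycle: int = 1  # each bank group accepts one store per cycle

    @property
    def stores_per_cycle(self) -> int:
        return self.bank_groups * self.stores_per_group_per_cycle

    @property
    def banks(self) -> int:
        return self.bank_groups * self.banks_per_group


for m in (ModuleType("52A", bank_groups=2), ModuleType("52B", bank_groups=8)):
    print(f"{m.name}: {m.stores_per_cycle} stores/cycle per module, "
          f"{NUM_MODULES * m.stores_per_cycle} in total, {NUM_MODULES * m.banks} banks")
```

The system grows from 32 to 128 banks, the same figures used in the example of the problem statement. The sketch counts bank-group capacity only and does not model the per-port limits between the network control devices and the modules.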
[0025]
FIG. 10 shows the address field that accompanies a store access request issued by a processor 12. In the computer system 2 of this embodiment, an address is specified with 20 bits. Data is stored in the main storage device 32 in 8-byte units, and one 8-byte element is the access unit. In the figure, the upper address is the address as defined earlier, that is, the number assigned to each access unit of the main storage device 32. In the rest of this embodiment, the upper address may simply be called the address. The upper address consists of 17 bits. The lower address is the byte address within one data element. Since a data element is 8 bytes long in this embodiment, the lower address consists of 3 bits.
[0026]
When the main storage device 32 is built from the memory modules 52A of FIG. 8, the address field of FIG. 10 is further interpreted as shown in FIG. 11. Bits are numbered from the most significant bit, which is bit 0. The MM field, consisting of bits 15 and 16, gives the number of the memory module to which the address is assigned when addresses are allocated to the computer system of FIG. 7 by the interleave method. Bit 14 is the BG field. It gives the number of the bank group to which the address is assigned within the memory module selected by the MM field. The BK field, consisting of bits 12 and 13, gives the number of the bank to which the address is assigned within that bank group. Bits 0 through 11 form the in-bank offset field, which gives the storage position within the bank of the access unit to which the address is assigned.
[0027]
On the other hand, when the main storage device 32 is built from the memory modules 52B of FIG. 9, the address field of FIG. 10 is interpreted as shown in FIG. 12. The MM field, consisting of bits 15 and 16, gives the number of the memory module to which the address is assigned when addresses are allocated to the computer system of FIG. 7 by the interleave method. The three bits 12 through 14 form the BG field. It gives the number of the bank group to which the address is assigned within the memory module selected by the MM field. The BK field, consisting of bits 10 and 11, gives the number of the bank to which the address is assigned within that bank group. Bits 0 through 9 form the in-bank offset field, which gives the storage position within the bank of the access unit to which the address is assigned.
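The bit layouts of FIG. 10 through FIG. 12 can be written as a small decoder. The sketch follows the MSB-first bit numbering used in the text (bit 0 is the most significant of the 20 bits) and decodes the address before any skew is applied. `field`, `LAYOUTS` and `decode` are illustrative names.

```python
ADDR_BITS = 20  # FIG. 10: 17-bit upper (element) address + 3-bit byte offset


def field(addr: int, first: int, last: int) -> int:
    """Extract bits first..last of a 20-bit address, numbering bit 0 as the MSB."""
    shift = ADDR_BITS - 1 - last
    width = last - first + 1
    return (addr >> shift) & ((1 << width) - 1)


# (first, last) bit ranges from FIG. 11 (module 52A) and FIG. 12 (module 52B)
LAYOUTS = {
    "52A": {"offset": (0, 11), "BK": (12, 13), "BG": (14, 14), "MM": (15, 16)},
    "52B": {"offset": (0, 9), "BK": (10, 11), "BG": (12, 14), "MM": (15, 16)},
}


def decode(addr: int, module_type: str) -> dict:
    """Split a 20-bit address into the fields of the given module type."""
    fields = {name: field(addr, a, b) for name, (a, b) in LAYOUTS[module_type].items()}
    fields["byte"] = field(addr, 17, 19)  # byte position within the 8-byte element
    return fields


# Element address 5, byte 6, with plain interleaving (no skew applied yet)
print(decode((5 << 3) | 6, "52A"))
```

Under plain interleaving, element address 5 falls in module 1, bank group 1, bank 0 of a 52A-based system.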
[0028]
The operation performed when the processor 12 issues a store access request is described below.
[0029]
When the processor 12 issues a store access request, the request is sent to the corresponding network control device 92 via the duplexed access path. As described above, each processor 12 can issue a maximum of two store requests per cycle to the corresponding network controller 92.
[0030]
The network controller 92 receives a maximum of two store requests from the processor 12 per cycle. When the store request is received, first, the address mapping circuit 42 is used to convert the address accompanying the store request into an address including the memory module number to be accessed according to the main memory skew scheme in this embodiment. Thereafter, the memory module 52 to which the store request is to be transferred is determined from the converted address, and the store request is transferred to the memory module 52. Each network controller 92 can transfer a maximum of one store request per cycle to each of the four memory modules 52 independently. As a result, a maximum of four store requests can be transferred from each network control device 92 to the entire main storage device 32 per cycle.
[0031]
FIG. 13 shows the method of address conversion performed by the address mapping circuit 42. The address mapping circuit 42 includes a modulo-4 2-bit adder 101. The adder 101 adds bits 13 to 14 and bits 15 to 16 of the pre-conversion address modulo 4 and uses the result as bits 15 to 16 of the post-conversion address; the carry generated by this addition is discarded, which is what makes it a modulo-4 addition. Bits 0 to 14 and bits 17 to 19 are the same in the pre-conversion and post-conversion addresses. Bits 15 to 16 of the post-conversion address give the number of the memory module to be accessed. FIG. 14 shows the addresses assigned to each memory module. The relationship between the memory module number MM and the address ADR is expressed by (Equation 2).
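A minimal Python sketch of this conversion, assuming it is applied to the 17-bit upper address (the 3 byte-offset bits pass through unchanged anyway), confirms that the modulo-4 adder on bits 13-14 and 15-16 computes exactly Equation 2. The function names are hypothetical.

```python
def map_circuit_42(adr: int) -> int:
    """Sketch of address mapping circuit 42 on a 17-bit upper address.

    Patent bits 15-16 (MM) are the two least significant bits of the integer and
    bits 13-14 are the next two. The new MM is their sum modulo 4 (the adder's
    carry-out is discarded); every other bit passes through unchanged.
    """
    mm = adr & 0b11
    bits_13_14 = (adr >> 2) & 0b11
    new_mm = (mm + bits_13_14) % 4
    return (adr & ~0b11) | new_mm


def module_number(adr: int) -> int:
    """Equation 2: MM = (ADR + floor(ADR / 4)) mod 4."""
    return (adr + adr // 4) % 4


# The circuit and Equation 2 agree on every address in the range checked.
assert all(map_circuit_42(a) & 0b11 == module_number(a) for a in range(1 << 12))
print([module_number(a) for a in range(8)])  # [0, 1, 2, 3, 1, 2, 3, 0]
```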
[0032]
(Equation 2)
$$MM = \left(ADR + \left\lfloor \frac{ADR}{4} \right\rfloor\right) \bmod 4$$
With the address allocation of FIG. 14, FIG. 15 shows the relationship between the stride and the distribution of store requests over the memory modules 52 when the processor 12 issues store requests to the network control device 92 at a fixed address interval (stride). For the same address allocation, FIG. 16 shows the relationship between the stride and the processing performance of the network control device 92 when the processor 12 issues two store requests per cycle to the network control device 92 at a fixed stride. The performance here is measured once sufficient time has passed since the processor 12 began issuing store requests and the number of store requests transferred per cycle from the network control device 92 to the main storage device 32 has reached a steady state; a rate of one request per cycle is normalized to 1. To focus on the processing performance of the network control device 92, it is assumed that store requests never stall inside the main storage device 32. When the stride is a multiple of 16, every request targets the same memory module, so although the processor 12 sends two requests per cycle to the network control device 92, the network control device 92 can transfer only one request per cycle to the main storage device 32. Because the rate at which requests arrive at the network control device 92 then exceeds its processing capability, store request issuance by the processor 12 must be suppressed appropriately.
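The stride-16 case can be reproduced with a deliberately simplified throughput model. The Python sketch below is an assumption-laden illustration, not the patent's evaluation method: it forwards requests in order, at most two per cycle and at most one per memory module per cycle, from a single stream starting at address 0, and ignores any stall in the main storage device. Names are hypothetical.

```python
def module_number(adr: int) -> int:
    """Equation 2: memory module selected after skewing."""
    return (adr + adr // 4) % 4


def controller_throughput(stride: int, n: int = 4096, issue_width: int = 2) -> float:
    """Requests per cycle forwarded by one network control device (simplified model).

    Assumptions (not from the patent): requests are forwarded strictly in order,
    at most `issue_width` per cycle, at most one per memory module per cycle, and
    the main storage device never stalls.
    """
    modules = [module_number(k * stride) for k in range(n)]
    i = cycles = 0
    while i < n:
        busy: set[int] = set()
        # Stop at the first request whose module link is already used this cycle.
        while i < n and len(busy) < issue_width and modules[i] not in busy:
            busy.add(modules[i])
            i += 1
        cycles += 1
    return n / cycles


for s in (1, 4, 8, 16, 32):
    print(s, controller_throughput(s))
# 1 2.0
# 4 2.0
# 8 2.0
# 16 1.0
# 32 1.0
```

Under these assumptions every multiple of 16 drops to one request per cycle, matching the case described in the text.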
[0033]
For comparison, FIG. 17 shows the addresses assigned to each memory module 52 when the network control device 92 performs no address conversion in the address mapping circuit 42. FIG. 18 shows the corresponding relationship between the stride and the distribution of store requests over the memory modules 52, and FIG. 19 shows the corresponding relationship between the stride and the processing performance of the network control device 92. As a comparison of FIG. 15 with FIG. 18 shows, the address conversion of the address mapping circuit 42 reduces the number of strides for which store requests concentrate on a specific memory module 52. Likewise, as a comparison of FIG. 16 with FIG. 19 shows, it reduces the number of strides that degrade the processing performance of the network control device 92.
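The comparison can also be made by enumeration. This Python sketch (hypothetical helper names; one request stream starting at address 0) lists the strides up to 64 whose requests all land on a single memory module, with and without the conversion of Equation 2.

```python
from typing import Callable


def module_skewed(adr: int) -> int:
    """With address mapping circuit 42 (Equation 2)."""
    return (adr + adr // 4) % 4


def module_plain(adr: int) -> int:
    """Without conversion: plain 4-way interleave on bits 15-16."""
    return adr % 4


def single_module_strides(module_of: Callable[[int], int],
                          max_stride: int = 64, n: int = 64) -> list[int]:
    """Strides whose first n requests, starting at address 0, all hit one module."""
    return [s for s in range(1, max_stride + 1)
            if len({module_of(k * s) for k in range(n)}) == 1]


print(single_module_strides(module_plain))   # every multiple of 4 up to 64
print(single_module_strides(module_skewed))  # [16, 32, 48, 64]
```

Without skew, 16 of the first 64 strides serialize on one module; with it, only the 4 multiples of 16 do.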
[0034]
When a store request is transferred from the network control device 92 to a memory module 52, bits 15 to 16 (the MM field) of its address are dropped before transfer, as shown in FIG. 20. This reduces the number of signal lines required.
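Dropping the MM field loses no information: within one memory module, the remaining 15 bits already identify the access unit uniquely, because the conversion never changes bits 0 to 14. The Python sketch below demonstrates this by inverting the modulo-4 addition; the functions are hypothetical, and the patent does not state that the module ever reconstructs the full address.

```python
def to_module(adr: int) -> tuple[int, int]:
    """Network side: pick the target module (Equation 2), then drop bits 15-16.

    Only patent bits 0-14, i.e. adr >> 2, travel to the module.
    """
    target = (adr + adr // 4) % 4
    return target, adr >> 2


def recover(target: int, in_module: int) -> int:
    """Module side: rebuild the full upper address from the module it arrived at
    and the 15 bits it received, by undoing the modulo-4 addition."""
    bits_13_14 = in_module & 0b11          # bits 13-14 were never modified
    original_mm = (target - bits_13_14) % 4
    return (in_module << 2) | original_mm


assert all(recover(*to_module(a)) == a for a in range(1 << 12))
```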
[0035]
The network control device 92 may include a buffer or similar means that temporarily holds store requests. Then, even if store requests are temporarily concentrated on a specific memory module 52, store requests from the processor 12 need not be suppressed as long as the store-request holding capacity of the network control device 92 is not exceeded. The address conversion performed by the address mapping circuit 42 may also differ from that shown in FIG. 13; for example, a 2-bit exclusive OR circuit may be used instead of the modulo-4 2-bit adder 101 of FIG. 13.
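A quick Python check, under the assumption that the 2-bit exclusive OR takes the same inputs as adder 101 (bits 13-14 and 15-16), suggests both variants spread addresses evenly over the four modules and leave the same strides up to 64 concentrated on one module. Names are hypothetical.

```python
from collections import Counter
from typing import Callable


def module_add(adr: int) -> int:
    """Circuit 42: bits 15-16 plus bits 13-14, modulo 4."""
    return ((adr & 0b11) + ((adr >> 2) & 0b11)) % 4


def module_xor(adr: int) -> int:
    """Alternative: 2-bit exclusive OR of the same two fields (assumed inputs)."""
    return (adr & 0b11) ^ ((adr >> 2) & 0b11)


def single_module_strides(module_of: Callable[[int], int],
                          max_stride: int = 64, n: int = 64) -> list[int]:
    """Strides whose first n requests, starting at address 0, all hit one module."""
    return [s for s in range(1, max_stride + 1)
            if len({module_of(k * s) for k in range(n)}) == 1]


for f in (module_add, module_xor):
    load = Counter(f(a) for a in range(1 << 12))
    print(f.__name__, sorted(load.values()), single_module_strides(f))
# module_add [1024, 1024, 1024, 1024] [16, 32, 48, 64]
# module_xor [1024, 1024, 1024, 1024] [16, 32, 48, 64]
```

The XOR needs no carry chain, which may be why the text offers it as an alternative; the two still differ in how individual strides are spread over the modules.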
[0036]
Next, the flow of store request processing in the memory module 52 will be described.
[0037]
First, the case where the memory module 52A shown in FIG. 8 is used is described. The memory module 52A receives at most one store request per cycle from each of the four network control devices 92. On receiving a store request, it first uses the address mapping circuit 62A to convert the address accompanying the request into an address that includes the numbers of the bank group and the bank to be accessed, according to the main memory skew scheme of this embodiment. It then determines, from the converted address, the bank group 82 and the bank 72 to which the store request is directed, and the store is performed in that bank 72 of that bank group 82. Each bank group 82 can perform store processing independently.
[0038]
FIG. 21 shows the method of address conversion performed by the address mapping circuit 62A. The address mapping circuit 62A includes an exclusive OR circuit 102. The exclusive OR circuit 102 takes bits 0 to 2, bits 3 to 5, bits 6 to 8, bits 9 to 11, and bits 12 to 14 of the pre-conversion address as five 3-bit inputs, computes their bitwise exclusive OR, and uses the result as bits 12 to 14 of the post-conversion address. Bits 0 to 11 and bits 15 to 17 are the same in the pre-conversion and post-conversion addresses. Bit 14 of the post-conversion address is the number of the bank group to be accessed, and bits 12 to 13 are the number of the bank within that bank group. FIG. 22 shows the address allocation to each level of the hierarchy (memory module, bank group, and bank) when the main storage device 32 is constructed using the memory module 52A. FIG. 23 shows the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for this address allocation.
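A minimal Python sketch of the circuit 62A conversion follows. It assumes the module sees the 15-bit address (patent bits 0 to 14) as an integer whose least significant bit is patent bit 14; the function name is hypothetical. Because the folded field is the original bits 12 to 14 XORed with bits that pass through unchanged, the mapping stays one-to-one, which the final check confirms.

```python
def map_circuit_62a(in_adr: int) -> tuple[int, int, int]:
    """Sketch of address mapping circuit 62A.

    in_adr is the 15-bit address seen by the module (patent bits 0-14, bit 0 = MSB),
    so patent bits 12-14 are the integer's lowest three bits.
    Returns (bank_group, bank, in_bank_offset).
    """
    folded = 0
    for shift in range(0, 15, 3):   # 3-bit groups: bits 12-14, 9-11, 6-8, 3-5, 0-2
        folded ^= (in_adr >> shift) & 0b111
    bank_group = folded & 0b1       # new bit 14
    bank = folded >> 1              # new bits 12-13
    offset = in_adr >> 3            # bits 0-11, unchanged
    return bank_group, bank, offset


# Every in-module address still maps to a distinct (bank group, bank, offset) slot.
slots = {map_circuit_62a(a) for a in range(1 << 15)}
assert len(slots) == 1 << 15
print(map_circuit_62a(8))  # (1, 0, 1)
```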
[0039]
For comparison, FIG. 24 shows the address allocation to each level of the memory module, bank group, and bank hierarchy when the memory module 52A performs no address conversion in the address mapping circuit 62A, and FIG. 25 shows the corresponding relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks. As a comparison of FIG. 23 with FIG. 25 shows, with the address conversion of the address mapping circuit 62A, store requests for a stride that is a multiple of 2 are sent to more bank groups and banks than without it, so higher main memory processing performance is obtained.
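The effect can be approximated by counting how many distinct (memory module, bank group, bank) triples a single stream of 1024 requests touches. This Python sketch is a simplified illustration (hypothetical names, stream starting at address 0, network-stage skew always applied); for the power-of-two strides tried, the count with circuit 62A is never lower than without it, and for stride 32 it rises from 1 to 8.

```python
def banks_hit_52a(stride: int, skew_in_module: bool, n: int = 1024) -> int:
    """Distinct (module, bank group, bank) triples touched by n requests at a fixed
    stride from address 0, in a main storage built from memory module 52A."""
    hit = set()
    for k in range(n):
        adr = k * stride                        # 17-bit upper address
        module = (adr + adr // 4) % 4           # network stage (Equation 2), always on
        in_adr = adr >> 2                       # MM field dropped before transfer
        if skew_in_module:                      # circuit 62A: XOR of five 3-bit groups
            low3 = 0
            for shift in range(0, 15, 3):
                low3 ^= (in_adr >> shift) & 0b111
        else:                                   # no in-module conversion
            low3 = in_adr & 0b111
        hit.add((module, low3 & 0b1, low3 >> 1))  # (module, bank group, bank)
    return len(hit)


for s in (1, 2, 4, 8, 16, 32, 64):
    print(s, banks_hit_52a(s, False), banks_hit_52a(s, True))
```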
[0040]
Next, the case where the memory module 52B shown in FIG. 9 is used is described. The memory module 52B receives at most one store request per cycle from each of the four network control devices 92. On receiving a store request, it first uses the address mapping circuit 62B to convert the address accompanying the request into an address that includes the numbers of the bank group and the bank to be accessed, according to the main memory skew scheme of this embodiment. It then determines, from the converted address, the bank group 82 and the bank 72 to which the store request is directed, and the store is performed in that bank 72 of that bank group 82. Each bank group 82 can perform store processing independently.
[0041]
FIG. 26 shows the method of address conversion performed by the address mapping circuit 62B. The address mapping circuit 62B includes an exclusive OR circuit 103. The exclusive OR circuit 103 takes bits 0 to 4, bits 5 to 9, and bits 10 to 14 of the pre-conversion address as three 5-bit inputs, computes their bitwise exclusive OR, and outputs the result as bits 10 to 14 of the post-conversion address. Bits 0 to 9 and bits 15 to 17 are the same in the pre-conversion and post-conversion addresses. Bits 12 to 14 of the post-conversion address are the number of the bank group to be accessed, and bits 10 to 11 are the number of the bank within that bank group. FIGS. 27 to 30 show the address allocation to the memory module, bank group, and bank hierarchy when the main storage device 32 is constructed using the memory module 52B. FIG. 31 shows the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for this address allocation.
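The same kind of sketch for circuit 62B, again assuming the 15-bit in-module address is held as an integer whose least significant bit is patent bit 14 (hypothetical function name):

```python
def map_circuit_62b(in_adr: int) -> tuple[int, int, int]:
    """Sketch of address mapping circuit 62B on the 15-bit in-module address
    (patent bits 0-14, bit 0 = MSB). Returns (bank_group, bank, in_bank_offset)."""
    folded = 0
    for shift in (0, 5, 10):        # 5-bit groups: bits 10-14, 5-9, 0-4
        folded ^= (in_adr >> shift) & 0b11111
    bank_group = folded & 0b111     # new bits 12-14
    bank = folded >> 3              # new bits 10-11
    offset = in_adr >> 5            # bits 0-9, unchanged
    return bank_group, bank, offset


assert len({map_circuit_62b(a) for a in range(1 << 15)}) == 1 << 15
print(map_circuit_62b(32))  # (1, 0, 1)
```

The wider 5-bit fold matches the wider bank-select field of module 52B (8 bank groups of 4 banks), which is the point of giving each module type its own second-stage circuit.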
[0042]
For comparison, FIGS. 32 to 35 show the address allocation to each level of the memory module, bank group, and bank hierarchy when the memory module 52B performs no address conversion in the address mapping circuit 62B, and FIG. 36 shows the corresponding relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks. As a comparison of FIG. 31 with FIG. 36 shows, with the address conversion of the address mapping circuit 62B, store requests for a stride that is a multiple of 2 are sent to more bank groups and banks than without it, so higher main memory processing performance is obtained.
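Reusing the counting idea for memory module 52B gives a Python sketch like the following (hypothetical names, simplified single-stream model starting at address 0). For stride 128 every request goes to module 0; without circuit 62B they all fall into one bank, while with it they spread over all 32 banks of that module.

```python
def banks_hit_52b(stride: int, skew_in_module: bool, n: int = 1024) -> int:
    """Distinct (module, bank group, bank) triples touched by n requests at a fixed
    stride from address 0, in a main storage built from memory module 52B."""
    hit = set()
    for k in range(n):
        adr = k * stride
        module = (adr + adr // 4) % 4            # network stage, always on
        in_adr = adr >> 2                        # MM field dropped before transfer
        if skew_in_module:                       # circuit 62B: XOR of three 5-bit groups
            low5 = (in_adr ^ (in_adr >> 5) ^ (in_adr >> 10)) & 0b11111
        else:
            low5 = in_adr & 0b11111
        hit.add((module, low5 & 0b111, low5 >> 3))  # (module, bank group, bank)
    return len(hit)


print(banks_hit_52b(128, False), banks_hit_52b(128, True))  # 1 32
```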
[0043]
As a comparison of FIG. 23 with FIG. 31 shows, replacing the memory modules constituting the main storage device 32, changing from the memory module 52A of FIG. 8 to the memory module 52B of FIG. 9, increases the number of bank groups and banks to which store requests for the same stride are sent, so that a main storage device 32 with higher processing performance is realized.
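A rough Python comparison of the two module types, each with its own in-module skew enabled, counts the distinct (module, bank group, bank) triples reached by 1024 requests. This is a simplified illustration with hypothetical names: at stride 1 module 52A reaches all of its 32 banks and module 52B all of its 128; at stride 32 the counts are 8 and 32.

```python
def banks_hit(stride: int, module_type: str, n: int = 1024) -> int:
    """Distinct (module, bank group, bank) triples for memory module "52A" or "52B"."""
    hit = set()
    for k in range(n):
        adr = k * stride
        module = (adr + adr // 4) % 4       # network stage (Equation 2)
        in_adr = adr >> 2
        if module_type == "52A":            # 2 bank groups x 4 banks, circuit 62A
            f = 0
            for shift in range(0, 15, 3):
                f ^= (in_adr >> shift) & 0b111
            hit.add((module, f & 0b1, f >> 1))
        else:                               # "52B": 8 bank groups x 4 banks, circuit 62B
            f = (in_adr ^ (in_adr >> 5) ^ (in_adr >> 10)) & 0b11111
            hit.add((module, f & 0b111, f >> 3))
    return len(hit)


for s in (1, 32):
    print(s, banks_hit(s, "52A"), banks_hit(s, "52B"))
# 1 32 128
# 32 8 32
```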
[0044]
The memory module 52A of FIG. 8 and the memory module 52B of FIG. 9 may include a buffer or similar means that temporarily holds store requests. Then, even if store requests are temporarily concentrated on a specific bank group 82 or a specific bank 72, store requests from the network control device 92 need not be suppressed as long as the store-request holding capacity of the memory module 52A or 52B is not exceeded. The address conversion performed by the address mapping circuit 62A or 62B may also differ from that shown in FIG. 21 or FIG. 26. For example, a modulo-8 3-bit adder may be used instead of the 3-bit exclusive OR circuit 102 of FIG. 21, and a modulo-32 5-bit adder may be used instead of the 5-bit exclusive OR circuit 103 of FIG. 26.
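The adder alternatives can be sketched as follows, under the assumption (not spelled out in the text) that each adder sums the same 3-bit or 5-bit groups that the exclusive OR circuits combine, modulo 8 or 32. Function names are hypothetical. Like the XOR versions, these remain one-to-one, because the folded field is the original low field plus a value that depends only on the pass-through bits.

```python
def map_62a_adder(in_adr: int) -> tuple[int, int, int]:
    """Circuit 62A variant: five 3-bit groups summed modulo 8 (assumed inputs)."""
    f = sum((in_adr >> shift) & 0b111 for shift in range(0, 15, 3)) % 8
    return f & 0b1, f >> 1, in_adr >> 3        # bank group, bank, offset


def map_62b_adder(in_adr: int) -> tuple[int, int, int]:
    """Circuit 62B variant: three 5-bit groups summed modulo 32 (assumed inputs)."""
    f = sum((in_adr >> shift) & 0b11111 for shift in (0, 5, 10)) % 32
    return f & 0b111, f >> 3, in_adr >> 5      # bank group, bank, offset


for mapping in (map_62a_adder, map_62b_adder):
    assert len({mapping(a) for a in range(1 << 15)}) == 1 << 15
```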
[0045]
A feature of the present invention is that address mapping circuits for realizing main memory skew are provided at two places, in the interconnection network and in the memory modules, so that address conversion is performed in two stages. As a result, even when the configuration of bank groups and banks changes because the memory modules are replaced, main memory skew suited to the new configuration can be realized and high main memory performance can be obtained.
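The two-stage structure can be expressed as function composition. In this Python sketch (hypothetical names; 17-bit upper addresses), the network stage is fixed and only the second stage changes with the module type, and for either module type no two addresses collide.

```python
from typing import Callable


def stage1_network(adr: int) -> tuple[int, int]:
    """First stage (interconnection network, circuit 42): the target module and the
    15 address bits forwarded to it."""
    return (adr + adr // 4) % 4, adr >> 2


def stage2_52a(in_adr: int) -> tuple[int, int, int]:
    f = 0
    for shift in range(0, 15, 3):
        f ^= (in_adr >> shift) & 0b111
    return f & 0b1, f >> 1, in_adr >> 3          # bank group, bank, offset


def stage2_52b(in_adr: int) -> tuple[int, int, int]:
    f = (in_adr ^ (in_adr >> 5) ^ (in_adr >> 10)) & 0b11111
    return f & 0b111, f >> 3, in_adr >> 5        # bank group, bank, offset


def locate(adr: int,
           stage2: Callable[[int], tuple[int, int, int]]) -> tuple[int, int, int, int]:
    """Full two-stage mapping of an upper address to (module, BG, BK, offset)."""
    module, in_adr = stage1_network(adr)
    return (module, *stage2(in_adr))


for stage2 in (stage2_52a, stage2_52b):
    placed = {locate(a, stage2) for a in range(1 << 14)}
    assert len(placed) == 1 << 14   # no two addresses collide
```

Swapping module types in this model means swapping `stage2` only; `stage1_network` is untouched, mirroring the point that the interconnection network need not change when the memory modules are replaced.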
[0046]
For comparison, consider replacing the memory modules in a conventional computer system. The conventional computer system 3 shown in FIG. 37 has the same configuration as the computer system 2 of the embodiment of the present invention shown in FIG. 7, except that the memory module 53 has no address mapping circuit and the address conversion performed by the address mapping circuit 43 in the network control device 93 differs from that performed by the address mapping circuit 42. As the memory module 53 of the computer system 3 of FIG. 37, assume that the memory module 53A shown in FIG. 38 is used first and is later replaced with the memory module 53B shown in FIG. 39 to improve performance. The memory module 53A of FIG. 38 is the same as the memory module 52A of FIG. 8, except that it has no address mapping circuit and performs no address conversion. Likewise, the memory module 53B of FIG. 39 is the same as the memory module 52B of FIG. 9, except that it has no address mapping circuit and performs no address conversion.
[0047]
FIG. 40 shows the method of address conversion performed by the address mapping circuit 43. When the memory module 53A is used in the computer system 3, the address mapping circuit 43 performs address conversion so as to obtain the same address allocation as when the memory module 52A is used in the computer system 2, that is, the address allocation of FIG. 22. The address mapping circuit 43 includes a modulo-4 2-bit adder 104 and an exclusive OR circuit 105. The adder 104 adds bits 13 to 14 and bits 15 to 16 of the pre-conversion address modulo 4 and uses the result as bits 15 to 16 of the post-conversion address. The exclusive OR circuit 105 takes bits 0 to 2, bits 3 to 5, bits 6 to 8, bits 9 to 11, and bits 12 to 14 of the pre-conversion address, computes their bitwise exclusive OR (3 bits wide), and uses the result as bits 12 to 14 of the post-conversion address. Bits 0 to 11 and bits 17 to 19 are the same in the pre-conversion and post-conversion addresses. The address allocation when the memory module 53A is used in the computer system 3 is therefore the same as when the memory module 52A is used in the computer system 2 (FIG. 22), and the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks is likewise the same as when the memory module 52A is used in the computer system 2 (FIG. 23).
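The statement that circuit 43 with module 53A reproduces the allocation of FIG. 22 can be checked in a Python sketch (hypothetical names; 17-bit upper addresses, with the MSB-first patent numbering mapped onto integer shifts): the single-stage conversion followed by a plain field decode gives exactly the same (module, bank group, bank, offset) as the two-stage conversion of computer system 2 with module 52A.

```python
def circuit_43(adr: int) -> int:
    """Sketch of conventional address mapping circuit 43 on a 17-bit upper address:
    adder 104 rewrites bits 15-16, exclusive OR circuit 105 rewrites bits 12-14."""
    new_mm = ((adr & 0b11) + ((adr >> 2) & 0b11)) % 4
    in_adr = adr >> 2                            # patent bits 0-14, pre-conversion
    f = 0
    for shift in range(0, 15, 3):
        f ^= (in_adr >> shift) & 0b111
    # Keep bits 0-11, replace bits 12-14 with f and bits 15-16 with new_mm.
    return ((adr >> 5) << 5) | (f << 2) | new_mm


def decode_53a(adr: int) -> tuple[int, int, int, int]:
    """Memory module 53A has no mapping circuit: fields are read directly."""
    return adr & 0b11, (adr >> 2) & 0b1, (adr >> 3) & 0b11, adr >> 5  # MM, BG, BK, offset


def two_stage_52a(adr: int) -> tuple[int, int, int, int]:
    """Computer system 2 with memory module 52A: circuit 42, then circuit 62A."""
    module, in_adr = (adr + adr // 4) % 4, adr >> 2
    f = 0
    for shift in range(0, 15, 3):
        f ^= (in_adr >> shift) & 0b111
    return module, f & 0b1, f >> 1, in_adr >> 3


assert all(decode_53a(circuit_43(a)) == two_stage_52a(a) for a in range(1 << 14))
```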
[0048]
Now consider the case where the memory module 53B is used as the memory module 53 of the computer system 3. The address allocation in this case is shown in FIGS. 41 to 44, and FIG. 45 shows the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for this address allocation. As a comparison of FIG. 31 with FIG. 45 shows, when the stride is a multiple of 64, the computer system 2 according to the present invention sends store requests for the same stride to more banks than the conventional computer system 3, and therefore obtains higher main memory processing performance.
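A simplified Python count illustrates why the conventional system falls behind after the upgrade (hypothetical names; one stream of 1024 requests from address 0). Circuit 43 still folds only a 3-bit field, which module 53B reads as its bank-group number, and leaves the bank field unconverted.

```python
def conventional_53b(adr: int) -> tuple[int, int, int]:
    """Computer system 3 with memory module 53B: circuit 43 folds only bits 12-14
    (read by 53B as the bank group); bank bits 10-11 are left unconverted.
    Returns (module, bank group, bank)."""
    module, in_adr = (adr + adr // 4) % 4, adr >> 2
    f3 = 0
    for shift in range(0, 15, 3):
        f3 ^= (in_adr >> shift) & 0b111
    return module, f3, (in_adr >> 3) & 0b11


def two_stage_52b(adr: int) -> tuple[int, int, int]:
    """Computer system 2 with memory module 52B: circuit 42, then circuit 62B."""
    module, in_adr = (adr + adr // 4) % 4, adr >> 2
    f5 = (in_adr ^ (in_adr >> 5) ^ (in_adr >> 10)) & 0b11111
    return module, f5 & 0b111, f5 >> 3


def banks_hit(mapping, stride: int, n: int = 1024) -> int:
    """Distinct (module, bank group, bank) triples reached by n strided requests."""
    return len({mapping(k * stride) for k in range(n)})


for s in (64, 128):
    print(s, banks_hit(conventional_53b, s), banks_hit(two_stage_52b, s))
# 64 16 32
# 128 8 32
```

In this model every request lands in module 0 for both strides; the conventional arrangement reaches 16 and then only 8 of that module's 32 banks, while the two-stage arrangement reaches all 32.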
[0049]
【The invention's effect】
As described above, in the computer system according to the present invention, even when the bank configuration changes because the memory modules are replaced, main memory skew suited to the new bank configuration can be performed, so that the performance of the main storage device can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the main configuration of a computer system according to the present invention.
FIG. 2 is a diagram showing address allocation by the interleave method without main memory skew.
FIG. 3 is a diagram showing the relationship between the stride and the distribution of access requests over the banks for the address allocation of FIG. 2.
FIG. 4 is a block diagram showing the overall configuration of a conventional computer system that performs main memory skew.
FIG. 5 is a diagram showing address allocation when main memory skew is performed in the computer system of FIG. 4.
FIG. 6 is a diagram showing the relationship between the stride and the distribution of access requests over the banks for the address allocation of FIG. 5.
FIG. 7 is a block diagram showing the overall configuration of a computer system according to an embodiment of the present invention.
FIG. 8 is a block diagram showing an example configuration of a memory module used in the computer system of FIG. 7; the memory module consists of 2 bank groups of 4 banks each.
FIG. 9 is a block diagram showing an example configuration of a memory module used in the computer system of FIG. 7; the memory module consists of 8 bank groups of 4 banks each.
FIG. 10 is a diagram showing the address field associated with an access request issued by a processor of the computer system of FIG. 7.
FIG. 11 is a diagram showing the address field associated with a store request sent from a processor to a network control device when the main storage device of the computer system of FIG. 7 is constructed with the memory module of FIG. 8.
FIG. 12 is a diagram showing the address field associated with a store request sent from a processor to a network control device when the main storage device of the computer system of FIG. 7 is constructed with the memory module of FIG. 9.
FIG. 13 is a diagram showing the method of address conversion performed by the address mapping circuit 42 of the computer system of FIG. 7.
FIG. 14 is a diagram showing the addresses assigned to each memory module when the address conversion of FIG. 13 is performed.
FIG. 15 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules for the address allocation of FIG. 14.
FIG. 16 is a diagram showing the relationship between the stride and the processing performance of the network control device for the address allocation of FIG. 14.
FIG. 17 is a diagram showing the addresses assigned to each memory module when the address conversion of FIG. 13 is not performed.
FIG. 18 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules for the address allocation of FIG. 17.
FIG. 19 is a diagram showing the relationship between the stride and the processing performance of the network control device for the address allocation of FIG. 17.
FIG. 20 is a diagram showing the method of reducing the address width when a store request is transferred from the network control device to a memory module in the computer system of FIG. 7.
FIG. 21 is a diagram showing the method of address conversion performed by the address mapping circuit 62A of the memory module of FIG. 8.
FIG. 22 is a diagram showing the address allocation when the memory module 52A of FIG. 8 is used in the computer system of FIG. 7.
FIG. 23 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for the address allocation of FIG. 22.
FIG. 24 is a diagram showing the address allocation when the address conversion of FIG. 21 is not performed.
FIG. 25 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for the address allocation of FIG. 24.
FIG. 26 is a diagram showing the method of address conversion performed by the address mapping circuit 62B of the memory module of FIG. 9.
FIG. 27 is a diagram showing part of the address allocation when the memory module 52B of FIG. 9 is used in the computer system of FIG. 7.
FIG. 28 is a diagram showing part of the address allocation when the memory module 52B of FIG. 9 is used in the computer system of FIG. 7.
FIG. 29 is a diagram showing part of the address allocation when the memory module 52B of FIG. 9 is used in the computer system of FIG. 7.
FIG. 30 is a diagram showing part of the address allocation when the memory module 52B of FIG. 9 is used in the computer system of FIG. 7.
FIG. 31 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for the address allocation of FIGS. 27 to 30.
FIG. 32 is a diagram showing part of the address allocation when the address conversion of FIG. 26 is not performed.
FIG. 33 is a diagram showing part of the address allocation when the address conversion of FIG. 26 is not performed.
FIG. 34 is a diagram showing part of the address allocation when the address conversion of FIG. 26 is not performed.
FIG. 35 is a diagram showing part of the address allocation when the address conversion of FIG. 26 is not performed.
FIG. 36 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for the address allocation of FIGS. 32 to 35.
FIG. 37 is a block diagram showing the overall configuration of an example of a conventional computer system.
FIG. 38 is a block diagram showing an example configuration of a memory module used in the computer system of FIG. 37; the memory module consists of 2 bank groups of 4 banks each.
FIG. 39 is a block diagram showing an example configuration of a memory module used in the computer system of FIG. 37; the memory module consists of 8 bank groups of 4 banks each.
FIG. 40 is a diagram showing the method of address conversion performed by the address mapping circuit 43 of the computer system of FIG. 37.
FIG. 41 is a diagram showing part of the address allocation when the memory module 53B of FIG. 39 is used in the computer system of FIG. 37.
FIG. 42 is a diagram showing part of the address allocation when the memory module 53B of FIG. 39 is used in the computer system of FIG. 37.
FIG. 43 is a diagram showing part of the address allocation when the memory module 53B of FIG. 39 is used in the computer system of FIG. 37.
FIG. 44 is a diagram showing part of the address allocation when the memory module 53B of FIG. 39 is used in the computer system of FIG. 37.
FIG. 45 is a diagram showing the relationship between the stride and the distribution of store requests over the memory modules, bank groups, and banks for the address allocation of FIGS. 41 to 44.
[Explanation of symbols]
10 Processor
20 Interconnection network
30 Main memory
40 Address mapping circuit
50 Memory module
60 Address mapping circuit

Claims

1. A computer system comprising one or more processors, a main storage device including a plurality of memory modules, and an interconnection network coupling the processors to the plurality of memory modules of the main storage device, each of the plurality of memory modules having a plurality of memory banks, wherein
the interconnection network includes:
a first address mapping circuit that performs a predetermined conversion on specific bits of a main memory address designated by a processor to identify the memory module corresponding to that main memory address, the conversion realizing address allocation by the interleave method across the plurality of memory modules; and
means for transferring the main memory address designated by the processor, excluding the specific bits, to the memory module identified by the first address mapping circuit;
each of the memory modules has a second address mapping circuit that converts the main memory address transferred via the interconnection network, the conversion realizing address allocation by the interleave method across the memory banks within that memory module; and
address conversion is performed in two stages, by the first and second address mapping circuits.

2. The computer system according to claim 1, wherein the address conversion performed by the second address mapping circuit is performed by a method suitable for a bank configuration of the memory module.