JP3954248B2

JP3954248B2 - Testing method for information processing apparatus

Info

Publication number: JP3954248B2
Application number: JP22819599A
Authority: JP
Inventors: 文男市川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-08-12
Filing date: 1999-08-12
Publication date: 2007-08-08
Anticipated expiration: 2019-08-12
Also published as: JP2001051965A

Description

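[0001]
[Technical Field of the Invention]
The present invention relates to a test method for an information processing apparatus. The method makes it possible to verify the cache coherency function in a random instruction test for a multi-CPU system, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory.
[0002]
[Description of the Related Art]
A conventional example is described below.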
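[0003]
§1: Cache coherency
In recent years, one means of speeding up computers has been to place a high-speed memory called a cache memory in the CPU, separate from the main memory (main storage device). Cache memories range from several kilobytes to several megabytes. Data read from the main memory into the cache can be processed at high speed for as long as it is operated on in the cache memory.
[0004]
With a single CPU, this control only has to keep the CPU's internal cache memory consistent with the main memory (cache coherency). When a plurality of CPUs are installed, however, the cache memories of the other CPUs must also be taken into account, and the hardware control becomes complex. This control is briefly described below with reference to the drawings.
[0005]
§2: System configuration and processing overview (see FIG. 15)
FIG. 15 is a system configuration diagram of a conventional example. This system is an example of a multi-CPU system. A plurality of CPUs (CPU-0, CPU-1, ...), a main memory 1 (a single memory shared by all CPUs), a system controller 2 and other units are connected by a bus. Each CPU has its own cache memory (hereinafter also simply "cache").
[0006]
A primary cache (hereinafter "L1$") is provided inside each CPU, and a secondary cache (hereinafter "L2$") is provided outside each CPU chip. L1$ is a small cache, and L2$ has a larger capacity than L1$.
[0007]
The system controller 2 is connected to the bus. A ROM 6 and other devices are connected to another bus attached to the system controller 2, via an input/output control unit (hereinafter "I/O control unit") 5. The system controller 2 performs various kinds of control, including management of the cache memories in the system. It contains a cache control unit 3 that has a cache information table 4.
[0008]
The cache control unit 3 stores information about the L1$ and L2$ of each CPU in the cache information table 4 and controls these caches. For example, when a CPU fetches data from main memory 1, it first looks in its internal L1$. If the data is not in L1$, the CPU fetches it from another CPU. To do so, it obtains cache information from the system controller 2 and reads the data from the other CPU based on that information.
[0009]
The system controller 2 therefore stores the L1$ and L2$ information of every CPU in the cache information table 4 and keeps it continuously up to date. It uses this information to perform cache control for each CPU.
[0010]
The processing performed when a CPU in this system executes a load instruction (Load) or a store instruction (Store) is as follows. A load instruction fetches data from main memory 1 into a register in the CPU. For the first load instruction, the data is transferred in the order main memory 1 → L2$ → L1$ → CPU register.
[0011]
A store instruction writes data held in a CPU register to main memory 1, and it is processed as follows. First, the CPU determines whether L1$ holds data corresponding to the data to be stored. If it does, the corresponding data in L1$ is overwritten with the new data (the store data) and processing ends. Main memory 1 is not rewritten at this point, so it still holds the old data.
[0012]
If L1$ does not hold the corresponding data, the CPU determines whether L2$ holds it. If it does, the corresponding data in L2$ is overwritten with the new data and processing ends. Again, main memory 1 still holds the old data.
[0013]
If neither L1$ nor L2$ holds the corresponding data, the data is read from main memory 1 and written to L2$. The corresponding data in L2$ is then overwritten with the new data, and processing ends. Again, main memory 1 still holds the old data.
[0014]
Data is written to main memory 1 only when it is evicted from L2$. The system controller 2 constantly monitors all of the above operations, and the cache control unit 3 stores the resulting cache information in the cache information table 4. Any data in L1$ is always also present in L2$. Each CPU therefore executes its access instructions based on the information in the cache information table 4 supplied by the system controller 2.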
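The load and store paths described above amount to a write-back, inclusive two-level hierarchy. The Python sketch below models them for a single CPU, assuming the following:
- Entries are tracked per address rather than per 64-byte line.
- Coherency between CPUs is deliberately left out.
- The class and method names are hypothetical.

```python
class TwoLevelWriteBack:
    """Toy inclusive, write-back L1$/L2$ model for one CPU (hypothetical names)."""

    def __init__(self, memory):
        self.memory = memory  # main memory 1: address -> value
        self.l1 = {}          # L1$: address -> value
        self.l2 = {}          # L2$: address -> value; always holds every L1$ address

    def load(self, addr):
        if addr in self.l1:
            return self.l1[addr]
        if addr not in self.l2:
            self.l2[addr] = self.memory[addr]  # main memory 1 -> L2$
        self.l1[addr] = self.l2[addr]          # L2$ -> L1$
        return self.l1[addr]                   # L1$ -> register

    def store(self, addr, value):
        if addr in self.l1:
            self.l1[addr] = value              # L1$ hit: rewrite the L1$ copy only
        elif addr in self.l2:
            self.l2[addr] = value              # L2$ hit: rewrite the L2$ copy only
        else:
            self.l2[addr] = self.memory[addr]  # miss: bring the data into L2$ ...
            self.l2[addr] = value              # ... then rewrite it there
        # In all three cases main memory 1 keeps the old value.

    def evict_from_l2(self, addr):
        # Inclusion: an L1$ copy, if present, is the newest and leaves with the line.
        value = self.l1.pop(addr, self.l2[addr])
        del self.l2[addr]
        self.memory[addr] = value              # only an L2$ eviction updates memory


cache = TwoLevelWriteBack({0x100: "old"})
cache.store(0x100, "new")
print(cache.memory[0x100], cache.load(0x100))  # old new
```

In the usage lines, the store leaves main memory stale until the L2$ line is evicted. This matches the statement that only data evicted from L2$ is written back.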
[0015]
Cache control of this kind is described in many publications as the MOESI protocol and is well known, so a detailed description is omitted. One example of a reference on MOESI is the following.
[0016]
Reference: "UltraSPARC™-I User's Manual", Revision 1.0, Sept. 18, 1995, SPARC Technology Business, A Sun Microsystems, Inc. Business, 2550 Garcia Avenue, Mountain View, CA 94043 U.S.A., Part No. STP1030-UG (see pp. 91-97).
§3: Processing within the system
In this system, suppose CPU-0 loads the contents of address 100 of main memory 1. One block (e.g., 64 bytes) of data from address 100 is read into CPU-0's cache (L1$). If new data is then stored to address 100, the copy in CPU-0's cache (L1$ or L2$) is replaced with the new data, but main memory 1 still holds the old data.
[0017]
If CPU-1 then loads the contents of address 100, the data is read from CPU-0's cache (L1$ or L2$), not from main memory 1. This is because CPU-0's cache holds the latest data for address 100. A technique for verifying cache consistency in a system with such a mechanism is described next.
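[0018]
§4: Techniques for verifying cache consistency (see FIG. 16)
FIG. 16 shows examples of a conventional random instruction test. In general, when several instances of one task run simultaneously, each instance gets its own copy of every area whose data is rewritten, such as the stack area and the data access area. Each instance then uses only the space assigned to it. The same approach applies when one task is shared and run by several CPUs, and multi-CPU random instruction tests usually use it as well.
[0019]
In the prior art, a random instruction test that runs on a single CPU (and that includes memory access instructions) is simply run simultaneously on several CPUs. The idea is this: if several CPUs execute random instructions at once, the access instructions in those sequences are also issued at random from several CPUs. The interaction between the caches and main memory would then end up being tested.
[0020]
FIG. 16 shows two test examples:
- Test example 1 (FIG. 16A): the random instruction sequences differ and the memory access areas differ. CPU-0 and CPU-1 use different memory areas and execute different instruction sequences.
- Test example 2 (FIG. 16B): the random instruction sequence is the same but the memory access areas differ. CPU-0 and CPU-1 use different memory areas and execute the same instruction sequence.
[0021]
The problem is that each CPU's access area lies in a separate space, so the memory accesses of one CPU never act on the cache of another CPU. In other words, cache coherency across multiple CPUs is currently not tested. Such a test requires accesses to the same cache line, and no technique for producing them has been available.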
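[0022]
[Problems to be Solved by the Invention]
The conventional approach described above has the following problem.
[0023]
To access the same cache line (e.g., 64 bytes), every CPU must access the same block address (the 64 bytes starting at a 64-byte boundary). One might think it enough to give all CPUs the same access space (memory area). However, the CPUs run asynchronously. If they all use the same addresses, reading back data that was written to memory may, depending on timing, return different data. This is because another CPU may write different contents to that address just before it is read.
[0024]
The object of the present invention is to solve this problem and realize a cache coherency test within a random instruction test for multiple CPUs. The test uses a technique that lets every CPU access the same block without its data being destroyed by another CPU.
[0025]
[Means for Solving the Problems]
FIG. 1 illustrates the principle of the present invention. Its reference numerals are:
- 1: main memory
- 2: system controller
- 3: cache control unit
- 4: cache control information table
- 6: ROM storing the program
- 9: cache memory

To achieve the above object, the present invention is configured as follows.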
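[0027]
(1): The method targets an information processing apparatus with a multi-CPU configuration, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory. It tests cache coherency by generating random instruction sequences in the CPUs. A plurality of different contexts are prepared that share the same logical space and can be switched by an instruction. Each context holds its own mapping between logical addresses and physical addresses of real memory. When a specific instruction switches contexts while the random instruction sequence is running, an access to the same logical address points to a different physical address of real memory. This forces cache replacements and so generates a larger number of them.
[0028]
(2): Consider a test method of the same kind: a plurality of CPUs are connected by a bus, each with a cache memory 9, and cache coherency is tested with random instruction sequences generated in the CPUs. A specific instruction that selects the cache address under test is placed in the random instruction sequence executed by each CPU. Executing this instruction makes every CPU select the same cache address under test and add to it the shift bytes assigned to that CPU. The resulting value lets each CPU access the same cache address without its accesses overlapping those of the other CPUs.
[0029]
(3): A test method of the same kind comprises the following processes:
- copying the generated random instruction sequence to another space;
- creating, for both random instruction spaces, address translation tables with identical logical addresses but different physical addresses, and defining a different context value in each table;
- setting an initial context value and starting the test at a given logical address;
- on a given interrupt, selecting an instruction logical address from the address translation table and setting that value, combined with the address within the random instruction sequence, into the instruction counter;
- on a given interrupt, switching the context value so that only the physical address changes while the logical instruction counter stays the same.

With these processes, the random instruction sequence is executed sequentially from its beginning while only the execution space changes. This makes a cache coherency test of the instruction cache possible.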
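[0031]
(Operation)
The operation of the present invention based on the above configuration is described with reference to FIG. 1.
[0033]
In this way, when all CPUs execute the same instruction, the access address of an access instruction always points to the address range determined for each CPU. In other words, overlapping accesses from the CPUs are avoided even when all CPUs are assigned the same access space.
[0034]
Consequently, a cache coherency test can be realized within a random instruction test for multiple CPUs, in which every CPU can access the same block without its data being destroyed by another CPU.
[0035]
(a): In (1), testing the information processing apparatus proceeds as follows. A plurality of different contexts are prepared that share the same logical space and can be switched by an instruction. Each context holds its own mapping between logical addresses and physical addresses of real memory. When a specific instruction switches contexts during execution of the random instruction sequence, an access to the same logical address points to a different physical address of real memory. This causes a cache replacement (for example, eviction of cache data from L1$ to L2$), so more cache replacements can be generated.
[0036]
(b): In (2), a specific instruction that selects the cache address under test (for example, Select Line( )) is placed in the random instruction sequence executed by each CPU. Executing it makes every CPU select the same cache address under test and add the shift bytes assigned to that CPU. From each CPU's point of view, the accesses then target the same cache address without overlapping.
[0037]
In other words, memory addresses that do not overlap among the CPUs can be derived within the same memory block under test. This provides a way to embed intended cache coherency logic in a random instruction sequence. It also shortens the time needed to cover all combinations during testing.
[0038]
(c): In (3), the following processes are performed when the apparatus is tested:
- copying the generated random instruction sequence to another space;
- creating, for both random instruction spaces, address translation tables with identical logical addresses but different physical addresses, and defining a different context value in each table;
- setting an initial context value and starting the test at a given logical address;
- on a given interrupt, selecting an instruction logical address from the address translation table and setting that value, combined with the address within the random instruction sequence, into the instruction counter;
- on a given interrupt, switching the context value so that only the physical address changes while the logical instruction counter stays the same.
[0039]
With these processes, the random instruction sequence is executed sequentially from its beginning while only the execution space changes. This makes a cache coherency test of the instruction cache possible.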
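[0043]
[Embodiments of the Invention]
Embodiments of the invention are described in detail below with reference to the drawings.
[0044]
§1: System (see FIG. 2)
FIG. 2 is a system configuration diagram. This system is an example of a multi-CPU system (an information processing apparatus with a multi-CPU configuration). A plurality (n) of CPUs (CPU-0, CPU-1, ..., CPU-n), a main memory 1 (a single memory shared by all CPUs), a system controller 2 and other units are mounted on a bus. Each CPU has its own cache memory (hereinafter also simply "cache").
[0045]
An L1$ (primary cache) is provided inside each CPU (inside the CPU chip), and an L2$ (secondary cache) is provided outside each CPU (on a chip separate from the CPU chip). The two are connected by a bus. This example has primary and secondary caches, but more cache levels can also be provided (tertiary, quaternary and, in general, up to n-th level caches).
[0046]
L1$ is a small cache consisting of, for example, four 32 KB cache memories, that is, four cache areas called Way-0, Way-1, Way-2 and Way-3. L2$ is a larger cache than L1$, with a capacity of, for example, 1 MB, 2 MB or 4 MB. In the drawing, the left edge of each of Way-0 to Way-3 of L1$ is address 0 and the right edge is address 32K. The hatched portion in FIG. 2 is a 64-byte-wide cache line.
[0047]
The system controller 2 is connected to the bus. A ROM 6 and other devices are connected to another bus attached to the system controller 2, via an input/output control unit (hereinafter "I/O control unit") 5. The system controller 2 performs various kinds of control, including cache management within the system. It contains a cache control unit 3 with a cache information table 4.
[0048]
The cache control unit 3 stores the L1$ and L2$ information (cache information) of each CPU in the cache information table 4 and manages (controls) these caches. For example, when a CPU fetches (loads) data from main memory 1, it first looks in its internal L1$.
[0049]
If neither L1$ nor L2$ holds the data, the CPU fetches it from another CPU. To do so, it obtains cache information from the system controller 2 and reads the cache data from the other CPU based on that information.
[0050]
The system controller 2 therefore stores the L2$ information of each CPU in the cache information table 4 and keeps it continuously up to date. It uses this information to perform cache control for each CPU. The ROM 6 stores the cache test program described below.
[0051]
The cache coherency test described below is performed as follows. At IPL, a CPU (for example, CPU-0) loads the program stored in ROM 6 into main memory 1. That CPU then fetches and executes the program from main memory 1, carrying out the test within this system.
[0052]
The processing performed when a CPU in this system executes a load instruction (Load) or a store instruction (Store) is as follows. A load instruction fetches data from main memory 1 into a register in the CPU. For the first load instruction (with L1$ and L2$ both empty), the data is transferred in the order main memory 1 → L2$ → L1$ → CPU register.
[0053]
For example, to load the data at address 100 of main memory 1, 64 bytes of data are read from address 100 of main memory 1 and stored in L2$. The data is then transferred from L2$ to L1$ and stored in Way-3 of L1$.
[0054]
As loads are repeated in this way, 64 bytes of data are stored in turn in Way-3 → Way-2 → Way-1 → Way-0 of L1$. The data read from main memory 1 thus fills the cache lines shown in FIG. 2 one after another.
[0055]
A store instruction writes data held in a CPU register to main memory 1, and it is processed as follows. First, the CPU determines whether L1$ holds data corresponding to the data to be stored. If it does, the corresponding data in L1$ is overwritten with the new data (the store data) and processing ends. Main memory 1 is not rewritten at this point, so it still holds the old data.
[0056]
If L1$ does not hold the corresponding data, the CPU determines whether L2$ holds it. If it does, the corresponding data in L2$ is overwritten with the new data and processing ends. Again, main memory 1 still holds the old data.
[0057]
If neither L1$ nor L2$ holds the corresponding data, the data is read from main memory 1 and written to L2$. The corresponding data in L2$ is then overwritten with the new data, and processing ends. Again, main memory 1 still holds the old data. Specific examples of the cache coherency test are described in detail below.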
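[0058]
§2: Example 1 (see FIGS. 3 and 4)
FIG. 3 is explanatory diagram (part 1) of Example 1. It shows the definitions used to generate operand addresses:
- FIG. 3A: the base register;
- FIG. 3B: the index register;
- FIG. 3C: the displacement value.

FIG. 4 is explanatory diagram (part 2) of Example 1.
[0059]
Example 1 is a test method that prevents overlapping accesses from the CPUs when all CPUs are assigned the same access space. Each CPU has various registers. In the following description, the index register in each CPU is written %G1 and the base register %G2 (%G1 and %G2 are register numbers).
[0060]
The operand address of an instruction that accesses main memory 1 is determined in one of two ways:
- ① base register (%G2) + index register (%G1), i.e., %G2 + %G1;
- ② base register (%G2) + displacement value.

By constraining the value of %G1, the value of %G2 and the displacement value, all CPUs can be given the same access space (area of main memory 1) while overlapping accesses within that area are avoided. The registers are therefore defined as follows.
[0061]
(1): Base register for operand address generation
The base register (%G2) points to the operand address of the access instructions generated in the random instruction sequence. Its value is fixed for each CPU, and each CPU uses a value shifted by 16 bytes from that of the previous CPU. The 16-byte value assumes a cache access unit of 64 bytes and at most four CPUs. In general it depends on the number of CPUs under test and on the cache access unit.
[0062]
In this example, the base registers (%G2) of CPU-0 to CPU-3 are set as shown in FIG. 3A ("0x" denotes a hexadecimal value). For example, the base register (%G2) in CPU-0 is set to n = 0x00100000. The base registers of the other CPUs are set to this value shifted in 16-byte steps.
[0063]
That is, the base register (%G2) in CPU-1 is set to n + 16 = 0x00100010, that in CPU-2 to n + 32 = 0x00100020, and that in CPU-3 to n + 48 = 0x00100030.
[0064]
(2): Index register for operand address generation
The index register (%G1) supplies the index part of the operand address of the access instructions generated in the random instruction sequence. Its value is fixed and set on a cache access boundary (64-byte boundary). In this example, as shown in FIG. 3B, the index register (%G1) is set to 0bm...mmm000000 (each m is one bit), so its low six bits are 0. Here it is, for example, 0x3c0.
[0065]
(3): Displacement value (immediate) for operand address generation
Bits 4 and 5 of the displacement value in the operand address of the access instructions generated in the random instruction sequence are fixed at 0. In this example the displacement has the form 0bnnnnnnnn00mmmm (each m and n is one bit) and is, for example, 0x2c8. In 0bnnnnnnnn00mmmm, the rightmost m is bit 0 and the m to its left is bit 1, so bit 4 is the 0 in the fifth position from the right.
[0066]
Bits 4 and 5 follow from the condition given in (1) for the base register. The 16-byte value assumes a 64-byte cache access unit and at most four CPUs, and it changes with the number of CPUs under test and the cache access unit.
[0067]
The m bits are determined by the type of instruction generated:
- 1-byte store: mmmm (any value);
- 2-byte store: mmm0;
- 4-byte store: mm00;
- 8-byte store: m000.
[0068]
In other words, the value must be chosen so that an access starting at that address does not cross a 16-byte boundary. Crossing it would extend the access into the area used by the next CPU. The operand address of an access instruction generated under these conditions always points, within a 64-byte block, to the address range assigned to that CPU. For example, applying the values above to the following instructions gives the results shown in FIG. 3C.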
[0069]
① Base register (%G2) + index register (%G1): consider an 8-byte access, using these names:
- STX: the store instruction (X: e.g., 8 bytes);
- %I5: in register number 5 in the CPU;
- OP: the access address (operand address) of the access instruction.

The instruction STX %I5, [%G2+%G1] then gives the following, where [%G2+%G1] denotes an address in main memory 1.
[0070]
CPU-0: %G2 (0x00100000) + %G1 (0x3c0) → OP = 0x001003c0
CPU-1: %G2 (0x00100010) + %G1 (0x3c0) → OP = 0x001003d0
CPU-2: %G2 (0x00100020) + %G1 (0x3c0) → OP = 0x001003e0
CPU-3: %G2 (0x00100030) + %G1 (0x3c0) → OP = 0x001003f0
[0071]
② Base register (%G2) + displacement value: again consider an 8-byte access, with STX as the store instruction, %I5 as in register number 5 and OP as the operand address. STX %I5, [%G2+0x2c8] then gives the following. %I5 is the register whose contents are stored.
[0072]
CPU-0: %G2 (0x00100000) + Imm13 (0x2c8) → OP = 0x001002c8
CPU-1: %G2 (0x00100010) + Imm13 (0x2c8) → OP = 0x001002d8
CPU-2: %G2 (0x00100020) + Imm13 (0x2c8) → OP = 0x001002e8
CPU-3: %G2 (0x00100030) + Imm13 (0x2c8) → OP = 0x001002f8
[%G2+0x2c8] denotes an address in main memory 1.
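The operand addresses listed above follow from simple addition. The short Python sketch below reproduces them. It assumes the example values of FIG. 3 (n = 0x00100000, %G1 = 0x3c0, displacement 0x2c8) and four CPUs; the function name is hypothetical.

```python
N = 0x00100000   # %G2 of CPU-0 (FIG. 3A)
SHIFT = 16       # byte shift between consecutive CPUs
G1 = 0x3C0       # index register value, 64-byte aligned (FIG. 3B)
DISP = 0x2C8     # displacement with bits 4 and 5 clear (FIG. 3C)

def operand_addresses(num_cpus=4):
    """Return (cpu, [%G2+%G1], [%G2+DISP]) for each CPU."""
    rows = []
    for cpu in range(num_cpus):
        g2 = N + cpu * SHIFT                  # per-CPU base register value
        rows.append((cpu, g2 + G1, g2 + DISP))
    return rows

for cpu, op_index, op_disp in operand_addresses():
    print(f"CPU-{cpu}: [%G2+%G1] = {op_index:#010x}, [%G2+0x2c8] = {op_disp:#010x}")
```

For each addressing form, the four addresses share one 64-byte block (the same value of `address >> 6`) but start 16 bytes apart. This is the property that FIG. 4 illustrates.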
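[0073]
Thus, when all CPUs execute the same instruction, the access address OP always falls in the range determined for each CPU (16 bytes here). Access instructions issued at random on each CPU (e.g., stores) therefore cannot overlap the accesses of other CPUs, even when all CPUs are assigned the same access space (memory area). FIG. 4 shows this state.
[0074]
In FIG. 4, CPU-0 to CPU-3 are assigned areas (address ranges) 16 bytes apart, so their accesses do not overlap. Overlapping accesses among the CPUs are thus reliably prevented when all CPUs are assigned the same access space (memory area).
[0075]
The block size used when accessing main memory 1 is, for example, 64 bytes. In general, the address range assigned to each CPU is (block size) / (number of CPUs), which is 16 bytes in FIG. 4.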
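The displacement rule of (3) can be checked mechanically. Bits 4 and 5 must be clear, and an access of `size` bytes must start at a size-aligned offset and end inside the CPU's 16-byte slice. The Python sketch below makes these checks. It assumes:
- a 64-byte line shared by four CPUs;
- positive displacements kept within the SPARC 13-bit signed immediate, so only six n bits are drawn here rather than the eight shown in the text's pattern.

The helper names are hypothetical.

```python
import random

LINE = 64               # cache access unit (bytes)
CPUS = 4                # CPUs sharing one line
SLICE = LINE // CPUS    # 16 bytes per CPU: (block size) / (number of CPUs)

def displacement_ok(disp, size):
    """True if an access of `size` bytes at base + disp stays in the CPU's slice."""
    if disp & 0b110000:                        # bits 4 and 5 must be 0 (valid for SLICE = 16)
        return False
    offset = disp & (SLICE - 1)                # position inside the 16-byte slice
    return offset % size == 0 and offset + size <= SLICE

def random_displacement(size, rng):
    """Draw a displacement of the form 0bnnnnnn00mmmm that obeys the rule."""
    n_bits = rng.randrange(64) << 6            # free bits 6..11, keeps simm13 positive
    m_bits = rng.randrange(SLICE // size) * size   # size-aligned low nibble
    return n_bits | m_bits
```

For example, `displacement_ok(0x2c8, 8)` holds. `displacement_ok(0x2cc, 8)` fails, because the access would start at offset 12 and spill 4 bytes into the next CPU's slice.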
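[0076]
§3: Example 2 (see FIG. 5)
FIG. 5 illustrates Example 2 and shows the relationship between access register numbers and memory addresses. In Example 2, a cache coherency operating environment is created merely by switching register windows. This lets a random instruction sequence produce more such environments.
[0077]
RISC (Reduced Instruction Set Computer) machines such as the UltraSPARC have global registers (%G0 to %G7). For each window they also have in registers (%I0 to %I7), local registers (%L0 to %L7) and out registers (%O0 to %O7). As shown in FIG. 5, the in (%I) and out (%O) registers of adjacent windows overlap, so they have the same contents.
[0078]
Switching the window (CW0 to CW4) gives the local registers (%L0 to %L7) different contents. The out registers (%O) and in registers (%I) of adjacent windows, however, keep the same contents. While the program points at a given window, only that window's registers and the global registers (%G) can be used.
[0079]
For this reason, access registers are almost always defined using the global registers (%G), which can be referenced from any window. In that case at most seven addresses can be specified. This invention instead uses the local registers (%L0 to %L7) of each window as access registers. Assuming five windows, 40 addresses can then be specified, because window switching alone easily yields a large number of register values.
[0080]
These 40 registers are preset with different addresses in the logical space. More cache coherency operating environments can then be created just by switching the current window with an instruction. The current windows CW0 to CW4 (CW: Current Window) can be switched by window-switching instructions in the program. The %L registers, switched through CW0 to CW4, serve as the base registers.
[0081]
The %G registers are used as index registers. Since %G holds the index values -64, 0 and +64 as index pointers, a cache coherency operating environment can easily be created simply by switching CW.
[0082]
In summary, different logical-space addresses are preset in the local registers (%L) of each window. These local registers, switched by window, are used as base registers pointing to the operand address of the access instructions generated in the random instruction sequence. The global registers (%G) are used as index registers pointing to the index part of the operand address. A cache coherency operating environment is thereby created just by switching windows.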
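The Python sketch below models the idea of Example 2. Five windows of eight local registers give 40 distinct base addresses, and the operand address is %L + %G. The window count and the -64/0/+64 index values come from the text. The base region, the 32 KB stride and all names are hypothetical, and the model ignores the hardware details of window overflow and underflow.

```python
NUM_WINDOWS = 5              # CW0..CW4, as assumed in the text
LOCALS = 8                   # %L0..%L7 per window
INDEX_VALUES = (-64, 0, 64)  # index pointers held in %G registers

def preset_locals(region_start, stride):
    """Hypothetical preset: a distinct logical address in every %L of every window."""
    return [[region_start + (w * LOCALS + r) * stride for r in range(LOCALS)]
            for w in range(NUM_WINDOWS)]

def operand_address(locals_by_window, cw, l_reg, g_value):
    """Address of [%L<l_reg> + %G] while the current window is CW<cw>."""
    return locals_by_window[cw][l_reg] + g_value

bases = preset_locals(0x00100000, 0x8000)       # hypothetical region, 32 KB stride
print(len({b for window in bases for b in window}))        # 40
print(hex(operand_address(bases, 2, 3, INDEX_VALUES[0])))  # 0x197fc0
```

Only the current-window index `cw` changes between accesses. This mirrors the point that a window switch alone selects a new set of base registers.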
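[0083]
In this case, the operand address of an instruction accessing main memory 1 is determined by ① base register (%L) + index register (%G), i.e., %L + %G.
[0084]
§4: Example 3 (see FIG. 6)
FIG. 6 illustrates Example 3 and shows the definition of the registers used for L1$ replacement (CW0, 1, 2, 3, 4). It is an enlarged view of the local registers %L of FIG. 5. Example 3 increases the number of cache replacements (replacements of cache data), so that a random instruction sequence produces more cache coherency operating environments.
[0085]
Defining many access address registers is not enough if the actual access addresses span too wide a range, such as access register ± 0x10000. The probability of hitting the relevant cache line then becomes very small, and there are few opportunities for cache replacement (eviction of cache data).
[0086]
Cache replacement here means replacing cache data, that is, evicting data from L1$ to L2$. For example, suppose new data is to be written to L1$ and L1$ has no free space. The oldest data in L1$ is evicted and written to L2$, and the new data is written into the space in L1$ freed by the eviction.
[0087]
The index register (%G) and the displacement (immediate) value determine the address. Both are therefore fixed at three values, -64, 0 and +64, and no other values are generated. This confines the access area to within ±64 bytes of the address held in the local register (%L).
[0088]
The registers %L0 to %L7 used as base registers (the registers %L of FIG. 5, enlarged in FIG. 6) are spaced 32 KB apart. This matches the 32 KB width of each of Way-0 to Way-3 of L1$. For example, %L0 and %L1 are 32 KB apart, %L1 and %L2 are 32 KB apart, and %L2 and %L3 are 32 KB apart.
[0089]
For each %L, the global registers %G used as index registers are set to -64, 0 and +64 bytes from the 32 KB boundary. That is, the index registers hold the index values %G1 = -64, %G2 = +0 and %G3 = +64.
[0090]
The displacement (immediate) value likewise uses -64, 0 and +64 bytes. Access instructions are generated by adding the index register (%G) value or the displacement (immediate) value to the base register (%L) value to form the operand address.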
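The L1$ set index shows why the 32 KB spacing matters. The Python sketch below assumes a set-associative L1$ with four 32 KB ways and 64-byte lines, as in the system description, and a hypothetical starting address for %L0; the names are hypothetical. Under these assumptions every %L base maps to the same set, so eight candidate lines compete for four ways. The -64 and +64 offsets reach the two neighbouring sets.

```python
WAY_SIZE = 32 * 1024   # bytes per L1$ way
LINE = 64              # bytes per cache line

def l1_set(addr):
    """Set index of `addr` in an L1$ whose ways are WAY_SIZE bytes."""
    return (addr % WAY_SIZE) // LINE

def access_targets(first_base, num_bases=8, offsets=(-64, 0, 64)):
    """Operand addresses %L + offset for %L0..%L7 spaced one way apart."""
    bases = [first_base + i * WAY_SIZE for i in range(num_bases)]
    return {off: [b + off for b in bases] for off in offsets}

targets = access_targets(0x00200000)   # hypothetical address for %L0
for off, addrs in targets.items():
    print(off, sorted({l1_set(a) for a in addrs}))
```

Each offset yields a single set index, either 511, 0 or 1 for this starting address. Accesses therefore concentrate on three sets instead of being spread over the whole cache, which is what raises the replacement rate.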
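[0091]
Consequently, in an instruction sequence generated at random on each CPU, about one in four executed access instructions can be expected to point to the same cache line. The cache line here is the hatched line spanning Way-0 to Way-3 of L1$ in FIG. 6.
[0092]
§5: Example 4 (see FIG. 7)
FIG. 7 illustrates Example 4 and shows the relationship between the test space and the MMU tables. Example 4 generates more cache replacements, so that a random instruction sequence produces more cache coherency operating environments.
[0093]
One way to increase the number of cache replacements is context switching. In Example 4, a context means one contiguous block of logical space (UNIX terminology), and "MMU" refers to the address translation table.
[0094]
As shown in FIG. 7, N different contexts Ctx are prepared that share the same logical space (logical addresses 0 to n). Here they are Ctx=a, Ctx=b and Ctx=c, so N = 3. Each context has a logical space from logical address 0 to n and its own table mapping logical addresses to physical addresses. The tables are given different contents.
[0095]
The logical spaces of Ctx=a, Ctx=b and Ctx=c in FIG. 7 map logical addresses to physical addresses. Switching contexts therefore makes the same logical address point to different physical addresses (real memory addresses), as shown by the arrows.
[0096]
Suppose a random instruction sequence is run in this environment, starting in Ctx=a, and the context is switched at some point (e.g., from Ctx=a to Ctx=b). The logical address is the same before and after the switch. Because the context differs, however, accessing the same logical address still causes a cache replacement.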
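A toy model makes the context mechanism concrete. In the Python sketch below, the context names follow FIG. 7. The page size and physical page numbers are hypothetical, and `switch` merely stands in for the context-switching instruction. The same logical address resolves to a different physical address in each context.

```python
PAGE = 8 * 1024   # hypothetical page size

class ContextMmu:
    """Per-context translation tables over one shared logical space (toy model)."""

    def __init__(self):
        self.tables = {}      # context -> {logical page: physical page}
        self.context = None   # currently selected context

    def map(self, ctx, lpage, ppage):
        self.tables.setdefault(ctx, {})[lpage] = ppage

    def switch(self, ctx):
        self.context = ctx    # stands in for the context-switching instruction

    def translate(self, laddr):
        lpage, offset = divmod(laddr, PAGE)
        return self.tables[self.context][lpage] * PAGE + offset

mmu = ContextMmu()
for ctx, ppage in (("a", 0x100), ("b", 0x200), ("c", 0x300)):
    mmu.map(ctx, 0, ppage)    # logical page 0 exists in every context

for ctx in "abc":
    mmu.switch(ctx)
    print(ctx, hex(mmu.translate(0x40)))   # same logical address, new physical address
```

The physical pages chosen here are multiples of 32 KB. The three translations therefore also land in the same L1$ set, so each context switch turns repeated accesses to one logical address into conflicting lines.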
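[0097]
Combining such environments that force cache replacements produces orders of magnitude more cache coherency states than the conventional approach. Cache coherency can therefore be verified in a short time, and hardware quality can be assured.
[0098]
§6: Example 5 (see FIG. 8)
FIG. 8 illustrates Example 5. FIG. 8A explains the test instruction and FIG. 8B shows the cache address table. In Example 5, intended cache coherency logic is built into the random instruction sequence so that multiple CPUs are directed to the same cache line under test.
[0099]
With the test methods of Examples 1 to 4, all CPUs issue access instructions at random. This can produce every combination of states between each CPU's cache and main memory 1, but it is unclear how long the test must run before all combinations are covered. Example 5 is a technique for shortening the time needed to cover all combinations.
[0100]
The cache address under test is selected by the select-line instruction Select Line( ). The test selects the cache address under test and adds the shift bytes assigned to each CPU. Because all CPUs must point to the same physical address, the following method is used.
[0101]
① A table of test cache addresses is created for each CPU. The logical addresses in the table may differ between CPUs, but the physical addresses must match.
[0102]
② Each CPU calls the select-line instruction (function) Select Line( ). The function uses several low-order bits of the caller's link address (A) as an index into the cache address table (B) to obtain the target address (a logical address). Every CPU calls Select Line( ) from the same address (A), so the addresses selected by all CPUs point to the same physical address.
[0103]
③ The one block (64 bytes) starting at the selected logical address is divided among the CPUs by adding the CPU's own number multiplied by 16. From every CPU's point of view, the test address selected in this way lies within the same memory block (64 bytes) without overlapping the accesses of the other CPUs.
[0104]
The select-line instruction (function) is expressed as
Select Line( ) = { C = *(B + (A & 0xf8)); D = C + (CPU number × 16); Return(D) }
with the following notation:
- A: the link address;
- B: the (logical) start address of the cache address table;
- C: the test address read from the table;
- D: that address shifted by 16 bytes per CPU (16 is the number of bytes shifted);
- 0xf8: the mask that extracts the low-order bits of A;
- *( ): the contents at the address in parentheses;
- Return(D): returning with the value of D.
[0105]
That is, Select Line( ) takes the contents at address B + (A & 0xf8) as C, adds (CPU number × 16) to C to obtain D, and returns D.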
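The Select Line( ) formula translates directly into a few lines of Python, with a dictionary standing in for memory. The sketch makes two assumptions:
- The addresses in FIG. 8B are read as hexadecimal.
- Per-CPU translation is modelled by simple offset functions.

The table locations (0x1000, 0x2000) and the call-site address 0x4008 are hypothetical.

```python
SHIFT = 16   # bytes per CPU inside the 64-byte block

def select_line(memory, table_base, link_addr, cpu_no):
    """C = *(B + (A & 0xf8)); D = C + cpu_no * 16; return D."""
    c = memory[table_base + (link_addr & 0xF8)]   # table entry chosen by A's low bits
    return c + cpu_no * SHIFT

# Hypothetical tables: logical addresses differ per CPU, physical ones match (FIG. 8B).
memory = {
    0x1000: 0x5000000, 0x1008: 0x5002000,   # CPU-0's table at B = 0x1000
    0x2000: 0x7000000, 0x2008: 0x7002000,   # CPU-1's table at B = 0x2000
}
to_physical = {
    0: lambda la: la - 0x5000000 + 0x200000,  # CPU-0's logical -> physical mapping
    1: lambda la: la - 0x7000000 + 0x200000,  # CPU-1's logical -> physical mapping
}
link = 0x4008   # same call site A on every CPU
for cpu, base in ((0, 0x1000), (1, 0x2000)):
    d = select_line(memory, base, link, cpu)
    print(cpu, hex(d), hex(to_physical[cpu](d)))
```

Both CPUs pick table entry 1 (0x4008 & 0xf8 = 8). Their physical addresses, 0x202000 and 0x202010, fall in the same 64-byte block, 16 bytes apart.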
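[0106]
FIG. 8A shows that every CPU executes the same instruction (test instruction). FIG. 8B shows that the logical address B may differ from CPU to CPU, yet the physical addresses (cache physical addresses) translated from those logical addresses are all the same.
[0107]
For example, CPU-0's logical address 5000000 and CPU-1's logical address 7000000 are both translated to physical address 200000. Likewise, CPU-0's logical address 5002000 and CPU-1's logical address 7002000 are both translated to physical address 202000.
[0108]
In this way, a specific instruction (the select-line instruction) that selects the cache address under test is placed in the random instruction sequence executed by each CPU. Executing it makes every CPU select the same cache address under test and add the shift bytes assigned to that CPU. From each CPU's point of view, the accesses then target the same cache address without overlapping. In other words, the addresses chosen by the select function lie within the same memory block for every CPU without the accesses overlapping.
[0109]
§7: Example 6 (see FIG. 9)
FIG. 9 illustrates Example 6:
- (a): an example routine table;
- (x): the master CPU processing (CPU-0);
- (y): the slave CPU processing (CPU-1, 2, 3).

Here CPU-0 is the master and the other CPUs are slaves, but any CPU may be the master. Example 6 builds intended cache coherency logic into the random instruction sequence, so that the intended logic can be verified by a random instruction test.
[0110]
A routine for testing cache coherency (cache consistency), as shown in FIG. 9, is prepared in advance, for example the routine table shown in (a). The routine in (a) uses IF statements to provide processing sections dedicated to each CPU, (x) and (y). This is because all CPUs start this routine and each CPU must behave differently.
[0111]
The cache coherency logic (known as MOESI, as noted in the description of the prior art) is built into the per-CPU sections (x) and (y). The routine is then expanded at random positions while the random instruction sequence is generated, which makes an intentional cache coherency test possible. The detailed logic of the routine is described below.
[0112]
First, the cache line under test (see the hatched portion of FIG. 2) is selected with the Select Line instruction, and the CPUs are synchronized. Each CPU then branches to its own instruction sequence and executes separate processing. By definition, CPU-0 is the master and the other CPUs are slaves. The master puts the cache into a certain state, for example with a store instruction (Store). The slaves then change the cache state set by the master, for example with a load instruction (Load).
[0113]
This example produces the cache state change that occurs when a slave reads data while the master holds the latest cache information. Varying the store and load instructions in this way produces a wide variety of cache state changes explicitly, and the cache coherency (cache consistency) at that point can be verified.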
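The Python sketch below illustrates the state change that the master store / slave load routine is meant to provoke. It uses simplified textbook MOESI rules for a single line:
- A write invalidates all other copies and leaves the writer in M.
- A read miss demotes a remote M copy to O (the owner supplies the data) and a remote E copy to S.

This is an illustration under those assumptions, not the UltraSPARC protocol and not the patent's routine. Write-backs and bus transactions are omitted, and the function names are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def store(states, cpu):
    """Local write: every other copy is invalidated and the writer holds the line in M."""
    for other in states:
        states[other] = State.I
    states[cpu] = State.M

def load(states, cpu):
    """Local read; a dirty owner keeps ownership (M -> O) and supplies the data."""
    if states[cpu] is not State.I:
        return                                     # hit: no transition
    holders = [c for c, s in states.items() if c != cpu and s is not State.I]
    for c in holders:
        if states[c] is State.M:
            states[c] = State.O
        elif states[c] is State.E:
            states[c] = State.S
    states[cpu] = State.S if holders else State.E

line = {cpu: State.I for cpu in range(4)}   # one cache line as seen by CPU-0..3
store(line, 0)                              # master: Store
load(line, 1)                               # slave: Load
print({cpu: s.name for cpu, s in line.items()})
```

After the master's store and one slave load, CPU-0 holds the line in O and CPU-1 in S. A checker in a real test would compare such expected states, or at least the data values, against what the hardware reports.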
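[0114]
Inserting the routine shown in (a) at random positions during generation of the random instruction sequence lets a random instruction test verify the intended cache coherency logic. Cache coherency here means, for example, the following. Data is held in main memory 1 and in the L1$ and L2$ of each CPU. The latest data need not be in main memory 1; it may be held in the L1$ or L2$ of some CPU.
[0115]
In other words, the latest data may reside anywhere (in main memory 1 or in the L1$ or L2$ of any CPU), but it must always exist somewhere. All of this logic is handled according to the established MOESI protocol.
[0116]
§8: Example 7 (see FIGS. 10 and 11)
FIG. 10 is explanatory diagram (part 1) of Example 7. FIG. 10A shows the instruction definition body of the intermediate language, and FIG. 10B an example table. FIG. 11 is explanatory diagram (part 2) of Example 7: (x) shows the master CPU processing (CPU-0) and (y) the slave CPU processing (CPU-1, 2, 3).
[0117]
Example 7 builds intended cache coherency logic into the random instruction sequence, so that intended combinations of cache data in the cache coherency test are realized at random. With the technique of Example 6, one routine realizes one intended cache operation. Since the test is fundamentally a random instruction test, however, the intended cache operations should also have some randomness, so that combinations of cache operations arise within a single routine. A technique for achieving this is described below.
[0118]
First, an instruction definition body in an intermediate language (macro instructions), as shown in FIG. 10A, is prepared for use within the routine. The generator that produces the random instruction sequence parses this intermediate language and converts it into executable instructions.
[0119]
One of the parameters of the intermediate language provides the following function through the keywords CHAIN, CHAINLIMIT and END:
- CHAIN means that more instructions follow.
- CHAINLIMIT means that more instructions follow. In addition, the instructions marked with this parameter (which must be contiguous) are placed in a random order when expanded into memory.
- END marks the end of the instruction sequence.
[0120]
This function is illustrated by an example. FIGS. 10B and 11 show one example of a cache coherency test. First, CPU-0, 1, 2 and 3 each execute a store instruction (STORE) and a load instruction (LOAD) from the initialized state. After these come several instructions that change the cache state from each CPU (in this example D flush, U flush, Load, Store and Casxa). These instructions are given the CHAINLIMIT parameter, producing the table shown in FIG. 10B.
[0121]
When the generator selects this table, its entries are converted, from the top, into executable instructions (machine code). In this process, the instruction sequence (b) in FIG. 11 is substituted in a changed order. Each time the table is selected, a different instruction sequence is generated, and a variety of cache state transitions is realized as a result.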
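The keyword semantics described above can be sketched as a small expansion function in Python. It assumes:
- Table entries are (instruction, keyword) pairs.
- Each contiguous run of CHAINLIMIT entries is emitted in a random order, while CHAIN entries keep their position.
- The END entry is the last instruction emitted.

The table contents echo FIG. 10B, but the exact layout, the closing "SYNC" entry and the function name are hypothetical.

```python
import random

def expand(entries, rng):
    """Expand (instruction, keyword) entries; contiguous CHAINLIMIT runs are shuffled."""
    out, run = [], []
    for insn, keyword in entries:
        if keyword == "CHAINLIMIT":
            run.append(insn)          # collect the contiguous CHAINLIMIT run
            continue
        rng.shuffle(run)              # emit any pending run in random order
        out.extend(run)
        run = []
        out.append(insn)
        if keyword == "END":          # END marks the last instruction
            return out
    rng.shuffle(run)
    out.extend(run)
    return out

table = [                              # hypothetical table in the spirit of FIG. 10B
    ("STORE", "CHAIN"), ("LOAD", "CHAIN"),
    ("D flush", "CHAINLIMIT"), ("U flush", "CHAINLIMIT"), ("Load", "CHAINLIMIT"),
    ("Store", "CHAINLIMIT"), ("Casxa", "CHAINLIMIT"),
    ("SYNC", "END"),                   # hypothetical closing instruction
]
print(expand(table, random.Random(1)))
print(expand(table, random.Random(2)))
```

Calling `expand` with different random states yields different orderings of the five cache-changing instructions from the same table. This is how one table can cover many cache state transitions.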
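[0122]
The purpose of the test is to vary the type of instruction that changes the cache state and the state immediately before that instruction is issued. Doing so realizes many cache state transition tests with few tables.
[0123]
§9: Example 8 (see FIG. 12)
FIG. 12 illustrates Example 8. FIG. 12A explains the random instructions, FIG. 12B shows state 1 and FIG. 12C shows state 2. Example 8 is a technique for testing the cache coherency of instructions, rather than of data (ordinary non-instruction data) as in Examples 1 to 7. Such a test requires multiple instruction execution spaces. The following procedure satisfies this condition while keeping the functions of Examples 1 to 7 effective.
[0124]
① Copy the entire generated random instruction sequence to another space.
[0125]
② For both random instruction spaces (the original and the copy), create MMU tables (address translation tables) with identical logical addresses but different physical addresses (the random instruction spaces). Define a different context value for each MMU table. The context here is the control register used to switch spaces, and the context value is the value of that register.
[0126]
③ Start the test with an initial context value of 10, for example at logical address 0x000000.
[0127]
④ On a given interrupt, select an instruction logical address arbitrarily from the MMU table. Combine this value with the address within the random instruction sequence and set the result into the instruction counter. Assuming a 1 MB random instruction space, the combination is the offset ORed into the 1 MB-aligned value.
[0128]
For example, suppose an interrupt occurs at address 0x5050010. The new instruction address (0x7000000) plus the address within the random instruction sequence (0x50010) gives 0x7050010. Returning to this address changes the logical address without disturbing the execution order of the random instruction sequence.
[0129]
⑤ On a given interrupt, also switch the context to obtain a different context value (10 → 20). This changes only the physical address while the logical instruction counter stays the same. In this case the instructions in the random instruction space at 0x300000 are executed.
[0130]
⑥ Through steps ① to ⑤, the random instruction sequence is executed sequentially from its beginning while only the execution space changes. Because the physical space and the logical address of execution are switched in turn, the cache coherency of instructions can be tested.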
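Step ④ and the worked example keep the offset inside the 1 MB random instruction space and replace only the upper bits. The Python sketch below shows that arithmetic, assuming a 1 MB space aligned on a 1 MB boundary. The function name is hypothetical.

```python
SPACE = 0x100000   # 1 MB random instruction space

def relocate_pc(interrupted_pc, new_base):
    """Keep the offset within the random sequence; swap in a new 1 MB-aligned base."""
    if new_base % SPACE:
        raise ValueError("new base must lie on a 1 MB boundary")
    offset = interrupted_pc & (SPACE - 1)   # address within the random sequence
    return new_base | offset                # equals new_base + offset when aligned

print(hex(relocate_pc(0x5050010, 0x7000000)))   # 0x7050010
```

Because the base is aligned, OR and addition give the same result. This is why the text can describe the combination either way.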
[0131]
Assume L2$ is 2 MB; accessing real addresses 2 MB apart then causes a cache replacement. Assume L1$ is 64 KB and 4-way; accessing logical addresses 16 KB apart five times then causes a cache replacement.
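The L1$ figure can be checked with a small set-associative model in Python. The model assumes:
- LRU replacement and 64-byte lines;
- indexing by the addresses given (the text speaks of logical addresses, so this presumes a virtually indexed L1$ or an identity mapping).

The class name is hypothetical.

```python
from collections import OrderedDict

class LruCache:
    """Set-associative cache with LRU replacement; counts evictions only."""

    def __init__(self, size, ways, line=64):
        self.ways, self.line = ways, line
        self.num_sets = size // (ways * line)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.evictions = 0

    def access(self, addr):
        tag, index = divmod(addr // self.line, self.num_sets)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: now most recently used
            return
        if len(lines) == self.ways:
            lines.popitem(last=False)      # miss in a full set: evict the LRU line
            self.evictions += 1
        lines[tag] = True

l1 = LruCache(64 * 1024, ways=4)           # 64 KB, 4-way: 16 KB per way
for k in range(5):
    l1.access(k * 16 * 1024)               # five addresses 16 KB apart
print(l1.evictions)                        # 1
```

Addresses one way-size apart share a set. The first four accesses fill the set's four ways, and the fifth forces the replacement.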
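[0132]
§10: Test example (see FIGS. 13 and 14)
FIG. 13 illustrates a test example, and FIG. 14 is a flowchart of the random instruction sequence generation process. A concrete test example is described below with reference to FIGS. 13 and 14.
[0133]
(1): Test example
A cache coherency test is performed, for example, as follows.
[0134]
① All CPUs start the same task.
[0135]
② The started task divides the CPUs into a master CPU (CPU-0 in this example) and slave CPUs (CPU-1, 2, 3, ..., n in this example). Only the master CPU generates instructions.
[0136]
③ Random instructions are generated (described later).
[0137]
④ After instruction generation, all CPUs synchronize. Each CPU then sets initial values in its registers and passes control to the start of the generated random instruction area. Some of the initialized registers are reserved for access instructions (e.g., G1 for the index only and G2 for the base only). These are set to a logical address pointing to the data area and to an index value giving the displacement from that address. The access settings are changed by subroutines called from the random instruction sequence; they may also be changed on some interrupt.
[0138]
⑤ The return processing at the end of the random instruction sequence returns control to the original task.
[0139]
⑥ Finally, the data obtained by the test is compared. The comparison logic is, for example, a or b below.
[0140]
a: Under some condition (e.g., an interrupt) during execution of the random instruction sequence, the contents of all registers and memory are traced. The sequence is run twice in different execution environments. During the second run, the state is compared with the information traced during the first.
[0141]
b: The task is run on both the real machine and a software simulator, and the register and memory contents of the two are compared under some condition.
[0142]
(2): Random instruction generation
The random instruction generation process is described below with reference to FIG. 14; S1 to S8 denote the processing steps. When generation starts, random data is first written to the instruction generation area (S1). Then, using the random data in that area, one entry is taken from either the routine table or the instruction conversion table and converted into an instruction. The instruction or instruction sequence is written to the instruction generation area.
[0143]
Repeating this fills the entire instruction generation area with random instructions. Specifically, after random data has been written to the instruction generation area, a pointer P is set to the start of the area (S2). The process then branches one of two ways depending on the contents at P (S3).
[0144]
On one branch, instruction data is selected at random from the instruction conversion table. An instruction is generated by combining the AND and OR data it contains with the random data read from P, and the generated instruction is stored at address P. When an access instruction is generated, the AND/OR data are defined so that the designated access registers are selected (S4).
[0145]
Pointer P is then updated (S5) and an end check is performed (S6). Processing repeats from S3 until the end is reached.

On the other branch of S3, a routine is selected at random from the routine table. The commands in the routine are converted into instructions in order and stored from address P onward (S7). P is then advanced by the number of instructions generated (S8), and processing moves to S6.

When the end check determines that the end has been reached, random instruction generation ends.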
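The flowchart steps S1 to S8 map onto a single loop over the generation area. The Python sketch below rests on several assumptions:
- Instructions are 32-bit words.
- S3 branches on the lowest bit of the word at P.
- "Combining" the AND/OR data is taken to mean `(random & and_data) | or_data`.
- A routine is used only if it fits in the remaining area; `routine_table` must be non-empty.

The table formats and the function name are hypothetical.

```python
import random

def generate(n_words, conv_table, routine_table, rng):
    """Sketch of S1-S8; table formats and the S3 branch rule are hypothetical."""
    area = [rng.getrandbits(32) for _ in range(n_words)]  # S1: random data
    p = 0                                                  # S2: P -> start of area
    while p < n_words:                                     # S6: end check
        routine = rng.choice(routine_table)
        # S3: branch on the data at P (here its lowest bit); use a routine only if it fits
        if area[p] & 1 and 0 < len(routine) <= n_words - p:
            area[p:p + len(routine)] = routine             # S7: expand routine at P
            p += len(routine)                              # S8: advance P past it
        else:
            and_data, or_data = rng.choice(conv_table)
            area[p] = (area[p] & and_data) | or_data       # S4: mask random data into an instruction
            p += 1                                         # S5: advance P
    return area

conv_table = [(0x0000FFFF, 0xAB000000)]   # hypothetical AND/OR pair
routine_table = [[1, 2, 3]]               # hypothetical pre-built routine
words = generate(32, conv_table, routine_table, random.Random(3))
```

With these toy tables, every word in the result is either a masked instruction (top byte 0xAB) or part of the routine. No raw random data survives, which corresponds to the flowchart's goal of filling the whole area with instructions.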
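[0146]
§11: Test program and recording medium
In addition to the configuration shown in FIG. 2, the information processing apparatus (information processing system) of FIG. 2 includes other devices, such as:
- a communication control unit;
- a display device;
- a keyboard;
- a flexible disk drive (floppy disk drive, hereinafter "FDD");
- a CD-ROM drive;
- a hard disk device (hereinafter "HDD").
[0147]
The system of FIG. 2 reads the program stored (recorded) in advance in ROM 6 into main memory 1 under CPU control. The CPU executes the program to perform the cache coherency test (test of the information processing apparatus).
[0148]
The present invention is not limited to this example. For example, the test program can also be stored on the hard disk of the HDD as follows and executed by the CPU to perform the cache coherency test.
[0149]
① A program stored on a flexible disk (floppy disk) created by another device is read by the FDD and stored on the HDD's recording medium (hard disk).
[0150]
② Data stored on a storage medium such as a magneto-optical disk or CD-ROM is read by the CD-ROM drive and stored on the HDD's recording medium (hard disk).
[0151]
③ Program data transmitted from another device over a communication line such as a LAN is received through the communication control unit and stored on the HDD's recording medium (hard disk).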
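[0152]
[Effects of the Invention]
As described above, the present invention provides the following effects.
[0153]
(1): Cache coherency functions can be verified in a way that exploits the nature of random instruction generation. More access registers provoke coherency activity, and the probability of cache replacement is higher. Random instructions are generated before and after the access instructions that affect the cache, so the issue timing of those access instructions varies. Cache states can therefore be verified under many different conditions, and hardware quality improves as a result.
[0154]
(2): In a cache coherency test using random instructions on multiple CPUs, every CPU can access the same block without its data being destroyed by another CPU.
[0155]
(3): The test method targets a multi-CPU information processing apparatus and tests cache coherency by generating random instruction sequences in the CPUs. The base register points to the operand address of the access instructions generated in the random instruction sequence. Its value is set, for each CPU, to a value shifted by a fixed amount. The value of the index register, which points to the index part of the operand address, and the displacement value in the operand address are defined in advance.
[0156]
In this way, when all CPUs execute the same instruction, the access address of an access instruction always points to the address range determined for each CPU (e.g., 16 bytes). In other words, overlapping accesses from the CPUs are avoided even when all CPUs are assigned the same access space.
[0157]
Consequently, a cache coherency test can be realized within a random instruction test for multiple CPUs, in which every CPU can access the same block without its data being destroyed by another CPU.
[0158]
(4): When the information processing apparatus is tested, a plurality of different contexts are prepared that share the same logical space and can be switched by an instruction. Each context holds its own mapping between logical addresses and physical addresses of real memory. When a specific instruction switches contexts during execution of the random instruction sequence, an access to the same logical address points to a different physical address of real memory. This causes a cache replacement, so more cache replacements can be generated.
[0159]
(5): When the apparatus is tested, a specific instruction that selects the cache address under test is placed in the random instruction sequence executed by each CPU. Executing it makes every CPU select the same cache address under test and add the shift bytes assigned to that CPU. From each CPU's point of view, the accesses then target the same cache address without overlapping.
[0160]
In other words, memory addresses that do not overlap among the CPUs can be derived within the same memory block under test. This provides a way to embed intended cache coherency logic in a random instruction sequence. It also shortens the time needed to cover all combinations during testing.
[0161]
(6): The following processes are performed when the apparatus is tested:
- copying the generated random instruction sequence to another space;
- creating, for both random instruction spaces, address translation tables with identical logical addresses but different physical addresses, and defining a different context value in each table;
- setting an initial context value and starting the test at a given logical address;
- on a given interrupt, selecting an instruction logical address from the address translation table and setting that value, combined with the address within the random instruction sequence, into the instruction counter;
- on a given interrupt, switching the context value so that only the physical address changes while the logical instruction counter stays the same.
[0162]
With these processes, the random instruction sequence is executed sequentially from its beginning while only the execution space changes. This makes a cache coherency test of the instruction cache possible.
[0163]
(7): The program on the recording medium is read and executed. The base register points to the operand address of the access instructions generated in the random instruction sequence; its value is set, for each CPU, to a value shifted by a fixed amount. The value of the index register, which points to the index part of the operand address, and the displacement value in the operand address are defined in advance.
[0164]
In this way, when all CPUs execute the same instruction, the access address of an access instruction always points to the address range determined for each CPU. In other words, overlapping accesses from the CPUs are avoided even when all CPUs are assigned the same access space.
[0165]
Consequently, a cache coherency test can be realized within a random instruction test for multiple CPUs, in which every CPU can access the same block without its data being destroyed by another CPU.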
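[0166]
The following items are further disclosed in connection with the above description.
[0167]
(a): A test method for an information processing apparatus with a multi-CPU configuration, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory. The method tests cache coherency by generating random instruction sequences in the CPUs. The operand address of each access instruction is formed by adding the value of an index register, or a displacement value, to the value of a base register.

The method applies where there are global registers and, for each window, in registers, local registers and out registers, and where only the registers of the current window and the global registers can be used while an instruction points at a given window. Different logical-space addresses are preset in the local registers of each window. These local registers, switched by window, are used as the base registers pointing to the operand address of the access instructions generated in the random instruction sequence. The global registers are used as the index registers pointing to the index part of the operand address. More cache coherency operating environments are thereby created merely by switching windows with an instruction.
[0168]
In this way, the many existing registers are used efficiently, and a cache coherency operating environment is easily created just by switching windows. Many registers are available as base registers, so many operating environments can also be created easily.
[0169]
(b): A test method for an information processing apparatus with a multi-CPU configuration, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory. The method tests cache coherency by generating random instruction sequences in the CPUs. The operand address of each access instruction is formed by adding the value of an index register, or a displacement value, to the value of a base register. The index register value and the displacement value determine the address of an access instruction. Both are fixed at three values, -n, 0 and +n, for a value n determined by the width of the cache memory, and no other values are generated. This generates more cache replacements.
[0170]
Fixing the index register value and the displacement value at -n, 0 and +n (e.g., n = 64) and generating no other values thus makes it possible to generate more cache replacements.
[0171]
(c): A test method for an information processing apparatus with a multi-CPU configuration, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory. The method tests cache coherency by generating random instruction sequences in the CPUs. A routine for testing cache coherency is prepared, with a processing section dedicated to each CPU, and the cache coherency test logic is built into these sections. The routine is expanded at random positions while the random instruction sequence is generated. This makes it possible to verify the intended cache coherency logic by a random instruction test.
[0172]
(d): A test method for an information processing apparatus with a multi-CPU configuration, in which a plurality of CPUs are connected by a bus and each CPU has a cache memory. The method tests cache coherency by generating random instruction sequences in the CPUs. An instruction definition body of macro instructions is prepared, and the generation means that produces the random instruction sequence parses these macro instructions. While converting them into executable instructions from the top, it randomly rearranges the order of the expanded instructions. More cache state transitions are thereby realized in the cache coherency test.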
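[Brief Description of the Drawings]
[FIG. 1] Diagram illustrating the principle of the present invention.
[FIG. 2] System configuration diagram of the embodiment of the present invention.
[FIG. 3] Explanatory diagram (part 1) of Example 1 of the embodiment.
[FIG. 4] Explanatory diagram (part 2) of Example 1 of the embodiment.
[FIG. 5] Explanatory diagram of Example 2 of the embodiment.
[FIG. 6] Explanatory diagram of Example 3 of the embodiment.
[FIG. 7] Explanatory diagram of Example 4 of the embodiment.
[FIG. 8] Explanatory diagram of Example 5 of the embodiment.
[FIG. 9] Explanatory diagram of Example 6 of the embodiment.
[FIG. 10] Explanatory diagram (part 1) of Example 7 of the embodiment.
[FIG. 11] Explanatory diagram (part 2) of Example 7 of the embodiment.
[FIG. 12] Explanatory diagram of Example 8 of the embodiment.
[FIG. 13] Explanatory diagram of the test example of the embodiment.
[FIG. 14] Flowchart of the random instruction sequence generation process of the embodiment.
[FIG. 15] System configuration diagram of a conventional example.
[FIG. 16] Example of a conventional random instruction test.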
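[Description of Reference Numerals]
1 Main memory (main storage device)
2 System controller
3 Cache control unit
4 Cache information table
5 I/O control unit (input/output control unit)
6 ROM
9 Cache memory
L1$ Primary cache (primary cache memory)
L2$ Secondary cache (secondary cache memory)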
BACKGROUND OF THE INVENTION
The present invention is a test of an information processing apparatus that enables verification of a cache coherency function in a random instruction test for a multi-CPU system in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory.On the wayRelated.
[0002]
[Prior art]
A conventional example will be described below.
[0003]
§1: Explanation of cash coherency
In recent years, as a means for speeding up computers, apart from main memory (main memory), high-speed memories ranging from several kilobytes to several megabytes, called cache memories, are arranged in the CPU to speed up processing. . In this case, data read from the main memory into the cache can always be processed at high speed while operating on the cache memory.
[0004]
This control only needs to consider the consistency (Cache Coherency) between the cache memory inside the CPU and the main memory when one CPU is used. However, if multiple CPUs are installed, the CPU It is necessary to consider the mutual cache memory, resulting in complicated hardware control. This control will be briefly described below with reference to the drawings.
[0005]
§2: Description of system configuration and processing outline ... See FIG.
FIG. 15 is a system configuration diagram of a conventional example. This system is an example of a multi-CPU system, and includes a plurality of CPUs (CPU-0, CPU-1,...), A main memory 1 (one memory shared by all CPUs), a system controller 2, and the like. Are connected by a bus. Each CPU has a cache memory (hereinafter also simply referred to as “cache”).
[0006]
In this case, a primary cache (hereinafter referred to as “L1 $”) is provided inside each CPU, and a secondary cache (hereinafter referred to as “L2 $”) is provided outside the chip of each CPU. The L1 $ is a small capacity cache, and the L2 $ is a larger capacity cache than the L1 $.
[0007]
Further, a system controller 2 is connected to the bus, and a ROM 6 or the like is connected to another bus connected to the system controller 2 via an input / output control unit (hereinafter referred to as “I / O control unit”) 5. Has been. The system controller 2 performs various controls including management of cache memory in the system. A cache controller 3 having a cache information table 4 is provided in the system controller 2.
[0008]
The cache control unit 3 stores the L1$ and L2$ information of each CPU in the cache information table 4 and performs cache control of these L1$ and L2$. For example, when a CPU fetches data from the main memory 1, it first looks in its internal L1$. If L1$ does not hold the corresponding data, the CPU fetches the data from another CPU. To do so, it first obtains the cache information from the system controller 2, and then reads the data from the other CPU based on that information.
[0009]
Accordingly, the system controller 2 stores the L1$ and L2$ information of each CPU in the cache information table 4, keeps it constantly updated, and performs cache control for each CPU based on this information.
[0010]
The processing in this system when each CPU executes a load instruction (Load) or a store instruction (Store) is outlined as follows. The load instruction fetches data from the main memory 1 into a register in the CPU. For the first load instruction, the data is transferred in the order main memory 1 → L2$ → L1$ → register inside the CPU.
[0011]
The store instruction writes data held in a register in the CPU to the main memory 1. It is processed as follows. First, it is determined whether L1$ holds the data corresponding to the store data in the CPU register. If it does, the corresponding data in L1$ is overwritten with the new data (the store data) and the processing ends. At this point the main memory 1 is not rewritten, so it still holds the old data.
[0012]
If L1$ does not hold the corresponding data, it is determined whether L2$ holds it. If it does, the corresponding data in L2$ is overwritten with the new data (the store data) and the processing ends. Again, the main memory 1 is not rewritten and still holds the old data.
[0013]
If neither L1$ nor L2$ holds the corresponding data, the data is read from the main memory 1 and written to L2$. The corresponding data in L2$ is then overwritten with the new data (the store data), and the processing ends. In this case too, the main memory 1 is not rewritten and still holds the old data.
[0014]
Data is written to the main memory 1 only when it is evicted from L2$. The system controller 2 constantly monitors every operation, and the cache control unit 3 stores the cache memory information in the cache information table 4. Data in L1$ is always also present in L2$. Accordingly, when a CPU executes an access instruction, the processing is performed based on the information in the cache information table 4 of the system controller 2.
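The load and store flow above can be sketched in Python. This is a minimal model under stated assumptions, not the hardware logic. Each CPU's L1$ and L2$ are unbounded dictionaries of 64-byte blocks, and `CpuCaches`, `split`, and `evict_from_l2` are hypothetical names. The sketch shows why main memory keeps the old data until a block is evicted from L2$.

```python
BLOCK = 64  # assumed cache-line (block) size in bytes


def split(addr):
    """Return (block start address, byte offset within the block)."""
    return addr - addr % BLOCK, addr % BLOCK


class CpuCaches:
    """Hypothetical write-back L1$/L2$ of one CPU in front of shared main memory.

    Capacities are unlimited here, so main memory is written only when
    evict_from_l2() is called explicitly.
    """

    def __init__(self, memory):
        self.memory = memory  # shared dict: block address -> bytearray(BLOCK)
        self.l1 = {}
        self.l2 = {}

    def load(self, addr):
        blk, off = split(addr)
        if blk not in self.l1:
            if blk not in self.l2:
                self.l2[blk] = bytearray(self.memory[blk])  # main memory -> L2$
            self.l1[blk] = bytearray(self.l2[blk])          # L2$ -> L1$
        return self.l1[blk][off]                            # L1$ -> register

    def store(self, addr, byte):
        blk, off = split(addr)
        if blk in self.l1:
            self.l1[blk][off] = byte                        # hit in L1$: update L1$ only
        elif blk in self.l2:
            self.l2[blk][off] = byte                        # hit in L2$: update L2$ only
        else:
            self.l2[blk] = bytearray(self.memory[blk])      # miss: read block into L2$
            self.l2[blk][off] = byte                        # then overwrite it there
        # In every case main memory still holds the old data.

    def evict_from_l2(self, blk):
        """Write an evicted block back to main memory.

        L1$ contents are always also present in L2$, and a store that hit L1$
        updated only L1$, so the L1$ copy (if any) is the newest one.
        """
        newest = self.l1.pop(blk, None)
        older = self.l2.pop(blk)
        self.memory[blk] = newest if newest is not None else older
```

For example, storing to address 100 leaves `memory[64]` unchanged until `evict_from_l2(64)` is called, because address 100 lies in the block that starts at 64.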
[0015]
This cache control is described in many documents as the “MOESI” protocol and is well known, so a detailed description is omitted. One example of a reference on “MOESI” is the following.
[0016]
Reference: “UltraSPARC™-I User's Manual”, Revision 1.0, Sep. 18, 1995, SPARC Technology Business, A Sun Microsystems, Inc. Business, 2550 Garcia Avenue, Mountain View, CA 94043 USA, Part No. STP1030-UG (see pp. 91-97).
§3: Explanation of processing in the system
In this system, for example, when CPU-0 loads the contents of address 100 of the main memory 1, one block (for example, 64 bytes) of data starting at address 100 of the main memory 1 is read into the cache (L1$) of CPU-0. Next, when new data is stored at address 100, the cache (L1$ or L2$) of CPU-0 is updated with the new data, but the main memory 1 still holds the old data.
[0017]
If CPU-1 then loads the contents of address 100 of the main memory 1, the data is read from the cache (L1$ or L2$) of CPU-0 rather than from the main memory 1, because that cache holds the latest data for address 100. A method for verifying cache consistency in a system with this mechanism is described below.
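A small directory-style table can model the scenario above, in which CPU-1 reads address 100 from CPU-0's cache rather than from main memory. This is a hypothetical Python sketch, not the MOESI protocol itself. `SystemController`, the `latest` table, and the invalidate-on-store policy are assumptions chosen only to show how main memory can hold stale data while the cache information table still directs a reader to the newest copy.

```python
BLOCK = 64  # assumed block size in bytes


class SystemController:
    """Hypothetical model of the cache information table in system controller 2.

    Each CPU cache is a dict (block address -> value). The table records, per
    block, which CPU holds the newest copy; main memory is not updated on stores.
    """

    def __init__(self, memory, n_cpus):
        self.memory = memory                          # block address -> value
        self.caches = [dict() for _ in range(n_cpus)]
        self.latest = {}                              # block address -> CPU id

    def store(self, cpu, addr, value):
        blk = addr - addr % BLOCK
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(blk, None)                  # other copies become stale
        self.caches[cpu][blk] = value
        self.latest[blk] = cpu                        # memory keeps the old data

    def load(self, cpu, addr):
        blk = addr - addr % BLOCK
        cache = self.caches[cpu]
        if blk not in cache:
            holder = self.latest.get(blk)
            # Consult the table: copy from the CPU holding the newest data,
            # otherwise fall back to main memory.
            cache[blk] = (self.caches[holder][blk] if holder is not None
                          else self.memory[blk])
        return cache[blk]
```

With `sc = SystemController({64: "old"}, 2)`, calling `sc.store(0, 100, "new")` and then `sc.load(1, 100)` returns `"new"` while `sc.memory[64]` is still `"old"`.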
[0018]
§4: Explanation of the method for verifying cache consistency ... See FIG. 16
FIG. 16 shows conventional random instruction test examples. In general, when multiple tasks run simultaneously, areas whose data is rewritten, such as stack areas and data access areas, are provided for each task, and each task runs using the space allocated to it. This method also applies when one task is shared and run by a plurality of CPUs, and it is commonly used for multi-CPU random instruction tests as well.
[0019]
In the prior art, a random instruction test (including instructions that access memory) that runs on a single CPU is simply run simultaneously on a plurality of CPUs. When random instructions are executed simultaneously on a plurality of CPUs, the access instructions contained in the random instruction sequences are also issued randomly from those CPUs, so the interaction between the caches and the main memory can be tested.
[0020]
FIG. 16 shows two test examples: a first test example (see FIG. 16A) and a second test example (see FIG. 16B). In Test Example 1, CPU-0 and CPU-1 use different memory areas and execute different instruction sequences. In Test Example 2, CPU-0 and CPU-1 use the same memory area and execute different instruction sequences.
[0021]
The problem is that each CPU accesses a separate space, so the memory accesses of one CPU never act on the cache of another CPU. In other words, a cache coherency test that spans a plurality of CPUs cannot currently be performed. Such a test requires accesses to the same cache line from different CPUs, but no method for this is currently available.
[0022]
[Problems to be solved by the invention]
The conventional apparatus as described above has the following problems.
[0023]
As described above, for accesses to hit the same cache line (for example, 64 bytes), each CPU must access the same block address (the 64 bytes starting at a 64-byte boundary). One might simply give every CPU the same access space (memory area). However, the CPUs operate asynchronously. If they use the same addresses, reading back what a CPU wrote may return different data depending on the timing, because another CPU may have written different contents to that address just before the read.
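A toy interleaving model illustrates the timing problem. It is hypothetical: there is a single shared word and no caches. The same pair of programs either passes or appears to fail depending only on how the asynchronous CPUs interleave, so a read-back mismatch in this setup would not indicate a hardware fault.

```python
def read_back(schedule):
    """Run (cpu, op) steps against one shared address.

    Each CPU stores its own pattern and later loads the address. Returns
    {cpu: value seen by that CPU's load}.
    """
    memory_word = None
    seen = {}
    for cpu, op in schedule:
        if op == "store":
            memory_word = f"pattern-{cpu}"
        elif op == "load":
            seen[cpu] = memory_word
        else:
            raise ValueError(op)
    return seen


# Serialized timing: each CPU reads back exactly what it wrote.
serial = [(0, "store"), (0, "load"), (1, "store"), (1, "load")]
# Interleaved timing: CPU-1's store lands between CPU-0's store and load.
interleaved = [(0, "store"), (1, "store"), (0, "load"), (1, "load")]
```

Under `serial` each CPU sees its own pattern. Under `interleaved`, CPU-0 sees CPU-1's pattern.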
[0024]
The present invention solves this conventional problem. Its object is to make a cache coherency test possible in a random instruction test for multiple CPUs, by letting each CPU access the same block while preventing its data from being destroyed by another CPU.
[0025]
[Means for Solving the Problems]
FIG. 1 is a diagram explaining the principle of the present invention. In the figure, reference numeral 1 denotes a main memory, 2 a system controller, 3 a cache control unit, 4 a cache control information table, 6 a ROM storing a program, and 9 a cache memory. To achieve the above object, the present invention is configured as follows.
[0027]
  (1): A test method for an information processing apparatus performs a cache coherency test by generating a random instruction sequence in the CPUs of a multi-CPU information processing apparatus, in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory. A plurality of different contexts are prepared that share the same logical space and can be switched by an instruction. Each context holds correspondence information between logical addresses and physical addresses of the real memory. When the context is switched by a specific instruction while the random instruction sequence is executing, the same logical address points to a different physical address in the real memory. Cache replacement therefore occurs, and the number of cache replacements is increased.
[0028]
  (2): A test method for an information processing apparatus performs a cache coherency test by generating a random instruction sequence in the CPUs of a multi-CPU information processing apparatus, in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory 9. A specific instruction that selects a test target cache address is provided in the random instruction sequence executed by each CPU. By executing this instruction, every CPU selects the same test target cache address and obtains a value by adding the shift bytes allocated to that CPU to this address. Using this value, each CPU accesses the same cache address without its accesses overlapping those of the other CPUs.
[0029]
  (3): A test method for an information processing apparatus performs a cache coherency test by generating a random instruction sequence in the CPUs of a multi-CPU information processing apparatus, in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory 9. The method comprises the following processes:
  - copying the generated random instruction sequence to another space;
  - creating, for both random instruction spaces, an address translation table with the same logical addresses and different physical addresses, and defining different context values in the table;
  - setting one value as the initial context and starting the test from a given logical address;
  - triggered by a given interrupt, selecting the logical address of an instruction from the address translation table and setting the instruction counter to this address within the random instruction sequence;
  - triggered by a given interrupt, changing the context value so that only the physical address changes while the logical-address instruction counter stays the same.
  Through these processes, the random instruction sequence is executed sequentially from the beginning while only the execution space changes, which makes a cache coherency test of the instruction cache possible.
[0031]
[Operation]
The operation of the present invention based on the above configuration will be described with reference to FIG.
[0033]
In this way, when all the CPUs execute the same instruction, the access address of each access instruction points into an address range determined for that CPU. That is, duplicate accesses from different CPUs can be avoided even when the same access space is allocated to all CPUs.
[0034]
Accordingly, in a random instruction test for multiple CPUs, each CPU can access the same block, and a cache coherency test can be realized without data being destroyed by another CPU.
[0035]
  (a): In (1) above, when testing the information processing apparatus, a plurality of different contexts are prepared that share the same logical space and can be switched by an instruction. Each context holds correspondence information between logical addresses and physical addresses of the real memory. When the context is switched by a specific instruction while the random instruction sequence is executing, the same logical address points to a different physical address in the real memory. Cache replacement (for example, eviction of cache data from L1$ to L2$) therefore occurs, and the number of cache replacements can be increased.
[0036]
  (b): In (2) above, when testing the information processing apparatus, a specific instruction (for example, Select Line()) that selects a test target cache address is provided in the random instruction sequence executed by each CPU. By executing this instruction, every CPU selects the same test target cache address and obtains a value by adding the shift bytes allocated to that CPU to this address. Each CPU thus accesses the same cache address without its accesses overlapping those of the other CPUs.
[0037]
That is, memory addresses can be derived such that accesses from a plurality of CPUs to the same memory block under test do not overlap. This makes it possible to build the intended cache coherency logic into a random instruction sequence and to shorten the time needed to cover all combinations during testing.
[0038]
  (c): In (3) above, the information processing apparatus is tested as follows:
  - the generated random instruction sequence is copied to another space;
  - an address translation table with the same logical addresses and different physical addresses is created for both random instruction spaces, and different context values are defined in the table;
  - one value is set as the initial context, and the test is started from a given logical address;
  - triggered by a given interrupt, the logical address of an instruction is selected from the address translation table, and the instruction counter is set to this address within the random instruction sequence;
  - triggered by a given interrupt, the context value is switched so that only the physical address changes while the logical-address instruction counter stays the same.
[0039]
Then, by the above processing, the random instruction sequence is sequentially executed from the head and only the execution space is changed, thereby enabling the cache coherency test of the instruction cache.
[0043]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described in detail below with reference to the drawings.
[0044]
§1: System description ... See FIG. 2
FIG. 2 is a system configuration diagram. This system is an example of a multi-CPU system (an information processing apparatus with a multi-CPU configuration) in which a plurality of CPUs (CPU-0, CPU-1, ..., CPU-n), a main memory 1 (one memory shared by all CPUs), a system controller 2, and the like are connected to a bus. Each CPU has its own cache memory (hereinafter also simply referred to as a “cache”).
[0045]
In this case, L1$ (primary cache) is provided inside each CPU (on the CPU chip), and L2$ (secondary cache) is provided outside each CPU (on a chip separate from the CPU chip). The two are connected by a bus. Although this example has a primary and a secondary cache, more cache levels can be provided (tertiary, quaternary, and in general up to an n-th level cache).
[0046]
L1$ is a small-capacity cache consisting of, for example, four 32 KB (kilobyte) cache memories (four cache areas) called Way-0, Way-1, Way-2, and Way-3. L2$ is a large-capacity cache (larger than L1$) with a capacity of 1 MB, 2 MB, or 4 MB. In the drawing, the left end of each of Way-0 to Way-3 of L1$ corresponds to address 0 and the right end to address 32K. The hatched portion in FIG. 2 is a 64-byte cache line.
[0047]
The system controller 2 is connected to the bus, and a ROM 6 and other devices are connected, via an input/output control unit (hereinafter referred to as “I/O control unit”) 5, to another bus connected to the system controller 2. The system controller 2 performs various controls, including cache management in the system, and contains a cache control unit 3 having a cache information table 4.
[0048]
The cache control unit 3 stores the L1$ and L2$ information (cache information) of each CPU in the cache information table 4 and manages (controls) these L1$ and L2$. For example, when a CPU loads data from the main memory 1, it first looks in its internal L1$.
[0049]
If neither L1$ nor L2$ holds the corresponding data, the CPU fetches the data from another CPU. It obtains the cache information from the system controller 2 and, based on that information, reads the cached data from the other CPU.
[0050]
Accordingly, the system controller 2 stores the L2$ information of each CPU in the cache information table 4, keeps it constantly updated, and performs cache control for each CPU based on this information. The ROM 6 stores the cache test program described below.
[0051]
To perform the cache coherency test described below, a CPU (for example, CPU-0) loads the program stored in the ROM 6 into the main memory 1 by IPL (initial program load). The CPU then fetches and executes the program from the main memory 1, carrying out the test on this system.
[0052]
The processing in this system when each CPU executes a load instruction (Load) or a store instruction (Store) is outlined as follows. The load instruction fetches data from the main memory 1 into a register in the CPU. For the first load instruction (with L1$ and L2$ both empty), the data is transferred in the order main memory 1 → L2$ → L1$ → register inside the CPU.
[0053]
For example, to load the data at address 100 of the main memory 1, 64 bytes of data are read from address 100 of the main memory 1 and stored in L2$. The data is then transferred from L2$ to L1$ and stored in Way-3 of L1$.
[0054]
As loads are repeated in this way, successive 64-byte blocks are stored in L1$ in the order Way-3, Way-2, Way-1, Way-0. That is, data read from the main memory 1 is stored in turn in the cache line shown hatched in FIG. 2.
[0055]
The store instruction writes data held in a register in the CPU to the main memory 1. It is processed as follows. First, it is determined whether L1$ holds the data corresponding to the store data in the CPU register. If it does, the corresponding data in L1$ is overwritten with the new data (the store data) and the processing ends. At this point the main memory 1 is not rewritten, so it still holds the old data.
[0056]
If L1$ does not hold the corresponding data, it is determined whether L2$ holds it. If it does, the corresponding data in L2$ is overwritten with the new data (the store data) and the processing ends. Again, the main memory 1 is not rewritten and still holds the old data.
[0057]
If neither L1$ nor L2$ holds the corresponding data, the data is read from the main memory 1 and written to L2$. The corresponding data in L2$ is then overwritten with the new data (the store data), and the processing ends. In this case too, the main memory 1 is not rewritten and still holds the old data. Specific examples of the cache coherency test are described in detail below.
[0058]
§2: Explanation of Example 1 ... See FIGS. 3 and 4
FIG. 3 is an explanatory diagram (part 1) of Example 1. FIG. 3A defines the base register, FIG. 3B the index register, and FIG. 3C the displacement value, all used to generate operand addresses. FIG. 4 is an explanatory diagram (part 2) of Example 1.
[0059]
Example 1 is a test method that prevents duplicate accesses from the CPUs when the same access space is allocated to all CPUs. Each CPU has various registers. In the following description, the index register of each CPU is denoted %G1 and the base register %G2 (%G1 and %G2 are register numbers).
[0060]
The operand address of an access instruction to the main memory 1 is generated in one of two ways: (1) base register (%G2) + index register (%G1), i.e. %G2 + %G1; or (2) base register (%G2) + displacement value, i.e. %G2 + displacement. By defining the values of %G1 and %G2 and the displacement value appropriately, all CPUs can access the same space (area of the main memory 1) without overlapping accesses. The registers are therefore defined as follows.
[0061]
(1): Definition of operand address generation base register
The value of the base register (%G2), which forms the operand address of access instructions generated in the random instruction sequence, is fixed for each CPU. Each CPU's %G2 value is offset by 16 bytes from that of the next CPU. This 16-byte value assumes at most four CPUs operating with a cache access unit of 64 bytes, so it varies with the number of CPUs under test and the cache access unit.
[0062]
In this example, the base registers (%G2) of CPU-0 to CPU-3 are set as shown in FIG. 3A (the prefix 0x denotes hexadecimal). For example, the base register (%G2) of CPU-0 is set to n = 0x00100000, and the base registers of the other CPUs are set to values shifted from it in 16-byte steps.
[0063]
That is, the base register (%G2) of CPU-1 is set to n + 16 = 0x00100010, that of CPU-2 to n + 32 = 0x00100020, and that of CPU-3 to n + 48 = 0x00100030.
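The per-CPU base register values follow from the rule (block size) / (number of CPUs). The sketch below computes them. The function name and parameter names are assumptions for illustration; the defaults come from the example (64-byte lines, four CPUs, n = 0x00100000).

```python
LINE_SIZE = 64          # cache access unit in bytes (from the example)
MAX_CPUS = 4            # maximum number of CPUs in the example
BASE_N = 0x00100000     # n, the CPU-0 base address (from FIG. 3A)


def base_register(cpu, base=BASE_N, line_size=LINE_SIZE, max_cpus=MAX_CPUS):
    """Return the %G2 value for a CPU: base + cpu * (line_size / max_cpus)."""
    if line_size % max_cpus:
        raise ValueError("line size must divide evenly among the CPUs")
    if not 0 <= cpu < max_cpus:
        raise ValueError("CPU number out of range")
    shift = line_size // max_cpus   # 16 bytes for 64-byte lines and 4 CPUs
    return base + cpu * shift
```

`[hex(base_register(c)) for c in range(4)]` gives `['0x100000', '0x100010', '0x100020', '0x100030']`, matching FIG. 3A. With only two CPUs the same function spaces the bases 32 bytes apart.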
[0064]
(2): Definition of operand address generation index register
The value of the index register (%G1), which gives the index part of the operand address of access instructions generated in the random instruction sequence, is fixed and set to a cache access boundary (a 64-byte boundary). In this example, as shown in FIG. 3B, the index register (%G1) is set to 0bm...mmm000000 (each m is one bit), for example 0x3c0.
[0065]
(3): Definition of displacement value (direct value) for operand address generation
Bits 4 and 5 of the displacement value in the operand address of access instructions generated in the random instruction sequence are fixed to 0. In this example the displacement value has the form 0bnnnnnnnn00mmmm (each m and n is one bit), for example 0x2c8. In 0bnnnnnnnn00mmmm, the rightmost m is bit 0 and bit numbers increase toward the left, so bits 4 and 5 are the two 0 bits (the fifth and sixth bits from the right).
[0066]
Bits 4 and 5 are fixed because of the condition given in (1) for the base register. The 16-byte spacing assumes at most four CPUs with a 64-byte cache access unit, so this value also varies with the number of CPUs under test and the cache access unit.
[0067]
The m bits in this field depend on the type of instruction generated: “mmmm” (any value) for a 1-byte store instruction, “mmm0” for a 2-byte store instruction, “mm00” for a 4-byte store instruction, and “m000” for an 8-byte store instruction.
[0068]
In other words, the value must be chosen so that an access starting at that address does not cross a 16-byte boundary, because an access crossing it would reach the area used by the next CPU. Under these conditions, the operand address of every generated access instruction always points into the address range assigned to that CPU within a 64-byte block. For example, generating the following instructions with the values above gives the results shown in FIG. 3C.
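The following is one way a test generator might produce index and displacement values that obey these rules. This is a hypothetical sketch; the function names are illustrative. The width of the n field is an assumption: it uses six bits so values stay below 4096 and fit a 13-bit signed immediate, whereas FIG. 3C's pattern shows eight n bits.

```python
import random

SLOT = 16  # bytes per CPU within a 64-byte line (64 / 4 CPUs)


def random_index(rng, n_bits=6):
    """Index register (%G1) value: a multiple of 64, i.e. a cache-line boundary."""
    return rng.randrange(1 << n_bits) << 6


def random_displacement(access_size, rng, n_bits=6):
    """Displacement of the form 0b<n...n>00<mmmm>.

    Bits 4 and 5 are 0; the low four bits are aligned to the access size
    (1: mmmm, 2: mmm0, 4: mm00, 8: m000) so the access stays in one
    16-byte slot.
    """
    if access_size not in (1, 2, 4, 8):
        raise ValueError("unsupported access size")
    high = rng.randrange(1 << n_bits) << 6                  # n field, bit 6 upward
    low = rng.randrange(SLOT // access_size) * access_size  # aligned m field
    return high | low                                       # bits 4 and 5 stay 0
```

Adding such a displacement to a base register that is offset by `cpu * 16` from a 64-byte boundary always yields an access that lies entirely in that CPU's 16-byte slot.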
[0069]
① Base register (%G2) + index register (%G1): take an 8-byte store instruction STX (X: 8 bytes), with %I5 denoting in register 5 of the CPU. The access instruction STX %I5, [%G2 + %G1] then produces the following operand addresses OP, where [%G2 + %G1] is the address in the main memory 1.
[0070]
That is, CPU-0: %G2 = 0x00100000, %G1 = 0x3c0, OP = 0x001003c0; CPU-1: %G2 = 0x00100010, %G1 = 0x3c0, OP = 0x001003d0; CPU-2: %G2 = 0x00100020, %G1 = 0x3c0, OP = 0x001003e0; CPU-3: %G2 = 0x00100030, %G1 = 0x3c0, OP = 0x001003f0.
[0071]
② Base register (%G2) + displacement value: with the same 8-byte store instruction STX and register %I5, the access instruction STX %I5, [%G2 + 0x2c8] produces the following operand addresses OP. Here %I5 is the register holding the store operand (the data to be written).
[0072]
That is, CPU-0: %G2 = 0x00100000, Imm13 = 0x2c8, OP = 0x001002c8; CPU-1: %G2 = 0x00100010, Imm13 = 0x2c8, OP = 0x001002d8; CPU-2: %G2 = 0x00100020, Imm13 = 0x2c8, OP = 0x001002e8; CPU-3: %G2 = 0x00100030, Imm13 = 0x2c8, OP = 0x001002f8. Here Imm13 is the 13-bit immediate displacement, and [%G2 + 0x2c8] is the address in the main memory 1.
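The two worked examples can be reproduced and checked mechanically. The sketch below is a hypothetical helper, not part of the described test program. It computes OP for each CPU and checks two properties: all four 8-byte accesses fall in one 64-byte cache line, and no two CPUs touch a common byte.

```python
LINE = 64
BASE = {0: 0x00100000, 1: 0x00100010, 2: 0x00100020, 3: 0x00100030}  # %G2 per CPU


def operand_address(cpu, offset):
    """OP = %G2 + %G1 (index) or %G2 + Imm13 (displacement)."""
    return BASE[cpu] + offset


def check_no_overlap(offset, size):
    """Return (all CPUs hit one cache line, no two CPUs touch a common byte)."""
    spans = [range(operand_address(c, offset), operand_address(c, offset) + size)
             for c in BASE]
    lines = {a // LINE for span in spans for a in span}
    touched = [b for span in spans for b in span]
    return len(lines) == 1, len(touched) == len(set(touched))
```

Both `check_no_overlap(0x3c0, 8)` and `check_no_overlap(0x2c8, 8)` return `(True, True)`. An 8-byte access at a displacement ending in 0xc violates the “m000” rule. With `check_no_overlap(0x2cc, 8)` the first flag becomes `False` because CPU-3's access spills into the next line.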
[0073]
Thus, when all the CPUs execute the same instruction, the access address OP of the access instruction always falls within the range determined for each CPU (16 bytes here). Access instructions issued randomly by the CPUs (for example, store instructions) therefore never overlap, even though the same access space (memory area) is allocated to all CPUs. This state is shown in FIG. 4.
[0074]
In FIG. 4, CPU-0 to CPU-3 are assigned areas (address ranges) offset by 16 bytes from one another, which prevents their accesses from overlapping. That is, duplicate accesses from the CPUs are reliably prevented when the same access space (memory area) is allocated to all CPUs.
[0075]
Here, the block unit of data for accesses to the main memory 1 is, for example, 64 bytes. In general, the address range assigned to each CPU = (block size) / (number of CPUs). In FIG. 4 this is 64 / 4 = 16 bytes.
[0076]
§3: Explanation of Example 2 ... See FIG. 5
FIG. 5 is an explanatory diagram of Example 2 showing the relationship between access register numbers and memory addresses. Example 2 creates a cache coherency operating environment simply by switching register windows, so that more such environments arise in the random instruction sequence.
[0077]
For example, a RISC (Reduced Instruction Set Computer) such as the UltraSPARC has general-purpose registers (%G0 to %G7). It also has, for each window, in registers (%I0 to %I7), local registers (%L0 to %L7), and out registers (%O0 to %O7). As shown in FIG. 5, the in registers (%I) of one window overlap the out registers (%O) of the adjacent window, so the two sets have the same contents.
[0078]
The local registers (%L0 to %L7) have different contents in each window (CW0 to CW4). In contrast, the out registers (%O) of one window and the in registers (%I) of the adjacent window have the same contents. When the program points to a given window, only the registers of that window and the general-purpose registers (%G) can be used.
[0079]
For this reason, access addresses are usually defined with the general-purpose registers (%G), which can be referenced from any window. With these registers, however, at most 7 addresses can be specified (%G0 always reads as zero). The present invention instead uses the local registers (%L0 to %L7) of each window as access registers. With five windows, 40 addresses can be specified, so many register values become available simply by switching windows.
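A small model of SPARC-style register windows can check the register-count argument. The sketch assumes five windows, as in the example. It also assumes the SPARC convention that the out registers of window w are the in registers of window w - 1, modulo the number of windows. `physical` is a hypothetical helper that names the physical register behind each visible register.

```python
NWINDOWS = 5  # five windows, as assumed in the example


def physical(cwp, reg):
    """Map a visible register such as 'l3' in window cwp to a physical register.

    Globals are shared by all windows, locals are private to each window, and
    the outs of window cwp alias the ins of window cwp - 1.
    """
    kind, num = reg[0], int(reg[1:])
    if not 0 <= num <= 7:
        raise ValueError(reg)
    if kind == "g":
        return ("g", num)
    if kind == "l":
        return ("l", cwp % NWINDOWS, num)
    if kind == "i":
        return ("in", cwp % NWINDOWS, num)
    if kind == "o":
        return ("in", (cwp - 1) % NWINDOWS, num)
    raise ValueError(reg)


# Distinct base registers reachable through %L by switching windows: 5 * 8.
local_regs = {physical(w, f"l{n}") for w in range(NWINDOWS) for n in range(8)}
# Globals usable as address registers: %G1..%G7 (%G0 always reads as zero).
usable_globals = {physical(w, f"g{n}") for w in range(NWINDOWS) for n in range(1, 8)}
```

`len(local_regs)` is 40 and `len(usable_globals)` is 7, matching the counts in the text.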
[0080]
Different logical-space addresses are set in these 40 registers in advance. More cache coherency operating environments can then be created simply by switching the current window with an instruction. The current windows CW0 to CW4 (CW: Current Window) are windows that the program switches with a window switching instruction. The %L registers serve as base registers, so switching among CW0 to CW4 changes which %L registers are in use.
[0081]
%G is used as the index register. Because −64, 0, and +64 are stored in %G as index values, a cache coherency operating environment can easily be realized just by switching CW.
[0082]
In this way, different logical-space addresses are set in advance in the local registers (%L) of each window, and the local registers are switched together with the window. The general-purpose registers (%G) serve as index registers that give the index part of the operand address of access instructions generated in the random instruction sequence. A cache coherency operating environment can thus be realized simply by switching windows.
[0083]
In this case, the operand address of an access instruction to the main memory 1 is generated as base register (%L) + index register (%G), i.e. %L + %G.
[0084]
§4: Explanation of Example 3 ... See FIG. 6
FIG. 6 is an explanatory diagram of Example 3. It shows the definition of the L1$ replacement registers (CW0, 1, 2, 3, 4) and is an enlarged view of the local registers %L shown in FIG. 5. Example 3 increases the number of cache replacements (replacements of cache data) so that more cache coherency operating environments arise in the random instruction sequence.
[0085]
Many address registers may be defined for access, but the address actually accessed can still lie too far from them (for example, access register ± 0x10000). In that case the probability of accessing the corresponding cache line becomes low, and the number of opportunities for cache replacement (eviction of cache data) decreases.
[0086]
Cache replacement here means replacing cache data, that is, evicting data from L1$ to L2$. For example, suppose new data is to be written to L1$ and the L1$ area has no free space. The old data in L1$ is then evicted and written to L2$, and the new data is written to L1$.
[0087]
Therefore, the values that determine the address, namely the index register (%G) and the displacement value (immediate value), are restricted to three values: −64, 0, and +64. No other values are generated. As a result, the access area is limited to within ±64 bytes of the address held in the local register (%L).
[0088]
In this case, %L0 to %L7 are used as base registers and are set 32 KB apart, matching the 32 KB width of each way (Way-0 to Way-3) of L1$. FIG. 6 shows these %L registers of FIG. 5 enlarged. For example, %L0 and %L1 are 32 KB apart, %L1 and %L2 are 32 KB apart, and %L2 and %L3 are 32 KB apart.
[0089]
For each %L, the general-purpose registers %G used as index registers provide offsets of −64, 0, and +64 bytes from the 32 KB boundary. That is, the index values are placed in the general-purpose registers %G: −64 in %G1, +0 in %G2, and +64 in %G3.
[0090]
Likewise, the displacement value (immediate value) takes the values −64, 0, and +64 bytes. The operand address of each access instruction is formed by adding either the index register (%G) value or the displacement value (immediate value) to the base register (%L) value, and the access instruction is generated accordingly.
[0091]
Consequently, when each CPU executes its randomly generated instruction sequence, roughly one access instruction in four points to the same cache line. This cache line is the hatched line spanning Way-0 to Way-3 of L1$ in FIG. 6.
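The address scheme of Example 3 can be checked with a small model. The sketch below is a hypothetical illustration, not the patent's generator: it assumes 64-byte cache lines and a way-aligned starting address for %L0 (0x100000 is arbitrary), and takes the 32 KB way width and the −64/0/+64 offsets from the text. It counts how many distinct cache lines fall into each L1$ set.

```python
LINE = 64               # assumed L1$ line size in bytes
WAY_SIZE = 32 * 1024    # width of one L1$ way, as given in the text
N_BASE = 8              # base registers %L0 .. %L7
OFFSETS = (-64, 0, 64)  # allowed index-register / displacement values


def base_registers(start):
    """%L0..%L7, placed WAY_SIZE bytes apart from a way-aligned start."""
    return [start + i * WAY_SIZE for i in range(N_BASE)]


def operand_addresses(start):
    """Every operand address an access instruction can form: base + offset."""
    return [b + off for b in base_registers(start) for off in OFFSETS]


def set_index(addr):
    """L1$ set selected by an address (line index within one way)."""
    return (addr % WAY_SIZE) // LINE


def lines_per_set(addrs):
    """Map each set index to the number of distinct cache lines using it."""
    sets = {}
    for a in addrs:
        sets.setdefault(set_index(a), set()).add(a // LINE)
    return {s: len(lines) for s, lines in sets.items()}
```

With `start = 0x100000`, `lines_per_set(operand_addresses(start))` gives `{511: 8, 0: 8, 1: 8}`: all 24 possible operand addresses fall into just three sets, each contended by 8 distinct lines. If L1$ has four ways (Way-0 to Way-3), more than four lines per set means the accesses must evict one another, which is the effect Example 3 aims for.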
[0092]
§5: Explanation of Example 4 ... See FIG. 7
FIG. 7 is an explanatory diagram of Example 4 and shows the relationship between the test space and the MMU table. Example 4 generates more cache coherency operating environments in a random instruction sequence by increasing the number of cache replacements.
[0093]
Context switching is one way to increase the number of cache replacements. In Example 4, a context means one contiguous logical space (a UNIX term), and the MMU table is the address translation table.
[0094]
As shown in FIG. 7, N different contexts Ctx sharing the same logical space (logical addresses 0 to n) are prepared (N = 3 in this example: Ctx = a, Ctx = b, and Ctx = c). Each context has a logical space from logical address 0 to n and its own correspondence table between logical and physical addresses, and each table holds different mappings.
[0095]
For example, in the logical spaces of Ctx = a, Ctx = b, and Ctx = c in FIG. 7, each logical address is mapped to a physical address. As the figure shows, the same logical address points to different physical addresses (real memory addresses) in different contexts.
[0096]
Accordingly, a random instruction sequence is executed in this environment (starting in context Ctx = a), and the context is switched at some point (for example, from Ctx = a to Ctx = b). Before and after the switch the logical addresses are the same but the context differs, so even accesses to the same logical address cause cache replacement.
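The effect of a context switch can be illustrated with a toy model. Everything in this sketch is hypothetical: the 8 KB page size, the contents of the MMU tables, and the cache geometry (1024 sets, 2 ways, 64-byte lines, LRU replacement, physically indexed and tagged). It only shows why one logical address used under three contexts produces three competing physical lines.

```python
from collections import OrderedDict

PAGE = 0x2000  # assumed page size (8 KB)

# Hypothetical MMU tables: context -> {logical page number: physical page base}.
# The same logical page maps to a different physical page in each context.
MMU = {
    "a": {0: 0x100000, 1: 0x102000},
    "b": {0: 0x200000, 1: 0x202000},
    "c": {0: 0x300000, 1: 0x302000},
}


def translate(ctx, la):
    """Logical address -> physical address under the given context."""
    page, off = divmod(la, PAGE)
    return MMU[ctx][page] + off


class SetAssocCache:
    """Physically indexed, physically tagged cache with LRU replacement."""

    def __init__(self, n_sets=1024, ways=2, line=64):
        self.n_sets, self.ways, self.line = n_sets, ways, line
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.evictions = 0

    def access(self, pa):
        """Touch physical address pa; return True on a hit."""
        line_no = pa // self.line
        s = self.sets[line_no % self.n_sets]
        if line_no in s:
            s.move_to_end(line_no)   # refresh LRU order on a hit
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict the least recently used line
            self.evictions += 1
        s[line_no] = True
        return False
```

Touching logical address 0x40 once in each of contexts a, b, and c yields three physical lines (0x100040, 0x200040, 0x300040) that map to the same set; with two ways, the third access evicts the first, so `evictions` is 1 even though the program only ever used one logical address.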
[0097]
As described above, combining environments that cause cache replacement produces far more cache coherency states than occur with the conventional method. Cache coherency can therefore be verified in a short time, and hardware quality can be assured.
[0098]
§6: Explanation of Example 5 ... See FIG. 8
FIG. 8 is an explanatory diagram of Example 5: FIG. 8A shows a test instruction and FIG. 8B shows a cache address table. In Example 5, the intended cache coherency logic is incorporated into the random instruction sequence so that multiple CPUs derive the same cache line to be tested.
[0099]
In the test methods of Examples 1 to 4, all combinations of states between the caches in the CPUs and the main memory 1 can in principle be realized by issuing access instructions randomly from all CPUs. With this approach, however, it is unclear how long it takes to cover all combinations. Example 5 below is a method for shortening the time needed to realize all state combinations.
[0100]
The cache address to be tested is selected with the select line instruction “Select Line()”. This instruction selects the cache address to be tested and obtains a value to which the shift bytes allocated to each CPU have been added. Because all CPUs must point to the same physical address, the following method is used.
[0101]
(1): A test cache address table is created for each CPU. The logical addresses in this table may differ among CPUs, but the physical addresses must be matched.
[0102]
(2): The select line instruction (function) “Select Line()” called by each CPU uses the lower bits of the caller's link address (A) as an index and obtains the target address (logical address) from the cache address table (B). Because every CPU calls “Select Line()” from the same address (A), the addresses selected by all CPUs point to the same physical address.
[0103]
(3): To divide the one block (64 bytes) starting at the selected logical address among the CPUs, each CPU adds its own CPU number multiplied by 16. The test addresses selected in this way lie within the same memory block (64 bytes) as seen from each CPU, and the CPUs' accesses do not overlap.
[0104]
In this case, the select line instruction (function) is Select Line() = { C = *(B + (A & 0xf8)); D = C + (CPU number × 16); Return (D) }, where A is the link address, B is the (logical) address of the cache address table, C is the test-target address read from the table, D is the address shifted by 16 bytes per CPU, 16 is the number of bytes to shift, 0xf8 masks the lower bits of A, *( ) denotes the contents of the address in parentheses, and Return (D) means returning with the value of D.
[0105]
That is, Select Line() sets C to the contents of the address obtained by adding (A & 0xf8) to the table address B, sets D to C plus (CPU number × 16), and returns D.
[0106]
In FIG. 8A, every CPU executes the same instruction (test instruction). FIG. 8B shows that even though the logical address B differs from CPU to CPU, the physical addresses (cache physical addresses) translated from those logical addresses are all the same.
[0107]
For example, the logical address “5000000” of the CPU-0 is converted into the physical address “200000”, and the logical address “7000000” of the CPU-1 is converted into the physical address “200000”. Further, the logical address “5002000” of the CPU-0 is converted into the physical address “202000”, and the logical address “7002000” of the CPU-1 is converted into the physical address “202000”.
[0108]
As described above, a specific instruction (the select line instruction) for selecting the test-target cache address is provided in the random instruction sequence executed by each CPU. Executing this instruction makes every CPU select the same test-target cache address and obtain that address plus the shift bytes allocated to the CPU. With this value, every CPU sees the same cache address while the accesses do not overlap; that is, the addresses selected by the select function never collide within the same memory block as seen from each CPU.
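The Select Line() formula can be sketched directly. In this hypothetical model, memory at the table address B is represented as a dict from byte offset to stored address, the FIG. 8B entries are read as hexadecimal (an assumption), and the MMU is reduced to a linear offset that maps each CPU's logical test area onto the physical area at 0x200000.

```python
SHIFT = 16  # bytes per CPU within one 64-byte block (from the text)

# Per-CPU cache address tables, modelled as {byte offset from B: entry}.
# Entries follow FIG. 8B, read as hexadecimal (an assumption).
TABLES = {
    0: {0x00: 0x5000000, 0x08: 0x5002000},
    1: {0x00: 0x7000000, 0x08: 0x7002000},
}

# Hypothetical linear MMU: each CPU's logical test area starts at
# LOGICAL_BASE[cpu] and maps onto the same physical area at PHYS_BASE.
LOGICAL_BASE = {0: 0x5000000, 1: 0x7000000}
PHYS_BASE = 0x200000


def select_line(link_addr, cpu_no):
    """Select Line(): C = *(B + (A & 0xf8)); D = C + CPU number * 16."""
    c = TABLES[cpu_no][link_addr & 0xF8]  # entry indexed by caller's link address
    return c + cpu_no * SHIFT             # per-CPU 16-byte slot


def to_physical(cpu_no, la):
    """Translate a CPU's logical address with the hypothetical linear MMU."""
    return la - LOGICAL_BASE[cpu_no] + PHYS_BASE
```

If both CPUs call from the same link address, say 0x4008 (hypothetical), `A & 0xf8` is 0x08 for both; CPU-0 gets 0x5002000 (physical 0x202000) and CPU-1 gets 0x7002010 (physical 0x202010), i.e. the same 64-byte block but different 16-byte slots.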
[0109]
§7: Explanation of Example 6 ... See FIG. 9
FIG. 9 is an explanatory diagram of Example 6: (a) shows an example of a routine table, (x) shows the master CPU processing (CPU-0), and (y) shows the slave CPU processing (CPU-1, 2, 3). Here CPU-0 is the master and the other CPUs are slaves, but any CPU may serve as the master. In Example 6, the intended cache coherency logic is incorporated into the random instruction sequence so that it can be verified by the random instruction test.
[0110]
A routine for testing cache coherency (cache consistency), such as the one in FIG. 9, is prepared in advance; an example is the routine table shown in (a). The routine in (a) uses an IF statement to provide processing sections (x) and (y) dedicated to each CPU, because the routine is started from all CPUs and must perform a different operation on each CPU.
[0111]
The cache coherency logic (the protocol known as “MOESI”, explained in the conventional example) is incorporated into the per-CPU processing sections (x) and (y), and this routine is expanded at random positions while the random instruction sequence is generated. This realizes intentional cache coherency tests. The detailed logic of the routine is described below.
[0112]
First, the cache line to be tested (see the shaded area in FIG. 2) is selected (Select Line()). After the CPUs are synchronized, each CPU branches to its own instruction sequence and executes it separately. By definition, CPU-0 is the master and the other CPUs are slaves. The master sets a state in the cache, for example with a store instruction (Store). The slaves then change the cache state set by the master, for example with a load instruction (Load).
[0113]
This example realizes the change in cache state that occurs when the master holds the latest cache data and a slave reads it. By varying the store and load instructions in this way, a wide variety of cache state changes can be produced deliberately, and cache coherency (cache consistency) can be verified for each of them.
[0114]
As described above, by inserting the routine shown in (a) at random positions when the random instruction sequence is generated, the intended cache coherency logic can be verified by the random instruction test. Cache coherency here means the following: data is held in the main memory 1 and in the L1$ and L2$ of each CPU, but the latest data is not necessarily in the main memory 1; it may be in the L1$ or L2$ of any CPU.
[0115]
That is, the latest data may be in the main memory 1 or in the L1$ or L2$ of any CPU; it may be anywhere, but it must exist somewhere. All of this is handled according to the logic established as “MOESI”.
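The master/slave flow of Example 6 needs a reference for the states it should produce. The sketch below is an assumption-laden illustration, not the hardware's logic: it applies the textbook MOESI transitions to a single cache line, treats accesses as sequential (no bus timing), and ignores the L1$/L2$ split. It computes the per-CPU states expected after the master stores and the slaves load.

```python
def load(states, cpu):
    """Read by `cpu`; update every CPU's state for one cache line."""
    if states[cpu] != "I":
        return                          # read hit: no state change
    others = [i for i in range(len(states)) if i != cpu]
    owner = next((i for i in others if states[i] in ("M", "O")), None)
    if owner is not None:
        states[owner] = "O"             # dirty owner supplies data, keeps ownership
        states[cpu] = "S"
    elif any(states[i] in ("E", "S") for i in others):
        for i in others:
            if states[i] == "E":
                states[i] = "S"         # clean exclusive copy becomes shared
        states[cpu] = "S"
    else:
        states[cpu] = "E"               # no other copy: exclusive clean


def store(states, cpu):
    """Write by `cpu`: invalidate all other copies; the writer becomes Modified."""
    for i in range(len(states)):
        states[i] = "I"
    states[cpu] = "M"


def master_slave_routine(n_cpus=4):
    """Example 6 flow: the master (CPU-0) stores, then each slave loads."""
    states = ["I"] * n_cpus
    store(states, 0)
    for cpu in range(1, n_cpus):
        load(states, cpu)
    return states
```

The expected result `["O", "S", "S", "S"]` (master Owned, slaves Shared) is the kind of state a test would check after the routine runs.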
[0116]
§8: Explanation of Example 7 ... See FIGS. 10 and 11
FIG. 10 is an explanatory diagram (part 1) of Example 7, where FIG. 10A shows an intermediate-language instruction definition body and FIG. 10B shows an example table. FIG. 11 is an explanatory diagram (part 2) of Example 7, where (x) shows the master CPU processing (CPU-0) and (y) shows the slave CPU processing (CPU-1, 2, 3).
[0117]
In Example 7, the intended cache coherency logic is incorporated into the random instruction sequence so that the intended combinations of cache data in the cache coherency test are realized at random. The method of Example 6 realizes one intended cache operation per routine; however, since the underlying test is a random instruction test, it is desirable to give the intended cache operations randomness and to realize combinations of cache operations within one routine. A method for doing so is described below.
[0118]
First, an intermediate-language (macro instruction) definition body, as shown in FIG. 10A, is prepared for use in the routine. The generator that produces the random instruction sequence analyzes these macro instructions and converts them into executable instructions.
[0119]
One of the parameters of the intermediate language provides the following function. The parameter takes one of the values “CHAIN”, “CHAIN LIMIT”, and “END”. “CHAIN” means that more instructions follow. “CHAIN LIMIT” also means that more instructions follow and, in addition, that when the instruction sequence marked with this parameter (which must be contiguous) is expanded in memory, the order of its instructions is randomized. “END” marks the end of the instruction sequence.
[0120]
These functions are illustrated with the cache coherency test example in FIG. 10B and FIG. 11. Here, CPU-0, 1, 2, and 3 each execute a store instruction (STORE) and a load instruction (LOAD) from the initialized state, after which several instructions that change the cache state (in this example, D flush, U flush, Load, Store, and Casxa) are marked with “CHAIN LIMIT”, yielding the table shown in FIG. 10B.
[0121]
When the generator selects this table, its entries are converted, from the top, into executable (machine-language) instructions, producing the instruction sequence shown in FIG. 11. Because the order of the CHAIN LIMIT instructions changes during this conversion, a different instruction sequence is generated each time the table is selected, so a variety of cache state transitions can be realized.
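The CHAIN / CHAIN LIMIT / END expansion can be sketched as follows. The table is hypothetical: it mirrors the instruction names in the text (STORE and LOAD, then D flush, U flush, Load, Store, and Casxa under CHAIN LIMIT), and the final LOAD marked END is an illustrative placeholder. Each contiguous CHAIN LIMIT run is shuffled; every other entry keeps its position.

```python
import random

# Hypothetical routine table: (instruction, intermediate-language parameter).
TABLE = [
    ("STORE", "CHAIN"),
    ("LOAD", "CHAIN"),
    ("DFLUSH", "CHAIN LIMIT"),
    ("UFLUSH", "CHAIN LIMIT"),
    ("LOAD", "CHAIN LIMIT"),
    ("STORE", "CHAIN LIMIT"),
    ("CASXA", "CHAIN LIMIT"),
    ("LOAD", "END"),
]


def expand(table, rng):
    """Expand the table in order, shuffling each contiguous CHAIN LIMIT run."""
    out, run = [], []
    for op, param in table:
        if param == "CHAIN LIMIT":
            run.append(op)  # collect the run; its order is decided later
            continue
        rng.shuffle(run)    # flush any pending run in random order
        out.extend(run)
        run = []
        out.append(op)
        if param == "END":
            return out
    rng.shuffle(run)        # table ended without END: flush the last run
    out.extend(run)
    return out
```

Calling `expand(TABLE, random.Random(seed))` with different seeds keeps the leading STORE/LOAD and the trailing entry fixed while permuting the five cache-state-changing instructions, which is how one table yields many different transition orders.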
[0122]
Therefore, by changing the type of instruction that changes the cache state, which is the purpose of the test, and the state immediately before issuing the instruction, many cache state transition tests can be realized with a small number of tables.
[0123]
§9: Explanation of Example 8 ... See FIG. 12
FIG. 12 is an explanatory diagram of Example 8: (A) explains the random instructions, (B) shows state 1, and (C) shows state 2. Example 8 enables cache coherency testing of instructions. Testing the cache coherency of instructions, rather than of data (ordinary non-instruction data) as in Examples 1 to 7, requires multiple instruction execution spaces. The following procedure satisfies this condition and makes the functions of Examples 1 to 7 available.
[0124]
(1): The generated random instruction sequence is copied to another space.
[0125]
(2): For both random instruction spaces (the original and the copied instruction space), MMU tables (address translation tables) are created that map the same logical addresses to different physical addresses (random instruction spaces). A different context value is defined for each MMU table. Here, the context is a control register that switches the space, and the context value is the value held in that register.
[0126]
(3): As an initial value, the context value is set to, for example, 10, and the test is started from logical address 0x000000.
[0127]
(4): Triggered by some interrupt, a new instruction logical address is selected arbitrarily from the MMU table, and the instruction counter is set to that value combined with the current address within the random instruction sequence (if the random instruction space is 1 MB, the offset within the sequence is ORed onto the 1 MB-aligned value).
[0128]
For example, if an interrupt occurs at address 0x5050010, the address within the random instruction sequence (0x50010) is added to the new instruction address (0x7000000), giving 0x7050010. Returning to this address changes the logical address without disturbing the execution order of the random instruction sequence.
[0129]
(5): Also, when some interrupt occurs, the context is switched to a different context value (10 → 20). This changes only the physical address while the logical-address instruction counter stays the same; in this example, the instructions in the random instruction space at 0x300000 are then executed.
[0130]
(6): Through steps (1) to (5), the random instruction sequence is executed in order from its beginning while only the execution space changes. Because the physical space being executed and the logical execution address are switched one after another, the cache coherency of instructions can be tested.
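Steps (4) and (5) can be modelled as two small functions. In this hypothetical sketch, the 1 MB space size, the logical addresses 0x0000000 and 0x7000000, the context values 10 and 20, and the 0x300000 space for context 20 come from the text; the physical base 0x100000 for context 10 is an arbitrary placeholder, and the MMU is reduced to one entry per 1 MB logical region.

```python
SPACE = 1 << 20  # size of the random instruction space (1 MB)

# Hypothetical MMU tables: context value -> {1 MB logical base: physical base}.
MMU = {
    10: {0x0000000: 0x100000, 0x7000000: 0x100000},  # original sequence (placeholder)
    20: {0x0000000: 0x300000, 0x7000000: 0x300000},  # copied sequence
}


def rebase(ic, new_base):
    """Step (4): move the instruction counter to a new 1 MB-aligned logical
    base while keeping its offset inside the random instruction sequence."""
    if new_base % SPACE:
        raise ValueError("new base must be 1 MB aligned")
    return new_base | (ic & (SPACE - 1))


def fetch_address(ctx, la):
    """Step (5): physical address fetched for logical address la in context ctx."""
    base = la & ~(SPACE - 1)
    return MMU[ctx][base] + (la & (SPACE - 1))
```

Rebasing an instruction counter of 0x0050010 onto 0x7000000 gives 0x7050010; the offset 0x50010 is unchanged, so execution continues with the next instruction of the sequence. Switching the context from 10 to 20 leaves that logical address alone but moves the fetch from physical 0x150010 to 0x350010, i.e. into the copied space.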
[0131]
If L2$ is 2 MB, cache replacement occurs when an address 2 MB away from a given real address is accessed. If L1$ is a 64 KB, 4-way cache, cache replacement occurs when five logical addresses spaced 16 KB apart are accessed.
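These replacement conditions follow from set-index arithmetic. In this sketch, the 64-byte line size is an assumption, L1$ is taken as 64 KB and 4-way (16 KB per way), and L2$ as 2 MB direct-mapped (also an assumption, since only its size is given). With a cold cache and each line touched once, a set that receives k distinct lines with w ways sees exactly max(0, k − w) replacements.

```python
LINE = 64  # assumed line size in bytes


def set_index(addr, cache_bytes, ways, line=LINE):
    """Set selected by addr in a cache of cache_bytes with the given associativity."""
    n_sets = cache_bytes // (ways * line)
    return (addr // line) % n_sets


def replacements(addrs, cache_bytes, ways, line=LINE):
    """Replacements caused when each address is touched once on a cold cache:
    every distinct line beyond the associativity of its set evicts one line."""
    per_set = {}
    for a in addrs:
        per_set.setdefault(set_index(a, cache_bytes, ways, line), set()).add(a // line)
    return sum(max(0, len(lines) - ways) for lines in per_set.values())
```

`replacements([i * 16 * 1024 for i in range(5)], 64 * 1024, 4)` returns 1 (the fifth access evicts), while four such addresses return 0; for the assumed direct-mapped L2$, `replacements([0, 2 * 1024 * 1024], 2 * 1024 * 1024, 1)` returns 1.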
[0132]
§10: Explanation of test examples ... See FIGS. 13 and 14
FIG. 13 is an explanatory diagram of a test example, and FIG. 14 is a flowchart of the random instruction sequence generation process. Specific test examples are described below with reference to FIGS. 13 and 14.
[0133]
(1): Explanation of test examples
For example, the cache coherency test is performed as follows.
[0134]
(1): The same task is activated from all CPUs.
[0135]
(2): The activated task divides processing between a master CPU (CPU-0 in this example) and slave CPUs (CPU-1, 2, 3, ... N in this example).
[0136]
(3): Random instruction generation processing is performed (described later).
[0137]
(4): After the instructions are generated, all CPUs are synchronized; each CPU then sets initial values in its registers and passes control to the head of the generated random instruction area. Among the registers given initial values, those reserved exclusively for access instructions (for example, G1 for the index only and G2 for the base only) are set to a logical address pointing to the data area and to an index value pointing to a displacement from that address. These access settings can also be changed by a subroutine called from the random instruction sequence, or in response to an interrupt.
[0138]
(5): The final return of the random instruction sequence returns control to the original task.
[0139]
(6): Finally, the data obtained from the test is compared. The comparison logic is, for example, a or b below.
[0140]
a: While the random instruction sequence executes, the contents of all registers and memory are traced under certain conditions (for example, at an interrupt). The random instruction sequence is executed twice under different running environments, and the trace from the second execution is compared with the trace from the first.
[0141]
b: The task is executed on both the real machine and a software simulator, and the contents of the registers and memory of both are compared under certain conditions.
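Comparison methods a and b both reduce to comparing snapshots of register and memory contents taken at the same trace points. The sketch below assumes a hypothetical representation: each trace is a list of dicts keyed by register or memory-cell name, one dict per trace point. It reports the first trace point where the two runs disagree.

```python
def first_mismatch(trace_a, trace_b):
    """Return (index, differing names) for the first differing trace point,
    or None if both traces agree at every point."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        # Names present in either snapshot whose values differ (or are missing).
        diff = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if diff:
            return i, diff
    if len(trace_a) != len(trace_b):
        # One run produced more trace points than the other.
        return min(len(trace_a), len(trace_b)), ["<trace length>"]
    return None
```

For method a, the two arguments are the traces of the first and second executions; for method b, they are the real-machine and simulator traces.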
[0142]
(2): Random instruction generation processing
The random instruction generation process is described below with reference to FIG. 14, where S1 to S8 denote process steps. When the process starts, random data is first written to the instruction generation area (S1). Then, guided by the random data in the instruction generation area, an entry is taken from the routine table or the instruction conversion table and converted into an instruction, and the instruction or instruction sequence is written into the instruction generation area.
[0143]
Repeating this process fills the entire instruction generation area with random instructions, as follows. After random data is written to the instruction generation area, a pointer P is set to the head of the area (S2). Processing then branches one of two ways according to the content at pointer P (S3).
[0144]
In the first branch, an entry is selected at random from the instruction conversion table, and an instruction is generated by combining the entry's AND and OR data with the random data read from pointer P. The generated instruction is stored at the address of pointer P. When an access instruction is generated, the AND/OR data are defined so that a predetermined access register is selected (S4).
[0145]
Pointer P is then updated (S5), the end check is made (S6), and the process is repeated from S3 until the end is reached. In the other branch, a routine is selected at random from the routine table, and the command sequence in the routine is converted into instructions one by one and stored from the address of pointer P onward (S7). Pointer P is then advanced by the number of generated instructions (S8), and the process proceeds to S6. When the end check determines that the end has been reached, the random instruction generation process ends.
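Steps S1 to S8 can be sketched as a loop over a word array. This is a hypothetical model: instructions are 32-bit words, the S3 branch is decided by the low bit of the word at P (the text does not say which bits decide), conversion-table entries are (AND mask, OR data) pairs, and routines are already-encoded word lists. A routine that would overrun the area falls back to a single converted instruction. The table contents are placeholders, not real opcodes.

```python
import random

# Hypothetical tables; encodings are placeholders, not real opcodes.
CONV_TABLE = [
    (0x0007E01F, 0xC0000000),  # keep some random operand bits, force fixed bits
    (0x01FFFFFF, 0x80000000),
]
ROUTINE_TABLE = [[0xAAAA0001, 0xAAAA0002, 0xAAAA0003]]


def generate(n_words, conv_table, routine_table, rng):
    """Fill an n_words instruction area following steps S1 to S8."""
    area = [rng.getrandbits(32) for _ in range(n_words)]  # S1: random fill
    p = 0                                                  # S2: P at head
    while p < n_words:                                     # S6: end check
        word = area[p]
        if word & 1:                                       # S3 (assumed rule)
            routine = rng.choice(routine_table)
            if p + len(routine) <= n_words:
                area[p:p + len(routine)] = routine         # S7: expand routine
                p += len(routine)                          # S8
                continue
        and_mask, or_bits = rng.choice(conv_table)         # S4: convert one word
        area[p] = (word & and_mask) | or_bits
        p += 1                                             # S5
    return area
```

For example, `generate(1024, CONV_TABLE, ROUTINE_TABLE, random.Random(0))` returns a 1024-word area in which routines appear at random positions among individually converted instructions; seeding the generator makes a failing sequence reproducible.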
[0146]
§11: Description of test program and recording medium
In addition to the configuration shown in FIG. 2, the information processing apparatus (information processing system) shown in FIG. 2 includes a communication control unit, a display device, a keyboard, a flexible disk drive (floppy disk drive, hereinafter “FDD”), a CD-ROM drive, a hard disk device (hereinafter “HDD”), and the like.
[0147]
Under the control of the CPU, the system shown in FIG. 2 reads a program stored in advance in the ROM 6 into the main memory 1, and the CPU executes the program to perform the cache coherency test (the test of the information processing apparatus).
[0148]
However, the present invention is not limited to this example. For instance, the test program may be stored on the hard disk of the HDD as follows, and the CPU may execute it to perform the cache coherency test.
[0149]
(1): A program created by another device and stored on a flexible disk (floppy disk) is read by the FDD and stored on the recording medium (hard disk) of the HDD.
[0150]
(2): Data stored in a storage medium such as a magneto-optical disk or CD-ROM is read by a CD-ROM drive and stored in a recording medium (hard disk) of the HDD.
[0151]
(3): Data such as a program transmitted from another device via a communication line such as a LAN is received via the communication control unit, and the data is stored in a recording medium (hard disk) of the HDD.
[0152]
[Effects of the Invention]
As described above, the present invention has the following effects.
[0153]
(1): A cache coherency function can be verified in a way that exploits the characteristics of random instruction generation, while increasing the number of access registers involved in cache coherency and raising the probability of cache replacement. That is, by generating random instructions before and after the access instructions that affect the cache, the timing at which access instructions are issued is shifted, so the cache state can be verified under a variety of environments. Hardware quality can therefore be improved.
[0154]
(2): In a cache coherency test using random instructions on multiple CPUs, every CPU can access the same block, and the test can be performed without one CPU's data being destroyed by another CPU.
[0155]
(3): In a test method for an information processing apparatus with a multi-CPU configuration, in which a random instruction sequence is generated in each CPU to perform a cache coherency test, the value of the base register indicating the operand address of each access instruction generated in the random instruction sequence is set to a value shifted by a fixed amount for each CPU, and the value of the index register indicating the index part of the operand address and the displacement value in the operand address are defined in advance.
[0156]
In this way, when all CPUs execute the same instruction, the access address of the access instruction always falls within the address range (for example, 16 bytes) assigned to each CPU. That is, duplicate accesses from the CPUs are avoided even when the same access space is allocated to all CPUs.
[0157]
Accordingly, in a random instruction test on multiple CPUs, every CPU can access the same block, and the cache coherency test can be performed without one CPU's data being destroyed by another CPU.
[0158]
(4): When testing the information processing apparatus, multiple different contexts that share the same logical space and can be switched by an instruction are prepared. Each context holds correspondence information between logical addresses and physical addresses of the real memory, so when the context is switched by a specific instruction during execution of the random instruction sequence, an access to the same logical address points to a different physical address in the real memory and causes cache replacement. In this way, more cache replacements can be generated.
[0159]
(5): When testing the information processing apparatus, a specific instruction for selecting the test-target cache address is provided in the random instruction sequence executed by each CPU. Executing this instruction makes every CPU select the same test-target cache address and obtain that address plus the shift bytes allocated to the CPU, so that every CPU sees the same cache address while the accesses do not overlap.
[0160]
That is, memory addresses can be derived such that accesses from multiple CPUs do not overlap within the same memory block under test. This makes it possible to incorporate the intended cache coherency logic into a random instruction sequence and to shorten the time needed to realize all combinations during testing.
[0161]
(6): When testing the information processing apparatus, the following processes are performed: copying the generated random instruction sequence to another space; creating, for both random instruction spaces, address translation tables that map the same logical addresses to different physical addresses, and defining different context values in those tables; setting an initial context value and starting the test from a certain logical address; triggered by some interrupt, selecting an instruction logical address from the address translation table and setting the instruction counter to that value combined with the random instruction address; and, triggered by some interrupt, switching the context value so that only the physical address changes while the logical-address instruction counter stays the same.
[0162]
Through the above processes, the random instruction sequence is executed in order from its beginning while only the execution space changes, which enables cache coherency testing of the instruction cache.
[0163]
(7): By reading and executing the program on the recording medium, the value of the base register indicating the operand address of each access instruction generated in the random instruction sequence is set to a value shifted by a fixed amount for each CPU, and the value of the index register indicating the index part of the operand address and the displacement value in the operand address are defined in advance.
[0164]
In this way, when all CPUs execute the same instruction, the access address of the access instruction falls within the address range assigned to each CPU. That is, duplicate accesses from the CPUs are avoided even when the same access space is allocated to all CPUs.
[0165]
Accordingly, in a random instruction test on multiple CPUs, every CPU can access the same block, and the cache coherency test can be performed without one CPU's data being destroyed by another CPU.
[0166]
Regarding the above description, the following items are further disclosed.
[0167]
(a): A test method for an information processing apparatus with a multi-CPU configuration, in which multiple CPUs are connected by a bus and each CPU has a cache memory, and in which a random instruction sequence is generated in each CPU to perform a cache coherency test, the operand address of each access instruction being generated by adding an index register value or a displacement value to a base register value. When the CPU has general-purpose registers plus in, local, and out registers for each window, and an instruction pointing to a given window can use only that window's registers and the general-purpose registers, the local registers of each window are set in advance to different addresses in the logical space. These local registers, switched by window, are used as base registers indicating the operand address of the access instructions generated in the random instruction sequence, and the general-purpose registers are used as index registers indicating the index part of the operand address, so that simply switching windows realizes more cache coherency operating environments.
[0168]
In this way, an operating environment for cache coherency can be easily realized simply by switching windows while efficiently using a large number of existing registers. In addition, since there are a large number of registers that can be used as base registers, many operating environments can be easily realized.
[0169]
(b): A test method for an information processing apparatus with a multi-CPU configuration, in which multiple CPUs are connected by a bus and each CPU has a cache memory, and in which a random instruction sequence is generated in each CPU to perform a cache coherency test, the operand address of each access instruction being generated by adding an index register value or a displacement value to a base register value. The index register value and the displacement value, which determine the address of the access instruction, are fixed at three values, −n, 0, and +n, relative to a value n defined by the width of the cache memory, and no other values are generated, so that more cache replacements occur.
[0170]
In this way, by fixing the index register value and the displacement value, which determine the address of the access instruction, at −n, 0, and +n relative to a value n defined by the width of the cache memory (for example, n = 64) and generating no other values, more cache replacements can be produced.
[0171]
(c): A test method for an information processing apparatus with a multi-CPU configuration, in which multiple CPUs are connected by a bus and each CPU has a cache memory, and in which a random instruction sequence is generated in each CPU to perform a cache coherency test. A routine for testing cache coherency is prepared with a processing section dedicated to each CPU; the logic of the cache coherency test is incorporated into these dedicated sections, and the routine is expanded at random positions while the random instruction sequence is generated. In this way, the intended cache coherency logic can be verified by a random instruction test.
[0172]
(d): A test method for an information processing apparatus with a multi-CPU configuration, in which multiple CPUs are connected by a bus and each CPU has a cache memory, and in which a random instruction sequence is generated in each CPU to perform a cache coherency test. A macro-instruction definition body is prepared; the generator that produces the random instruction sequence analyzes the macro instructions and, while converting them from the top into executable instructions, changes the order of the instructions at random. In this way, more cache state transitions can be realized for the cache coherency test.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the principle of the present invention.
FIG. 2 is a system configuration diagram according to the embodiment of the present invention.
FIG. 3 is an explanatory diagram (part 1) of Example 1 according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram (part 2) of Example 1 according to the embodiment of the present invention.
FIG. 5 is an explanatory diagram of Example 2 in the embodiment of the present invention.
FIG. 6 is an explanatory diagram of Example 3 in the embodiment of the present invention.
FIG. 7 is an explanatory diagram of Example 4 in the embodiment of the present invention.
FIG. 8 is an explanatory diagram of Example 5 in the embodiment of the present invention.
FIG. 9 is an explanatory diagram of Example 6 in the embodiment of the present invention.
FIG. 10 is an explanatory diagram (part 1) of Example 7 in the embodiment of the present invention.
FIG. 11 is an explanatory diagram (part 2) of Example 7 in the embodiment of the present invention.
FIG. 12 is an explanatory diagram of Example 8 in the embodiment of the present invention.
FIG. 13 is an explanatory diagram of a test example in the embodiment of the present invention.
FIG. 14 is a flowchart of a random instruction sequence generation process in the embodiment of the present invention.
FIG. 15 is a system configuration diagram of a conventional example.
FIG. 16 is a diagram of a conventional random instruction test example.
[Explanation of symbols]
1 Main memory (main storage device)
2 System controller
3 Cache control unit
4 Cache information table
5 I/O control unit (input/output control unit)
6 ROM
9 Cache memory
L1$ Primary cache (primary cache memory)
L2$ Secondary cache (secondary cache memory)

Claims

A method for testing an information processing apparatus having a multi-CPU configuration in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory, the method performing a cache coherency test by generating a random instruction sequence in the CPUs, wherein
a plurality of different contexts that share the same logical space and can be switched by an instruction are prepared, each context holding correspondence information between logical addresses and physical addresses of a real memory, and
in the process of executing the random instruction sequence, when the context is switched by a specific instruction, an access to the same logical address points to a different physical address in the real memory and causes a cache replacement, so that a larger number of cache replacements is generated.
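The context mechanism of the claim above can be sketched as a small cache model. This is a hedged illustration only: the page size, the 16 KiB direct-mapped cache, the contents of `CONTEXTS`, and the helpers `translate` and `run` are assumptions, and the `("switch", n)` trace entry merely stands in for the context-switching instruction. Each context maps the same logical pages to a different physical base, chosen as a multiple of the cache size, so one logical address hits the same cache set with a different tag after every switch.

```python
from typing import Dict, List, Optional, Tuple

PAGE = 4096   # page size (assumed)
LINE = 64     # cache line size (assumed)
SETS = 256    # direct-mapped cache, 256 sets x 64 B = 16 KiB (assumed)

class DirectMappedCache:
    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * SETS
        self.replacements = 0

    def access(self, paddr: int) -> None:
        line = paddr // LINE
        index, tag = line % SETS, line // SETS
        if self.tags[index] is not None and self.tags[index] != tag:
            self.replacements += 1   # a valid line is evicted by another tag
        self.tags[index] = tag

# Context value -> {logical page: physical page}. Every context covers the same
# logical pages 0..3 but at a different physical base (64-page steps, i.e.
# multiples of the cache size), so a logical address keeps its set, not its tag.
CONTEXTS: Dict[int, Dict[int, int]] = {
    ctx: {lp: lp + (ctx + 1) * 64 for lp in range(4)} for ctx in range(3)
}

def translate(ctx: int, laddr: int) -> int:
    return CONTEXTS[ctx][laddr // PAGE] * PAGE + laddr % PAGE

def run(trace: List[Tuple[str, int]], start_ctx: int = 0) -> int:
    cache, ctx = DirectMappedCache(), start_ctx
    for kind, arg in trace:
        if kind == "switch":     # stands in for the context-switching instruction
            ctx = arg
        else:                    # "access": touch a logical address
            cache.access(translate(ctx, arg))
    return cache.replacements

body = [("access", a) for a in (0x0, 0x40, 0x80)]
with_switches = body + [("switch", 1)] + body + [("switch", 2)] + body
without_switches = body * 3
```

The same nine logical accesses cause no replacements when the context never changes, but six replacements when two context switches are interleaved.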

A method for testing an information processing apparatus having a multi-CPU configuration in which a plurality of CPUs are connected by a bus and each CPU includes a cache memory, the method performing a cache coherency test by generating a random instruction sequence in the CPUs and comprising:
a process of copying the generated random instruction sequence to another space;
a process of creating, for both random instruction spaces, an address translation table that maps the same logical addresses to different physical addresses, and defining a different context value for each space in the table;
a process of setting one of the context values as the initial context and starting the test from a given logical address;
a process of, in response to a given interrupt, selecting the logical address of an instruction from the address translation table and setting this random instruction address in the instruction counter; and
a process of switching the context value upon the interrupt, so that only the physical address changes while the instruction counter, which holds a logical address, is left unchanged,
wherein, by the above processes, the random instruction sequence is executed sequentially from the head while only the execution space is changed, which enables a cache coherency test of the instruction cache.
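The processes of the claim above can be walked through with a toy fetch model. This is a hedged sketch: the fixed 4-byte instruction size, the address values, the placeholder instruction strings, and the names `build_spaces` and `Cpu` are hypothetical, and the interrupt is assumed to resume at the interrupted logical address. The sequence is copied into one physical space per context value, both spaces are mapped at the same logical addresses, and an interrupt switches only the context value, so fetching continues in order at the same logical addresses while the physical addresses (and therefore the instruction-cache lines) change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PAGE = 4096
INSN = 4   # bytes per instruction (assumed fixed-length encoding)

def build_spaces(seq: List[str], log_base: int, phys_bases: Dict[int, int]
                 ) -> Tuple[Dict[int, str], Dict[int, Dict[int, int]]]:
    """Copy `seq` into one physical space per context value and build a
    translation table that maps the SAME logical pages to each space."""
    memory: Dict[int, str] = {}
    tables: Dict[int, Dict[int, int]] = {}
    npages = (len(seq) * INSN + PAGE - 1) // PAGE
    for ctx, pbase in phys_bases.items():
        for i, insn in enumerate(seq):
            memory[pbase + i * INSN] = insn
        tables[ctx] = {log_base // PAGE + k: pbase // PAGE + k for k in range(npages)}
    return memory, tables

@dataclass
class Cpu:
    memory: Dict[int, str]
    tables: Dict[int, Dict[int, int]]
    context: int   # current context value
    ic: int        # instruction counter, holding a LOGICAL address
    fetch_log: List[Tuple[int, int, str]] = field(default_factory=list)

    def translate(self, laddr: int) -> int:
        return self.tables[self.context][laddr // PAGE] * PAGE + laddr % PAGE

    def step(self) -> None:
        paddr = self.translate(self.ic)
        self.fetch_log.append((self.ic, paddr, self.memory[paddr]))
        self.ic += INSN

    def interrupt(self, new_context: int) -> None:
        # The resume address must be mapped in the target context's table.
        # The IC keeps its logical value; only the context value changes.
        assert self.ic // PAGE in self.tables[new_context]
        self.context = new_context

seq = ["add r1,r2", "ld r3,[r4]", "st r5,[r6]", "sub r7,r1", "nop", "br +8"]
memory, tables = build_spaces(seq, log_base=0x10000,
                              phys_bases={1: 0x200000, 2: 0x300000})
cpu = Cpu(memory, tables, context=1, ic=0x10000)   # initial context, start address
for _ in range(3):
    cpu.step()
cpu.interrupt(2)                                   # switch space, keep the IC
for _ in range(3):
    cpu.step()
```

After the interrupt, the logical instruction addresses continue from 0x1000C and the instructions are the same, but they are fetched from physical addresses starting at 0x30000C instead of the 0x200000 space.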