JP3955843B2

JP3955843B2 - Microprocessor parallel simulation system

Info

Publication number: JP3955843B2
Application number: JP2003384971A
Authority: JP
Inventors: 中島　　浩
Original assignee: 株式会社半導体理工学研究センター
Priority date: 2003-11-14
Filing date: 2003-11-14
Publication date: 2007-08-08
Anticipated expiration: 2023-11-14
Also published as: JP2005149078A

Description

The present invention relates to a parallel simulation system for microprocessors.

When developing a system that includes an SOC (System On Chip), some stages of development may require performance evaluation and operation verification of the entire system, including software, before the target SOC is complete. To perform such evaluation and verification with high accuracy, a CA (Cycle Accurate) simulation that faithfully reproduces the behavior of the hardware mechanisms at the clock level is necessary.

On the other hand, when the target microprocessor is a high-performance one, CAS (Cycle Accurate Simulation) on a single-CPU simulation platform may require an execution time about 5,000 to 10,000 times that of the actual hardware. For this reason, various studies have been made on speeding up simulators.

A general technique for finishing a long simulation in a short time is to divide the entire simulation into several parts according to some criterion and to run those partial simulations in parallel on a plurality of computers (simulation nodes). However, when an SOC is simulated at the CA level, parallelizing by spatial division (for example, by dividing the SOC into its components) makes the overhead of communication between the components large. In that case it is difficult to obtain any speedup of the simulation.

Likewise, when the simulation is divided in time (for example, into pieces of an appropriate number of instructions), it is difficult to ensure continuity in the time direction. As a result, it is extremely difficult to guarantee the accuracy of the simulation results.

An object of the present invention is to provide a parallel simulation system that divides the CAS of a microprocessor in time as described above and still performs a fast and accurate simulation.

The present invention has been made to achieve the above object. A parallel simulation system for a microprocessor according to the present invention is
a parallel simulation system for a microprocessor including a plurality P (P is a natural number of 2 or more) of simulation nodes, which are computers connected by a predetermined network,
in which division points that divide the entire instruction sequence of the simulation target system into p (p is a natural number and p ≥ P) partial instruction sequences are set, the logical state of the microprocessor at each division point and a virtual internal state are given as initial values for the scheduling simulation of each partial instruction sequence, and the scheduling simulations of the partial instruction sequences are distributed over the P simulation nodes and performed using these initial values,
the system including:
subsequent simulation means for, at a first simulation node that performs the scheduling simulation of the i-th (1 ≤ i ≤ p-1) partial instruction sequence, continuing the scheduling simulation for a further fixed number (k) of instructions following the last instruction of that partial instruction sequence;
transmission means for transmitting the state of the instruction scheduling mechanism at the first simulation node, at the time the scheduling simulation of the further fixed number of instructions is finished, to a second simulation node that performs the scheduling simulation of the (i+1)-th partial instruction sequence, which follows the i-th partial instruction sequence;
storage means for storing the state of the instruction scheduling mechanism at the time the second simulation node has performed the scheduling simulation of the fixed number of instructions from the head of the (i+1)-th partial instruction sequence; and
comparison verification means, in the second simulation node, for comparing the state of the instruction scheduling mechanism transmitted by the transmission means with the state of the instruction scheduling mechanism stored by the storage means, thereby verifying the validity of the scheduling simulation performed at its own node.

By using the present invention, a parallel simulation system for a microprocessor can be made faster and more efficient in proportion to the number of simulation nodes.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 6 is a schematic diagram of the hardware configuration on which the parallel simulation system 2 according to the preferred embodiment of the present invention operates. In FIG. 6, a plurality of computers 4 are connected to one another via a LAN 6. Each computer 4 may be an ordinary workstation or personal computer. As explained later, each of these computers 4 is a simulation node that simulates one of the time-divided parts of the CAS of the microprocessor (specifically, of its scheduling simulation).

First, in this specification, CAS is divided into
(a) "instruction simulation", which simulates the logical behavior of instructions, and
(b) "scheduling simulation", which simulates the instruction scheduling mechanism.
It is apparent to those skilled in the art that the former (a) can be executed at about 1/100 of the time cost of the latter (b). Therefore, the entire simulation can be sped up by executing the latter (b) in parallel.
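As a rough illustration of why only the scheduling simulation needs to be parallelized, the following sketch estimates the ideal speedup of the whole CAS. It assumes that instruction simulation costs about 1/100 of scheduling simulation and stays serial, and that the scheduling work divides evenly over the nodes with no overlap, communication, or re-execution cost. The function name and this cost model are illustrative assumptions, not part of the described system.

```python
def ideal_speedup(num_nodes: int, instr_cost_ratio: float = 0.01) -> float:
    """Estimate the whole-CAS speedup when only scheduling simulation is parallel.

    Scheduling simulation time on one node is normalized to 1.0.
    Instruction simulation costs `instr_cost_ratio` of that and is assumed
    to run serially (a simplifying assumption for this sketch).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    serial_time = instr_cost_ratio + 1.0
    parallel_time = instr_cost_ratio + 1.0 / num_nodes
    return serial_time / parallel_time
```

Under these assumptions the serial instruction simulation only begins to limit the speedup when the number of nodes approaches 100.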

The parallel simulation system according to the preferred embodiment of the present invention parallelizes the scheduling simulation of the CAS. That is, the instruction sequence to be simulated is divided in the time direction into several partial instruction sequences, and the instruction scheduling for each partial instruction sequence is executed in parallel. The flow of the scheduling simulation in the parallel simulation system is described below.

≪Scheduling simulation process flow≫
≪1≫ In advance, an "instruction simulation" is performed on the system under performance verification (which includes the SOC). Let N be the total number of executed (simulated) instructions and let I(1), I(2), ..., I(N) be the sequence of executed instructions. This sequence is divided into a plurality of (here, p) partial instruction sequences {S(1), S(2), ..., S(p)} as follows.

S(1) = {I(s_1 = 1), ..., I(t_1)},
S(2) = {I(s_2 = t_1 + 1), ..., I(t_2)},
...
S(p-1) = {I(s_{p-1} = t_{p-2} + 1), ..., I(t_{p-1})},
S(p) = {I(s_p = t_{p-1} + 1), ..., I(t_p = N)}

The method of determining the number of divisions p and the individual division points t_i (1 ≤ i ≤ p-1) is described later (see ≪Instruction sequence division method≫). The "instruction simulation" and the division into partial instruction sequences may be performed on any computer; for example, in FIG. 6 they may be performed on a single simulation node P_1.
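The division into partial instruction sequences can be sketched as follows. This is a minimal example that assumes the simplest policy, equal-length segments; the function name `split_trace` is hypothetical, and bounds are 1-indexed and inclusive to match the notation S(i) = {I(s_i), ..., I(t_i)}.

```python
def split_trace(total_instructions: int, num_segments: int) -> list[tuple[int, int]]:
    """Return bounds (s_i, t_i) for S(1)..S(p), 1-indexed and inclusive.

    Segments are as equal in length as possible (an assumed policy);
    s_1 = 1, t_p = N, and s_{i+1} = t_i + 1 always hold.
    """
    if not 1 <= num_segments <= total_instructions:
        raise ValueError("need 1 <= p <= N")
    base, extra = divmod(total_instructions, num_segments)
    bounds = []
    start = 1
    for i in range(num_segments):
        # The first `extra` segments absorb the remainder, one instruction each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        bounds.append((start, end))
        start = end + 1
    return bounds
```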

≪2≫ Next, the "scheduling simulation" of the p partial instruction sequences S(1), S(2), ..., S(p) is executed in parallel by P (≤ p) simulation nodes (see FIG. 6).

The simulation node in charge of the partial instruction sequence S(i) = {I(s_i), ..., I(t_i)} first acquires the logical state A(i) of the microprocessor at the start of execution of I(s_i). Here, the "logical state A(i) of the microprocessor" is the set of values of the registers, memory, and so on of the microprocessor when the instruction simulation has advanced up to I(t_{i-1}). This logical state is assumed to be obtained by either of the following two procedures, and either may be used.
(Procedure 1): In the instruction simulation of ≪1≫ above, the logical state A(i+1) is saved at the time the last instruction I(t_i) of each partial instruction sequence S(i) is executed.
(Procedure 2): Prior to the scheduling simulation of S(i), an instruction simulation of all preceding instructions I(1) through I(t_{i-1}) is performed to obtain the logical state A(i).
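The two procedures can be sketched with a toy instruction simulator. Everything here is an assumption made for illustration: the single `addi` instruction, the dictionary of registers standing in for the logical state, and the function names. A real instruction simulator would snapshot registers and memory of the target microprocessor.

```python
import copy


def execute(state: dict, instr: tuple) -> None:
    """Toy instruction semantics: ('addi', reg, imm) adds imm to a register."""
    op, reg, imm = instr
    if op != "addi":
        raise ValueError(f"unsupported op: {op}")
    state[reg] = state.get(reg, 0) + imm


def collect_logical_states(trace, split_ends):
    """Procedure 1: one instruction-simulation pass that snapshots every A(i).

    `split_ends` holds t_1..t_{p-1} (1-indexed). Returns [A(1), ..., A(p)].
    """
    state = {}
    snapshots = [copy.deepcopy(state)]  # A(1): state before I(1)
    ends = set(split_ends)
    for n, instr in enumerate(trace, start=1):
        execute(state, instr)
        if n in ends:
            snapshots.append(copy.deepcopy(state))  # A(i+1), saved after I(t_i)
    return snapshots


def replay_logical_state(trace, t_prev):
    """Procedure 2: re-run I(1)..I(t_{i-1}) to rebuild A(i) on demand."""
    state = {}
    for instr in trace[:t_prev]:
        execute(state, instr)
    return state
```

Procedure 1 trades storage for time (one pass, p snapshots), while Procedure 2 repeats the cheap instruction simulation on each node instead of shipping snapshots.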

With the logical state of the microprocessor set to the acquired A(i) and the state of the instruction scheduling mechanism set to a virtual initial state, the instruction scheduling of the partial instruction sequence is performed
・ from I(s_i), past I(t_i), up to I(t_i + k).
This "virtual initial state" may be, for example, a state in which no instruction is present in the instruction pipeline.

Here, "k" is the length (a constant) of the stretch of instructions whose scheduling is simulated redundantly by both the simulation node in charge of S(i) and the simulation node in charge of S(i+1) in order to verify the validity of the instruction scheduling.
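A short sketch of the resulting per-node ranges may help. It takes segment bounds (s_i, t_i) and returns the range each node actually schedules, I(s_i) through I(t_i + k). The function name is hypothetical, and the clamp to N for the last segment reflects the fact that there are no instructions beyond I(N).

```python
def scheduling_ranges(bounds, k):
    """Return (first, last) instruction numbers each node schedules.

    `bounds` is [(s_1, t_1), ..., (s_p, t_p)], 1-indexed and inclusive.
    Each node runs k instructions past its own segment, except that no
    range extends beyond the final instruction I(N).
    """
    total = bounds[-1][1]  # t_p = N
    return [(s, min(t + k, total)) for s, t in bounds]
```

For example, with segments (1, 4), (5, 7), (8, 10) and k = 2, the first node schedules I(1) to I(6), so I(5) and I(6) are scheduled by both the first and the second node.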

FIG. 1 schematically shows
・ how the simulation node in charge of S(1) and the simulation node in charge of S(2) perform overlapping scheduling simulations,
・ how the simulation node in charge of S(2) and the simulation node in charge of S(3) perform overlapping scheduling simulations, and
・ how the simulation node in charge of S(3) and the simulation node in charge of S(4) perform overlapping scheduling simulations.
FIG. 2 schematically shows that the simulation node in charge of S(1) and its overlapping portion is (for example) P_1, the simulation node in charge of S(2) and its overlapping portion is (for example) P_2, the simulation node in charge of S(3) and its overlapping portion is (for example) P_3, and the simulation node in charge of S(4) and its overlapping portion is (for example) P_4.

≪3≫ The simulation node in charge of the partial instruction sequence S(i-1) (i > 2) transmits, via the LAN 6, the "state M(t_{i-1} + k) of the instruction scheduling mechanism" at the time its last instruction I(t_{i-1} + k) is input to the instruction scheduling mechanism to the simulation node in charge of the partial instruction sequence S(i) ((a) in FIG. 3). (Strictly speaking, as described below, it transmits this state only after completing the comparison between the state M(t_{i-2} + k) received from the simulation node in charge of S(i-2) and M'(s_{i-1} + k - 1), together with any processing that conditionally follows that comparison.)

In the course of its scheduling simulation, the simulation node in charge of S(i) saves the state M'(s_i + k - 1) of the instruction scheduling mechanism at the time the instruction I(s_i + k - 1) = I(t_{i-1} + k) is input to the instruction scheduling mechanism, and compares it with the state M(t_{i-1} + k) of the instruction scheduling mechanism transmitted from the simulation node in charge of S(i-1) ((b) in FIG. 3).

≪4≫ If the states of the instruction scheduling mechanism compared in ≪3≫ above match, the scheduling simulation of S(i) is valid. Accordingly, the behavior after I(s_i + k - 1) was input to the instruction scheduling mechanism (for example, the number of execution clocks) is taken as the partial result of the scheduling simulation, and the simulation node in charge of S(i) transmits the state M(t_i + k) to the simulation node in charge of S(i+1) via the LAN 6 ((c) in FIG. 3).

If the states do not match, the simulation node in charge of S(i) performs the scheduling simulation from I(s_i + k - 1) onward again, using as the initial state the state M(t_{i-1} + k) of the instruction scheduling mechanism transmitted from the node in charge of S(i-1). The result of this re-execution is taken as the partial result, and the state M(t_i + k) obtained by the re-execution is transmitted to the simulation node in charge of S(i+1) via the LAN 6.
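Steps ≪2≫ to ≪4≫ can be sketched end to end with a toy scheduling model. The model is an assumption made purely for illustration: the scheduling mechanism state is the tuple of the last `DEPTH` issued instructions, and an instruction costs one cycle plus one stall cycle per identical instruction still in that window. The nodes are emulated one after another in a single process, and every segment is assumed to be longer than k.

```python
DEPTH = 3  # hypothetical pipeline window; the state forgets older instructions


def step(state, instr):
    """Issue one instruction; return (new_state, cycles) in the toy model."""
    cycles = 1 + sum(1 for x in state if x == instr)
    return (state + (instr,))[-DEPTH:], cycles


def simulate_range(trace, first, last, state):
    """Issue I(first)..I(last) (1-indexed, inclusive) starting from `state`."""
    cycles = 0
    for n in range(first, last + 1):
        state, c = step(state, trace[n - 1])
        cycles += c
    return state, cycles


def run_sequential(trace):
    """Reference result: one node schedules the whole trace."""
    return simulate_range(trace, 1, len(trace), ())[1]


def run_protocol(trace, bounds, k):
    """Emulate the overlap, compare, and re-execute flow; return (cycles, mismatches)."""
    total, mismatches, received = 0, 0, None
    for idx, (s, t) in enumerate(bounds):
        end = min(t + k, len(trace))
        if idx == 0:
            # S(1) is unconditionally valid: cold start is the true start.
            state, cycles = simulate_range(trace, s, end, ())
        else:
            check = s + k - 1  # equals t_{i-1} + k
            saved, _ = simulate_range(trace, s, check, ())  # M'(s_i + k - 1)
            if saved != received:
                mismatches += 1
                saved = received  # restart from the transmitted M(t_{i-1} + k)
            # Only behavior after I(s_i + k - 1) counts as the partial result.
            state, cycles = simulate_range(trace, check + 1, end, saved)
        total += cycles
        received = state  # M(t_i + k), handed to the next node
    return total, mismatches
```

In this toy model a cold start converges to the true state once k reaches `DEPTH`, so the comparison succeeds; with a smaller k the comparison fails and the re-execution path restores the correct result. In the real system each node would run its range speculatively and in parallel rather than waiting for its predecessor, which this sequential emulation does not show.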

≪5≫ The transmission of the instruction scheduling mechanism state in ≪3≫ and ≪4≫ above is handled as a special case at the simulation nodes in charge of the first and last partial instruction sequences, S(1) and S(p).

That is, the simulation node in charge of S(1) receives no state from any other simulation node. Its scheduling simulation is therefore unconditionally valid, and it transmits M(t_1 + k) to the simulation node in charge of S(2) without waiting for a state from another simulation node (see the I(t_1) and I(t_1 + k) portions of FIG. 1).

The simulation node in charge of S(p), on the other hand, transmits no state to any other simulation node. It compares state M(t_{p-1} + k) with state M'(s_p + k - 1); if they match, a valid result has been obtained and the entire scheduling simulation is complete. If they do not match, the simulation node in charge of S(p) performs the scheduling simulation from I(s_p + k - 1) onward again with state M(t_{p-1} + k) as the initial state, obtains a valid result, and completes the entire scheduling simulation.

≪6≫ This completes the scheduling simulation.

≪Instruction sequence division method≫
The number of divisions p used to create the partial instruction sequences may be chosen arbitrarily. It is desirable that p ≥ P, where P is the number of simulation nodes that execute the scheduling simulation in parallel. Furthermore, to balance the load across the simulation nodes, it is desirable that p be a constant multiple of P.

When p = cP, two approaches are conceivable: a first method that makes the constant c small to reduce the number of state mismatches (that is, mismatches in the comparison of instruction scheduling mechanism states in ≪3≫ of ≪Scheduling simulation process flow≫), and a second method that makes c large to reduce the re-execution time when a state mismatch occurs. Whether to choose one of these or something in between can be decided as appropriate according to the structure of the microprocessor to be simulated, the number of nodes P, and so on.
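As a small illustration of p = cP, the sketch below maps segments to nodes. The round-robin order (segment j goes to node ((j - 1) mod P) + 1) is an assumption of this sketch; the text only requires that the p segments be distributed over the P nodes. The function name is hypothetical.

```python
def assign_segments(num_nodes: int, multiple: int) -> dict[int, list[int]]:
    """Map node number (1..P) to its segment numbers (1..p), with p = c * P.

    Round-robin assignment is assumed here so that consecutive segments
    land on different nodes and every node receives exactly c segments.
    """
    if num_nodes < 1 or multiple < 1:
        raise ValueError("P and c must be positive")
    assignment = {node: [] for node in range(1, num_nodes + 1)}
    for seg in range(1, num_nodes * multiple + 1):
        assignment[(seg - 1) % num_nodes + 1].append(seg)
    return assignment
```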

Alternatively, the length of the partial instruction sequences and the method of selecting division points may be determined in advance, and the entire instruction sequence may be divided at positions that satisfy these conditions while it is traversed from the beginning. In this case, division proceeds before the total number of instructions N in the entire instruction sequence is known.

As described earlier, from the viewpoint of balancing the load across the simulation nodes, it is desirable that the division points t_i be chosen so that the length t_i - s_i + 1 of each partial instruction sequence is nearly constant. On the other hand, to reduce the probability of a state mismatch in the comparison of scheduling mechanism states, division points at which mismatches are unlikely can be selected. For example, if an instruction at which the supply of instructions to the instruction scheduling mechanism stalls, such as an instruction cache miss or a branch misprediction, is selected as a division point, it is easy to see that a moderate amount of overlapping execution makes mismatches unlikely. This is because the initial value of the virtual internal state used in the scheduling simulation of each partial instruction sequence closely resembles the internal state at the time of a branch misprediction (for example, the pipeline is empty). Such division points can be found with high accuracy and reliability by performing instruction simulation alone.
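One way to combine the two criteria is sketched below: start from equal-length targets and snap each target to the nearest stall-event instruction if one lies close enough. The function name, the `max_shift` tolerance, and the snapping rule are assumptions of this sketch. Whether the stalling instruction should end one segment or begin the next is not specified here, so the sketch simply uses its instruction number as t_i. It also assumes `max_shift` is less than half a segment length so that the chosen points stay strictly increasing.

```python
import bisect


def choose_split_points(total, num_segments, stall_points, max_shift):
    """Pick t_1..t_{p-1} near equal-length targets, preferring stall events.

    `stall_points` lists instruction numbers (1-indexed) at which instruction
    simulation observed a stall event such as a cache miss or misprediction.
    """
    stalls = sorted(stall_points)
    points = []
    for i in range(1, num_segments):
        target = i * total // num_segments
        pos = bisect.bisect_left(stalls, target)
        # Only the neighbors on either side of the target can be nearest.
        candidates = stalls[max(pos - 1, 0):pos + 1]
        best = min(candidates, key=lambda x: abs(x - target), default=None)
        if best is not None and abs(best - target) <= max_shift:
            points.append(best)
        else:
            points.append(target)  # no stall event nearby: keep the balance
    return points
```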

Increasing the length "k" of the redundantly executed stretch of instructions makes state mismatches in the comparison of scheduling mechanism states less likely, while decreasing it reduces the time lost to redundant execution. The length "k" can therefore be set appropriately according to the structure of the microprocessor to be simulated, the number of nodes P, the number of divisions p, the method of selecting division points, and so on.
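The cost side of this trade-off is easy to quantify. Assuming every segment is longer than k and no mismatch occurs, each of the p - 1 internal boundaries adds k redundantly scheduled instructions, so the extra work relative to N is (p - 1)k / N. The function name below is hypothetical.

```python
def overlap_overhead(total_instructions: int, num_segments: int, k: int) -> float:
    """Fraction of extra scheduling work caused by the k-instruction overlaps.

    Each of the p - 1 boundaries between segments is covered twice for k
    instructions; the last segment has nothing to overlap into.
    """
    if total_instructions < 1 or num_segments < 1 or k < 0:
        raise ValueError("invalid arguments")
    return (num_segments - 1) * k / total_instructions
```

For instance, 16 segments with k = 1,000 over a trace of 1,000,000 instructions add 1.5% of redundant scheduling work.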

≪Alternative processing when a state mismatch occurs in the comparison of instruction scheduling mechanism states≫
In the description above, when a state mismatch occurs in the comparison of instruction scheduling mechanism states, the simulation node in charge of S(i) performs the scheduling simulation from I(s_i + k - 1) onward again, using as the new initial state the state M(t_{i-1} + k) transmitted from the simulation node in charge of S(i-1). Alternatively, when such a mismatch occurs, the simulation node in charge of S(i-1) may simply continue the scheduling simulation from I(t_{i-1} + k) = I(s_i + k - 1) onward up to I(t_i + k) and transmit the resulting state M(t_i + k) to the simulation node in charge of S(i+1) via the LAN 6.

FIG. 4 schematically shows, by the dotted-line portion (A), that after a mismatch occurs in the comparison between state M(t_2 + k) and state M'(s_3 + k - 1), the simulation node P_2 in charge of S(2) continues the scheduling simulation from I(t_2 + k) = I(s_3 + k - 1) onward up to I(t_3 + k) and obtains state M(t_3 + k).

FIG. 5 schematically shows an instruction scheduling simulation system that has four simulation nodes (P_1, P_2, P_3, P_4), in which the instruction sequence is divided into 16 segments and four of them are assigned to each simulation node. On the straight line at the top representing the instruction sequence, a segment labeled (1), for example, is assigned to P_1. If the procedure of FIG. 4 for handling a state mismatch is used and a mismatch occurs in, for example, a segment on simulation node P_3, simulation node P_2 continues on to process that segment. Viewed in terms of overall processing time, P_1, P_3, and P_4 finish after processing four segments each, whereas only P_2 takes longer by one segment. In FIG. 5 this is indicated by the dotted arrow (B). In other words, if the number of divisions of the instruction sequence is made sufficiently large compared with the number of simulation nodes, the overall processing time is not extended much even when such a state mismatch occurs.
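The effect of the number of divisions on the cost of a mismatch can be checked with a back-of-the-envelope calculation. The sketch below assumes equal-length segments, ignores the k-instruction overlap and communication time, and assumes exactly one mismatch, which makes one node process one extra segment after its own. The function name and this timing model are illustrative assumptions.

```python
def makespan_with_one_mismatch(num_nodes: int, num_segments: int,
                               total_time: float = 1.0) -> tuple[float, float]:
    """Return (baseline, with_mismatch) wall-clock times.

    `total_time` is the time one node would need for the whole scheduling
    simulation. With one mismatch, one node handles one additional segment.
    """
    if num_segments % num_nodes != 0:
        raise ValueError("this sketch assumes p is a multiple of P")
    per_node = num_segments // num_nodes
    seg_time = total_time / num_segments
    return per_node * seg_time, (per_node + 1) * seg_time
```

With four nodes, 16 segments give 0.25 versus 0.3125 (25% longer), whereas four segments give 0.25 versus 0.5 (twice as long), which is the point made about FIG. 5.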

FIG. 1 schematically shows how the simulation node in charge of the partial instruction sequence S(1) and the simulation node in charge of S(2) perform overlapping scheduling simulations, and likewise the nodes in charge of S(2) and S(3) and the nodes in charge of S(3) and S(4).
FIG. 2 schematically shows that the simulation node in charge of S(1) and its overlapping portion is P_1, the simulation node in charge of S(2) and its overlapping portion is P_2, the simulation node in charge of S(3) and its overlapping portion is P_3, and the simulation node in charge of S(4) and its overlapping portion is P_4.
FIG. 3 schematically shows how the processing in which the state of the instruction scheduling mechanism at one simulation node is transmitted to another simulation node and compared with the state there is carried out successively.
FIG. 4 schematically shows, by the dotted-line portion, that after a mismatch occurs in the comparison between state M(t_2 + k) and state M'(s_3 + k - 1), the simulation node P_2 in charge of S(2) continues the scheduling simulation from I(t_2 + k) = I(s_3 + k - 1) onward up to I(t_3 + k) and obtains state M(t_3 + k).
FIG. 5 schematically shows an instruction scheduling simulation system that has four simulation nodes (P_1, P_2, P_3, P_4), in which the instruction sequence is divided into 16 segments and four of them are assigned to each simulation node.
FIG. 6 is a schematic diagram of the hardware configuration on which the parallel simulation system according to the preferred embodiment of the present invention operates.

Explanation of symbols

2: parallel simulation system; 4: computer; 6: LAN (Local Area Network).

Claims

A parallel simulation system for a microprocessor including a plurality P (P is a natural number of 2 or more) of simulation nodes, which are computers connected by a predetermined network,
in which division points that divide the entire instruction sequence of the simulation target system into p (p is a natural number and p ≥ P) partial instruction sequences are set, the logical state of the microprocessor at each division point and a virtual internal state are given as initial values for the scheduling simulation of each partial instruction sequence, and the scheduling simulations of the partial instruction sequences are distributed over the P simulation nodes and performed using these initial values,
the system comprising:
subsequent simulation means for, at a first simulation node that performs the scheduling simulation of the i-th (1 ≤ i ≤ p-1) partial instruction sequence, continuing the scheduling simulation for a further fixed number (k) of instructions following the last instruction of that partial instruction sequence;
transmission means for transmitting the state of the instruction scheduling mechanism at the first simulation node, at the time the scheduling simulation of the further fixed number of instructions is finished, to a second simulation node that performs the scheduling simulation of the (i+1)-th partial instruction sequence, which follows the i-th partial instruction sequence;
storage means for storing the state of the instruction scheduling mechanism at the time the second simulation node has performed the scheduling simulation of the fixed number of instructions from the head of the (i+1)-th partial instruction sequence; and
comparison verification means, in the second simulation node, for comparing the state of the instruction scheduling mechanism transmitted by the transmission means with the state of the instruction scheduling mechanism stored by the storage means, thereby verifying the validity of the scheduling simulation performed at its own node.

The parallel simulation system for a microprocessor according to claim 1, further comprising, for the case where the comparison verification means in the second simulation node compares the state of the instruction scheduling mechanism transmitted by the transmission means with the state of the instruction scheduling mechanism stored by the storage means and a mismatch occurs:
second subsequent simulation means for performing, at the first simulation node, the scheduling simulation of the subsequent instructions up to the end of the (i+1)-th partial instruction sequence and of the fixed number of instructions that follow it; and
second transmission means for transmitting the state of the instruction scheduling mechanism at the first simulation node, at the time the second subsequent simulation means has finished, to a third simulation node that performs the scheduling simulation of the (i+2)-th partial instruction sequence (where 1 ≤ i ≤ p-2 in this case), which follows the (i+1)-th partial instruction sequence.

The parallel simulation system for a microprocessor according to claim 1, further comprising, for the case where the comparison verification means in the second simulation node compares the state of the instruction scheduling mechanism transmitted by the transmission means with the state of the instruction scheduling mechanism stored by the storage means and a mismatch occurs:
third subsequent simulation means by which the second simulation node replaces the state of its instruction scheduling mechanism with the state of the instruction scheduling mechanism transmitted by the transmission means, and then performs the scheduling simulation again up to the last instruction of the (i+1)-th partial instruction sequence and through the fixed number of instructions that follow it.

The parallel simulation system for a microprocessor according to any one of claims 1 to 3, wherein the entire instruction sequence of the simulation target system is subjected to instruction simulation in advance, and an instruction at which an event occurs that stalls the input of instructions to the instruction scheduling mechanism is taken as a candidate for a division point.