JPH10260832A

JPH10260832A - Information processor

Info

Publication number: JPH10260832A
Application number: JP6432797A
Authority: JP
Inventors: Yoshio Miki; 良雄三木; Yuji Tsushima; 雄次對馬
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-03-18
Filing date: 1997-03-18
Publication date: 1998-09-29

Abstract

PROBLEM TO BE SOLVED: To shorten the scheduling time needed to execute, in parallel, instructions written for sequential execution. SOLUTION: When a superscalar processor 100 executes a program, an instruction translation circuit 204 in a processor 200 generates a group of translation instructions for each branch block of the program and stores it in a translation instruction memory 201. For each of a plurality of instructions that an instruction scheduler 104 has judged to be executable in parallel, each translation instruction includes the numbers of the transfer-source and transfer-destination physical resources (arithmetic units, registers, etc.) determined by the scheduler 104. When the branch block is executed again later, the translation instruction sequence in the memory 201 is executed in order. When the transfer-source arithmetic unit generates the data used by the operation that a translation instruction requests, a transfer circuit 203 supplies that data to the input of the arithmetic unit selected for the instruction.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a superscalar information processing apparatus that executes, in parallel, the instructions of a program written for sequential execution.

[0002]

2. Description of the Related Art

One way to raise the processing speed of an instruction-controlled information processing apparatus is to exploit the parallelism latent in an instruction sequence. For example, in the superscalar method described in Mike Johnson, "Superscalar Processor", Nikkei BP Publishing Center, 1994, pp. 104-118 and pp. 125-144 (hereinafter, Reference 1), a plurality of arithmetic units are provided in one processor, and a plurality of instructions are executed in parallel on different arithmetic units. Instructions that can be executed in parallel are extracted in a pipeline stage preceding the instruction execution stage, for example the instruction decode stage, by examining conflicts over the hardware resources the instructions use and the data dependences that arise from their execution.

[0003] As described on page 125 onward of Reference 1, out-of-order execution is also possible. In this method, the range searched for parallel-executable instructions is widened, and those instructions are executed in an order different from the original instruction order.

[0004] The Tomasulo algorithm, described on page 109 of Reference 1, is a known way of realizing out-of-order execution. In the Tomasulo algorithm, once an instruction has been decoded and the arithmetic unit it uses and the input data it needs have been identified, the instruction is stored in a kind of instruction buffer called a reservation station. The reservation station serves as a holding place for instructions awaiting execution, and each instruction that finishes executing in an arithmetic unit announces its completion to all instructions in the reservation station. Each waiting instruction can thereby check whether it has become executable, and instructions are sent to the arithmetic units in the order in which they become executable. As a result, instructions can be executed in an order different from the original instruction sequence. Data forwarding is a known way of exploiting a fine-grained dependence between two instructions, as described in Hennessy and Patterson, "Computer Architecture: A Quantitative Approach (second edition)", Morgan Kaufmann Publishers, Inc., 1995, p. 147 (hereinafter, Reference 2). In this method, when an instruction takes the value of a register as input, its execution is not delayed until that register value has been committed. Instead, the value is transferred directly to the instruction from the arithmetic unit that is about to update the register, and the instruction starts executing before the register value is committed.

[0005] Thus, schemes that recognize the relationships among instructions before executing them implement, as their control method, the concrete determination of the hardware resources each instruction uses and the start of execution as soon as the necessary data is ready. However, the combinational logic needed for this control tends to be large, and the circuits added for fast execution may themselves end up consuming the overall execution time. VLIW (Very Long Instruction Word) (Reference 2, pp. 278-289) has been proposed as one way to avoid this. In VLIW, the compiler extracts a plurality of parallel-executable instructions in advance and packs them into one instruction word, which reduces the circuit scale needed for the recognition described above. However, because the instruction word changes, every program must be recompiled.
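As an illustration of the compile-time packing that VLIW relies on, the following is a minimal sketch and not the method of Reference 2. The `Instr` record, the register names, and the greedy in-order policy are all assumptions made for this example; the packing is deliberately conservative and closes a bundle on any register conflict.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str               # label used only for this illustration
    dst: str                # register written
    srcs: Tuple[str, ...]   # registers read


def pack_bundles(program: List[Instr], width: int) -> List[List[str]]:
    """Greedily pack instructions, in program order, into long words.

    An instruction joins the current bundle unless the bundle is full or
    the instruction conflicts with a register already written or read by
    an instruction in that bundle (a conservative check).
    """
    bundles: List[List[str]] = []
    current: List[Instr] = []
    for ins in program:
        written = {i.dst for i in current}
        read = {s for i in current for s in i.srcs}
        conflict = (
            any(s in written for s in ins.srcs)  # reads a value produced in this bundle
            or ins.dst in written                # writes the same register twice
            or ins.dst in read                   # overwrites a register still being read
        )
        if current and (conflict or len(current) == width):
            bundles.append([i.name for i in current])
            current = []
        current.append(ins)
    if current:
        bundles.append([i.name for i in current])
    return bundles
```

Because the bundles are fixed when the program is compiled, the hardware needs no dependence check at run time, but a binary built for one bundle width cannot be reused on a machine with a different instruction word.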

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, VLIW cannot make effective use of existing assets in the form of executable binary files. Scheduling by compilation is feasible for programs such as loop-heavy numerical programs, which are the targets of supercomputers and dedicated signal processors, but these are only a small set of special cases. In general programs, instruction execution time varies dynamically because of frequent branch instructions, memory accesses, and the like, so there is a limit to how far a compiler can statically optimize the order in which hardware is used. For a general-purpose processor, it is therefore important to extract instruction parallelism and determine the instruction execution order in hardware, that is, to schedule in hardware, as the superscalar method does. However, because hardware scheduling requires an enormous amount of logic, the operating speed of the scheduling section ends up limiting the operating speed of the whole processor.

[0007] Moreover, in a reservation station, the executability check is performed wastefully even on instructions that clearly cannot yet be executed. Furthermore, when more instructions become executable than there are arithmetic units, processing such as selecting which instructions to execute is also needed. These problems waste time in the stage preceding instruction execution in the arithmetic units.

[0008] An object of the present invention is to provide an information processing apparatus that can execute a plurality of instructions in parallel with less overhead.

[0009] A more specific object of the present invention is to provide an information processing apparatus that can reduce the loss of instruction execution speed caused by instruction scheduling.

[0010] Another more specific object of the present invention is to provide an information processing apparatus that can start executing an instruction with less delay as soon as the data the instruction requires is ready.

[0011]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above objects, an information processing apparatus according to the present invention is provided with a plurality of first-type physical resources, including a plurality of first-type arithmetic units and a plurality of first-type registers, and with an instruction scheduler. The instruction scheduler selects, from among a plurality of first-type instructions contained in a program to be executed, a plurality of first-type instructions that can be executed in parallel; assigns to the selected instructions the first-type physical resources to be used in executing each of them; and controls execution of the selected instructions so that they are executed in parallel using the physical resources assigned to each. Using these, the first-type instructions of the program are executed in superscalar mode. The present invention further provides: a memory that stores, each time a plurality of first-type instructions are executed in parallel by the instruction scheduler, at least one second-type instruction from which the physical resources assigned by the instruction scheduler to each of those first-type instructions can be identified; a plurality of second-type physical resources, including a plurality of second-type arithmetic units and a plurality of second-type registers; and an instruction execution circuit that, when the program is executed again, executes the at least one second-type instruction stored in the memory for the plurality of first-type instructions in place of those first-type instructions. The instruction execution circuit executes the second-type instruction using the second-type physical resources that correspond to the physical resources identifiable from that second-type instruction, namely those assigned to each of the corresponding first-type instructions.

[0012] As a result, when a first-type instruction sequence in the program is executed again, executing the second-type instructions stored in this memory reuses the information obtained when the original first-type instructions were executed: their execution order and the physical resources, such as arithmetic units, that they used. The instruction scheduler therefore need not recognize parallel executability again, and the second-type instruction sequence can be executed faster.

[0013] Furthermore, the second-type instruction sequence can be executed even faster by making the operating clock of the memory and the instruction execution circuit higher than that of the instruction scheduler. For this purpose, the second-type physical resources are desirably provided separately from the first-type physical resources, although both can also be realized by common physical resources.

[0014] In a more specific aspect of the present invention, each second-type instruction carries information on the physical resource it uses and on the physical resources that generate the data needed for its execution. A transfer circuit for second-type instructions maps this information to positions within the circuit and contains instruction wait circuits in which each second-type instruction waits for execution. This eliminates the executability checks on clearly non-executable instructions that occur in a reservation station, and reduces the logic and time needed to select instructions immediately before execution.

[0015]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The information processing apparatus according to the present invention is described below in more detail with reference to several embodiments shown in the drawings. In the following, the same reference numerals denote the same or similar elements. From the second embodiment onward, the description focuses mainly on the differences from the first embodiment.

[0016] <Embodiment of the Invention> (1) Apparatus Configuration

In FIG. 1, the information processing apparatus comprises two processors, 100 and 200. Processor 100 is a superscalar processor that extracts instruction-level parallelism from a program written for sequential execution and can execute individual scalar instructions in parallel. The superscalar processor 100 has an instruction cache 101 connected to a main memory 10, an instruction fetch circuit 102, an instruction decoder 103, an instruction scheduler 104, a plurality of arithmetic units 105, and a register file 107. Processor 200 is a processor that, when a program already executed by the superscalar processor 100 is executed again, executes the instructions of that program faster than the superscalar processor 100 can. Processor 200 is configured to run on a faster machine clock than the superscalar processor 100. When the program is first executed on the superscalar processor 100, the instruction scheduler 104 supplies, as the result of its scheduling, the addresses 301 of the instructions executed in parallel and the numbers of the physical resources, such as arithmetic units and registers, assigned to each instruction: specifically, the number 302 of the transfer-destination physical resource to which the instruction's result data is transferred, and the number 303 of the transfer-source physical resource that supplies the data used in executing the instruction. In processor 200, an instruction translation circuit 204 generates instructions that reflect these scheduling results and specify processing equivalent to the original instructions (hereinafter, translation instructions), and stores the generated translation instruction sequence in a translation instruction memory 201.

[0017] Processor 200 is provided with a plurality of arithmetic units 205 for executing the translation instructions stored there, a plurality of temporary registers 206 for holding the data used by these arithmetic units or the result data they generate, and a register IO 207 used to exchange data between the temporary registers 206 and the register file 107. The number of arithmetic units 205 equals the number of arithmetic units 105 in the superscalar processor 100, and the number of temporary registers 206 equals the number of instruction-addressable registers in the register file 107. Fewer arithmetic units 205 and temporary registers 206 may be used, however, depending on the number of physical resources used by the translation instructions.

[0018] Each translation instruction stored in the translation instruction memory 201 contains instruction information for a plurality of scalar instructions. A decoder 202 decodes the translation instruction read from the translation instruction memory 201, and a transfer circuit 203 starts each of the operations specified by the decoded instruction information in synchronization with the supply, from one of the arithmetic units 205, of the data that operation requires. The arithmetic units 205 execute in parallel the operations requested by a plurality of scalar instructions, and are configured to operate faster than the arithmetic units 105 in the superscalar processor 100.

[0019] Like the instruction scheduler 104, the transfer circuit 203 has the basic function of waiting until an instruction becomes executable and then distributing it to one of the arithmetic units. Unlike the instruction scheduler 104, however, the transfer circuit 203 already has the information on which preceding instruction's completion to wait for decoded as a position in the transfer path, as described later. Instruction scheduling with the transfer circuit 203 therefore needs essentially no circuitry for judging executability or controlling execution order, and runs faster than the instruction scheduler 104.

[0020] In this way, processor 200 uses the scheduling results of the instruction scheduler 104 in the superscalar processor 100, and can therefore execute instructions on a faster clock than would be possible if it performed such scheduling itself. The two processors 100 and 200 need not be physically adjacent, but because signal lines 301 to 305 require an operating speed comparable to that of ordinary processor-internal signals, they are preferably formed on, for example, the same silicon substrate.

[0021] (2) Instruction Execution Mode

The instruction fetch circuit 102 in the superscalar processor 100 also supplies the address of each instruction sent to the instruction decoder 103 to a translation instruction storage area 211 as instruction address 300. Referring to FIG. 2, processor 200 searches whether a translation instruction for this instruction is stored in the translation instruction memory 201 (step 221). If the translation instruction memory 201 holds an instruction line matching the instruction address (step 222), the translation instruction stored in that line is read out (step 223). The translation instruction decoder 202 decodes the translation instruction and converts the transfer-source physical resource information it contains into position information inside the transfer circuit 203 (step 224). The transfer circuit 203 transfers data to the arithmetic units 205, the temporary registers 206, and the register IO 207, starting with the instructions whose required data is ready. An arithmetic unit 205 that receives data returns its result to the transfer circuit 203, and the operation instructions are executed one after another (step 225). If no instruction start address equal to instruction address 300 is found in the translation instruction storage area 211 in step 222, processor 200 does not operate and the superscalar processor 100 executes the instruction. That is, the instruction decoder 103 decodes the instruction (step 226), the instruction scheduler 104 determines the execution order (step 227), and an arithmetic unit 105 then executes the instruction (step 228).
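The control flow of FIG. 2 can be sketched roughly as follows. This is only an illustrative model: the dictionary standing in for the translation instruction memory 201 and the two callables standing in for the execution paths are assumptions introduced here, not structures defined in the specification.

```python
from typing import Any, Callable, Dict, Tuple


def execute_at(
    instruction_address: int,
    translation_memory: Dict[int, Any],
    run_translated: Callable[[Any], Any],
    run_superscalar: Callable[[int], Any],
) -> Tuple[str, Any]:
    """Choose the execution path for one fetched instruction address.

    translation_memory maps an instruction start address to a stored
    translation instruction line (a hypothetical stand-in for memory 201).
    """
    # Steps 221-222: search for a line whose address matches address 300.
    line = translation_memory.get(instruction_address)
    if line is not None:
        # Steps 223-225: read the line, decode it, and let processor 200 run it.
        return ("processor 200", run_translated(line))
    # Steps 226-228: miss, so processor 100 decodes, schedules, and executes.
    return ("processor 100", run_superscalar(instruction_address))
```

A hit bypasses the scheduling step entirely, which is where the time saving described in this section would come from.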

[0022] (3) Superscalar Processor 100

An outline of this processor follows. Instructions stored in the instruction cache 101 are fetched by the instruction fetch circuit 102 and sent to the instruction decoder 103. The instruction decoder 103 decodes the input and output register numbers and the operation type indicated by the instruction word. The instruction fetch circuit 102 and the instruction decoder 103 are each assumed to process a plurality of instructions simultaneously (four to eight instructions, as is typical for a superscalar processor). The decoder 103 may also have a function that uses the so-called register renaming technique to convert the logical register numbers specifiable by a program into the numbers of the larger set of physical registers inside the processor.

[0023] For the plurality of instructions sent from the instruction decoder 103, the instruction scheduler 104 assigns the arithmetic units the instructions use and determines the instruction execution order. More specifically, in this embodiment the branch instruction processing unit is also treated as an arithmetic unit. The arithmetic units 105 are a mixture of units with different functions, such as integer arithmetic units, floating-point arithmetic units, and branch instruction processing units, and of multiple units with the same function. The number of arithmetic units limits the number of instructions the processor can execute in parallel, but this embodiment is not limited to any particular number of arithmetic units. When an arithmetic unit 105 has a plurality of input data latches, each input data latch is assumed to carry a uniquely identifiable number. This number can be formed, for example, by appending the number of the input data latch to the low-order bit side of the identification number of the arithmetic unit that outputs the operation result data to be supplied to that input data latch.
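The latch numbering described above amounts to concatenating two bit fields. A minimal sketch follows; the two-bit latch field width (`LATCH_BITS`) is an assumption chosen for illustration, since the specification does not state a width here.

```python
from typing import Tuple

LATCH_BITS = 2  # assumed width of the latch-number field (hypothetical)


def latch_id(unit_id: int, latch_index: int) -> int:
    """Append the latch number to the low-order side of the unit number."""
    if not 0 <= latch_index < (1 << LATCH_BITS):
        raise ValueError("latch_index does not fit in the latch field")
    return (unit_id << LATCH_BITS) | latch_index


def split_latch_id(identifier: int) -> Tuple[int, int]:
    """Recover (unit_id, latch_index) from a combined identifier."""
    return identifier >> LATCH_BITS, identifier & ((1 << LATCH_BITS) - 1)
```

With this encoding, arithmetic unit 5 and latch 1 give the identifier 0b10101, and the unit number is recovered by shifting the latch field away.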

[0024] In principle, instructions are executed in the order in which the input data they need becomes ready. The Tomasulo algorithm, for example, can be used for this control. Instructions that must wait for a preceding instruction to complete are held in the instruction scheduler 104, and instructions whose input data is all ready are sent to the arithmetic units 105. When an instruction completes, the number of the register whose value has thereby been committed is reported from the register file 107 to the instruction scheduler 104. The instruction scheduler 104 checks whether any held instruction uses the reported register as an input value and whether any instruction now has all of its input values ready, and sends any such instruction to one of the arithmetic units 105.
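The wake-up behaviour of the instruction scheduler 104 described above can be modelled in a few lines. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, arithmetic unit selection is omitted, and an "issued" list stands in for sending an instruction to an arithmetic unit 105.

```python
from typing import Iterable, List, Set, Tuple


class SchedulerModel:
    """Holds instructions until every input register has been reported ready."""

    def __init__(self) -> None:
        self.waiting: List[Tuple[int, Set[int]]] = []  # (address, pending registers)
        self.issued: List[int] = []                    # addresses sent for execution

    def accept(self, address: int, pending_registers: Iterable[int]) -> None:
        """Take an instruction from the decoder; issue it at once if nothing is pending."""
        pending = set(pending_registers)
        if pending:
            self.waiting.append((address, pending))
        else:
            self.issued.append(address)

    def register_ready(self, register_number: int) -> None:
        """Handle a report that a register value has been committed."""
        still_waiting: List[Tuple[int, Set[int]]] = []
        for address, pending in self.waiting:
            pending.discard(register_number)
            if pending:
                still_waiting.append((address, pending))
            else:
                self.issued.append(address)  # all inputs ready: send to a unit
        self.waiting = still_waiting
```

Note that every report scans every held instruction, including those that cannot become executable yet; this is the overhead the translation instructions are meant to avoid on re-execution.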

[0025] The operation of the instruction scheduler 104 described above determines the instruction execution order dynamically, and as a result it determines which arithmetic unit each instruction used and which arithmetic unit computed the data serving as each instruction's input values. Accordingly, if the numbers that identify physical resources such as arithmetic units and registers are defined as physical resource numbers, the dynamic execution history of instructions can be viewed as a history of data transfers from one physical resource to another. The instruction scheduler 104 sends this instruction execution history to the instruction translation circuit 204: the executed instruction address 301, which is the address of an instruction sent to an arithmetic unit, and the destination and source of the data transfers caused by that instruction, as transfer-destination physical resource 302 and transfer-source physical resource 303 respectively.
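Viewing the execution history as transfers between physical resources suggests a record like the one below. This is a hedged sketch: the field names, the use of tuples for multiple sources and destinations, and the helper functions are assumptions for illustration; only the three kinds of information (signals 301, 302, 303) come from the description.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ExecRecord(NamedTuple):
    instr_addr: int                    # executed instruction address (signal 301)
    dest_resources: Tuple[int, ...]    # transfer-destination resource numbers (302)
    source_resources: Tuple[int, ...]  # transfer-source resource numbers (303)


def log_issue(
    history: List[ExecRecord],
    instr_addr: int,
    dests: Iterable[int],
    sources: Iterable[int],
) -> None:
    """Append one entry, as the scheduler would report it to the translation circuit."""
    history.append(ExecRecord(instr_addr, tuple(dests), tuple(sources)))


def consumers_of(history: List[ExecRecord], resource: int) -> List[int]:
    """Addresses of the instructions that took input data from the given resource."""
    return [r.instr_addr for r in history if resource in r.source_resources]
```

A history in this form already names who feeds whom, so a later replay would not have to rediscover the dependences.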

[0026] The circuits of the superscalar processor 100 are described in more detail below. The instruction fetch circuit 102 supplies the address 300 of each fetched instruction to the instruction decoder 103 and to processor 200. Taking the address of the instruction to be executed by processor 200 from the instruction fetch circuit 102 in this way lets the instruction sequence that processor 200 is to execute be identified as early as possible, and avoids having the superscalar processor 100 perform any further processing on those instructions. If the instruction address 300 cannot be taken from the instruction fetch circuit 102 because of constraints such as chip area, it can instead be taken from a point closer to the instruction execution stage, such as the instruction decoder 103, the instruction scheduler 104, or the arithmetic units 105. To obtain instructions from the instruction cache 101 without stalling, the instruction fetch circuit 102 has functions, known in themselves, such as branch prediction and instruction buffering, by which it fetches potentially executable instructions ahead of time and in excess. The instructions sent from the instruction fetch circuit 102 to the instruction decoder 103 can therefore be regarded as an instruction sequence whose execution by the information processing apparatus has been decided.

【００２７】本実施の形態では図４に示したフォーマッ
トの命令を用いる。この命令フォーマットはいわゆるＲ
ＩＳＣプロセッサで用いられる典型例であり、レジスタ
番号等は特別な解釈なしに、命令のビットフィールドを
分割するだけで得ることができる。Ｘフォーマットは一
般の算術論理演算命令に用いられるフォーマットであ
り、冒頭の命令種を表すオペコード（ＯＰＣＤ）フィー
ルド、演算結果が格納されるレジスタ番号であるターゲ
ットレジスタフィールド（ＲＴ）、演算の入力数値が格
納されるレジスタ番号として二つのオペランドレジスタ
フィールド（ＲＡ，ＲＢ）および拡張機能情報フィール
ド（ＥＯ）から構成される。Ｄフォーマットはロード、
ストア命令のフォーマットである。オペコードフィール
ド、ターゲットレジスタフィールド、オペランドレジス
タフィールドはＸフォーマットと同様の機能を持ち、デ
ィスプレースメントフィールド（Ｄ）はロード、ストア
命令のアクセスするメモリ番地計算用の加算値を格納す
る。メモリ番地はオペランドレジスタ（ＲＡ）の値とデ
ィスプレースメントフィールドの値の加算値となる。オ
ペコード（ＯＰＣＤ）フィールドから命令実行で利用さ
れる演算器の種類が判別でき、レジスタに関するＲＴ，
ＲＡ，ＲＢフィールドから、命令が利用するレジスタが
決定される。In this embodiment, instructions in the format shown in FIG. 4 are used. This instruction format is a typical example of those used in so-called RISC processors: register numbers and the like can be obtained simply by splitting the bit fields of the instruction, without any special interpretation. The X format is used for general arithmetic and logic instructions. It consists of an opcode (OPCD) field at the head indicating the instruction type, a target register field (RT) giving the number of the register in which the operation result is stored, two operand register fields (RA, RB) giving the numbers of the registers holding the input values of the operation, and an extended function information field (EO). The D format is the format of load and store instructions. Its opcode field, target register field, and operand register field have the same functions as in the X format, and its displacement field (D) holds the value to be added when calculating the memory address accessed by the load or store instruction. The memory address is the sum of the value of the operand register (RA) and the value of the displacement field. The type of arithmetic unit used to execute the instruction can be determined from the opcode (OPCD) field, and the registers used by the instruction are determined from the RT, RA, and RB fields.
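The field splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of the embodiment. The positions of RB (bits 16-20) and D (bits 16-31) follow the description of the instruction decoder 103; the positions of OPCD, RT, RA, and EO, the 32-bit word length, the numbering of bit 0 as the most significant bit, and the function names `bits`, `decode_x`, and `decode_d` are all assumptions made for illustration, since FIG. 4 is not reproduced here.

```python
def bits(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 32-bit word (bit 0 = most significant, assumed)."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_x(word: int) -> dict:
    """Split an X-format word into its fields (field positions are assumed)."""
    return {
        "opcd": bits(word, 0, 5),    # instruction type
        "rt": bits(word, 6, 10),     # target register number
        "ra": bits(word, 11, 15),    # first operand register number
        "rb": bits(word, 16, 20),    # second operand register number
        "eo": bits(word, 21, 31),    # extended function information
    }


def decode_d(word: int) -> dict:
    """Split a D-format word into its fields (field positions are assumed)."""
    return {
        "opcd": bits(word, 0, 5),
        "rt": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "d": bits(word, 16, 31),     # displacement added to the value of RA
    }


# Hypothetical encodings, used only to exercise the field extraction.
load_word = (50 << 26) | (1 << 21) | (6 << 16) | 8
mul_word = (63 << 26) | (1 << 21) | (1 << 16) | (2 << 11) | 25
load_fields = decode_d(load_word)
mul_fields = decode_x(mul_word)
```

No interpretation step is needed beyond shifting and masking, which is the property the text attributes to RISC-style formats.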

【００２８】図３において、スーパースカラープロセッ
サ１００内の命令デコーダ１０３は、命令フェッチ回路
１０２から送られてきた命令を図４に例示した命令フォ
ーマットにしたがって分割する。つまり、ラッチ１１０
〜１１３はそれぞれオペコード（ｏｐｃｄ），ターゲッ
トレジスタ（ＲＴ）番号１１７，オペランドレジスタ
（ＲＡ）番号１１８と、最後にオペランドレジスタ（Ｒ
Ｂ）の番号またはディスプレースメント（Ｄ）の値を保
持する。オペコードデコード回路１１５は、ラッチ１１
０が保持するオペコードをデコードし、オペランドレジ
スタ（ＲＢ）を使用するか否かの判定信号１１４と命令
で使用する演算器の物理リソース番号である演算器番号
（ＦＵ）１１６とを生成する。シフト回路１２２は判定
信号１１４がＨｉｇｈレベルのとき、命令フィールドの
１６−２０ビットを下位２７−３１ビットにシフトし、
上位１６−２０ビットは１を埋め直し、拡張データ（Ｅ
Ｘ）１１９として出力する。判定信号１１４がＬｏｗの
ときには、命令フィールドの１６−３１ビットのディス
プレースメント情報がそのまま拡張データ（ＥＸ）１１
９として出力される。In FIG. 3, the instruction decoder 103 in the superscalar processor 100 splits the instruction sent from the instruction fetch circuit 102 according to the instruction format illustrated in FIG. 4. That is, the latches 110 to 113 hold, respectively, the opcode (opcd), the target register (RT) number 117, the operand register (RA) number 118, and finally either the operand register (RB) number or the displacement (D) value. The opcode decode circuit 115 decodes the opcode held by the latch 110 and generates a determination signal 114 indicating whether the operand register (RB) is used, and an arithmetic unit number (FU) 116, which is the physical resource number of the arithmetic unit used by the instruction. When the determination signal 114 is High, the shift circuit 122 shifts bits 16-20 of the instruction field to the low-order bits 27-31, refills the upper bits 16-20 with 1s, and outputs the result as extension data (EX) 119. When the determination signal 114 is Low, the displacement information in bits 16-31 of the instruction field is output unchanged as the extension data (EX) 119.

【００２９】命令スケジューラ１０４は、書き込みスコ
アボード１２０、読み出しスコアボード１２１および命
令実行バッファ０〜命令実行バッファ３を用いて先に述
べた概念の命令スケジューリングを実行する。書き込み
スコアボード１２０と読み出しスコアボード１２１はど
ちらもレジスタ番号をアドレス情報とするメモリであ
る。書き込みスコアボード１２０はメモリデータとして
各レジスタ番号毎に２ビットの領域を持ち、書き込もう
としているレジスタを読み出そうとしている先行命令が
命令実行バッファ０から命令実行バッファ３までに幾つ
存在するかを示すカウンタとなる。読み出しスコアボー
ド１２１はメモリデータとして各レジスタ番号毎に１ビ
ットの領域を持ち、演算器で実行中の命令の中に読み出
そうとしているレジスタ内容を更新しようとする命令が
あるか否かを示す。命令実行バッファ０から命令実行バ
ッファ３は先入れ先出し（ＦＩＦＯ）キューを構成して
おり、命令デコーダ１０３の出力である、演算器番号１
１６，ターゲットレジスタ番号１１７，オペランドレジ
スタ番号１１８，拡張データ１１９の各信号と、命令フ
ェッチ回路１０２からの命令アドレス１２５を保持す
る。命令選択回路１２３は、上記のＦＩＦＯキューに格
納されている命令の内、実行可能で、最もＦＩＦＯキュ
ーに入ってから時間の経つものを選択する。The instruction scheduler 104 performs the instruction scheduling described conceptually above, using a write scoreboard 120, a read scoreboard 121, and instruction execution buffers 0 to 3. The write scoreboard 120 and the read scoreboard 121 are both memories addressed by register number. The write scoreboard 120 holds a 2-bit entry for each register number, which serves as a counter of how many preceding instructions in instruction execution buffers 0 to 3 are still going to read the register that is about to be written. The read scoreboard 121 holds a 1-bit entry for each register number, which indicates whether any instruction being executed in an arithmetic unit is going to update the contents of the register that is about to be read. Instruction execution buffers 0 to 3 form a first-in first-out (FIFO) queue and hold the outputs of the instruction decoder 103, namely the arithmetic unit number 116, the target register number 117, the operand register number 118, and the extension data 119, together with the instruction address 125 from the instruction fetch circuit 102. The instruction selection circuit 123 selects, from among the instructions stored in this FIFO queue, the executable instruction that has been in the queue the longest.

【００３０】命令スケジューリングの詳細は次の通り、
命令デコーダ１０３から送られてきた各レジスタ番号の
情報は書き込みスコアボード１２０と読み出しスコアボ
ード１２１とに入力される。その間に命令は命令実行バ
ッファ０に取り込まれる。ただし，ヒット信号２４２が
Ｈｉｇｈのときは翻訳命令が実行されるため，その間命
令デコーダ１０３から送られてくる命令のスケジューリ
ングおよび以降の動作はキャンセルされる。上記両スコ
アボードの出力値が０のときは、命令で使用するターゲ
ットレジスタもオペランドレジスタも先行する命令によ
って使用中ではないことになる。つまり当該命令は実行
可能であることがわかる。この段階で実行可能でない命
令は、いずれかのレジスタの状態が変化するのを待つ必
要があり、後続の命令が命令デコーダ１０３から到着す
るに従ってＦＩＦＯキューのより深い位置（命令実行バ
ッファ１〜３）へ進む。命令実行バッファ０〜３には常
に書き込みスコアボード１２０と読み出しスコアボード
１２１の出力信号とレジスタファイル１０７からの演算
終了信号１２４が放送回路１２５を介して通達される。
この通達により、オペランドレジスタへの書き込みが終
了し、かつ先行命令によるターゲットレジスタの読み出
しが完了した命令が実行可能となる。Instruction scheduling proceeds in detail as follows. The register numbers sent from the instruction decoder 103 are input to the write scoreboard 120 and the read scoreboard 121. Meanwhile, the instruction is taken into instruction execution buffer 0. However, while the hit signal 242 is High, a translation instruction is being executed, so scheduling of the instructions sent from the instruction decoder 103 and all subsequent operations are cancelled during that time. When the output values of both scoreboards are 0, neither the target register nor the operand registers used by the instruction are in use by a preceding instruction; in other words, the instruction is executable. An instruction that is not executable at this stage must wait for the state of one of its registers to change, and it moves to deeper positions in the FIFO queue (instruction execution buffers 1 to 3) as subsequent instructions arrive from the instruction decoder 103. The output signals of the write scoreboard 120 and the read scoreboard 121 and the operation end signal 124 from the register file 107 are continuously broadcast to instruction execution buffers 0 to 3 through the broadcast circuit 125. Through this notification, an instruction becomes executable once writing to its operand registers has finished and reading of its target register by preceding instructions has completed.
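The readiness test and the oldest-first selection described above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the circuit itself: `pending_reads` stands in for the counters of the write scoreboard 120 and is assumed to count only reads by preceding instructions, `updating` stands in for the flags of the read scoreboard 121, and the `Entry` type, the function names, and the register names are hypothetical. Set and reset bookkeeping of the scoreboards is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entry:
    addr: int                  # instruction address
    rt: Optional[str]          # target register, or None if nothing is written
    operands: Tuple[str, ...]  # operand registers (RA, RB)


def is_executable(entry: Entry, pending_reads: Dict[str, int], updating: Set[str]) -> bool:
    # Target register: no preceding instruction may still be waiting to read it.
    target_free = entry.rt is None or pending_reads.get(entry.rt, 0) == 0
    # Operand registers: no in-flight instruction may still be updating them.
    operands_valid = all(reg not in updating for reg in entry.operands)
    return target_free and operands_valid


def select_oldest_ready(fifo: List[Entry], pending_reads: Dict[str, int],
                        updating: Set[str]) -> Optional[Entry]:
    # fifo[0] models buffer 0 (newest entry); the last element is the oldest.
    for entry in reversed(fifo):
        if is_executable(entry, pending_reads, updating):
            return entry
    return None


# Hypothetical queue contents: the older entry waits for f1, the newer one is independent.
fifo = [Entry(0x10C, "r7", ("r8",)), Entry(0x108, "f1", ("f1", "f2"))]
picked_while_f1_updating = select_oldest_ready(fifo, {}, {"f1"})
picked_after_f1_written = select_oldest_ready(fifo, {}, set())
```

While `f1` is still being updated, the newer, independent entry is selected; once the update has finished, the older entry takes priority again.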

【００３１】命令選択回路１２３は命令実行バッファ０
〜３の中で実行可能となった命令を取り出し、オペラン
ドレジスタの番号をレジスタファイル読み出し回路１２
６へ、命令アドレス、ターゲットレジスタ番号、演算器
番号、拡張データを命令発行回路１２７へ送出する。レ
ジスタファイル読み出し回路１２６ではオペランドレジ
スタの内容をレジスタファイル１０７（図１）から読み
出し、オペランドデータとして命令発行回路１２７へ送
り出す。The instruction selection circuit 123 takes out an instruction that has become executable in instruction execution buffers 0 to 3, and sends the operand register numbers to the register file read circuit 126 and the instruction address, target register number, arithmetic unit number, and extension data to the instruction issuing circuit 127. The register file read circuit 126 reads the contents of the operand registers from the register file 107 (FIG. 1) and sends them to the instruction issuing circuit 127 as operand data.

【００３２】命令発行回路１２７は演算器番号で識別さ
れる演算器１０５にターゲットレジスタ番号、オペラン
ドデータ、拡張データを送り出す。書き込みスコアボー
ド１２０のメンテナンス動作としては、命令デコーダ１
０３から新規に命令実行バッファ０に到着した命令のオ
ペランドレジスタ（ＲＡ，ＲＢ両方）の番号が書き込み
スコアボード１２０のセット端子（Ｓ）に送られ、該当
レジスタのカウンタ値がインクリメントされる。また、
実行可能となった命令のオペランドレジスタ（ＲＡ，Ｒ
Ｂ両方）の番号はレジスタファイル読み出し回路１２６
から書き込みスコアボード１２０のリセット（Ｒ）端子
に送られ、該当レジスタのカウンタ値が０になる。読み
出しスコアボード１２１はレジスタ更新をする命令が発
行されたときに命令発行回路１２７からターゲットレジ
スタ番号の通知をセット端子（Ｓ）に受け、更新中を示
すフラグ１を立てる。このフラグは演算終了信号１２４
によってリセットされる。先に述べた翻訳命令の概念か
ら実行可能となった命令の命令アドレスが実行命令アド
レス３０１として、演算器番号とターゲットレジスタ番
号とが転送先物理資源番号３０２として、オペランドレ
ジスタ番号と拡張データが転送元物理資源番号３０３と
して命令翻訳回路２０４へ送られる。以上の命令スケジ
ューリング回路の動作は，仮に命令デコーダ１０３の動
作にパイプライン１ステージ分の時間がかかるとする
と，少なくとも２ステージ以上の時間を要する処理とな
る。The instruction issuing circuit 127 sends the target register number, operand data, and extension data to the arithmetic unit 105 identified by the arithmetic unit number. The write scoreboard 120 is maintained as follows. The operand register numbers (both RA and RB) of an instruction newly arriving from the instruction decoder 103 at instruction execution buffer 0 are sent to the set terminal (S) of the write scoreboard 120, and the counter values of the corresponding registers are incremented. The operand register numbers (both RA and RB) of an instruction that has become executable are sent from the register file read circuit 126 to the reset terminal (R) of the write scoreboard 120, and the counter values of the corresponding registers become 0. When an instruction that updates a register is issued, the read scoreboard 121 receives the target register number from the instruction issuing circuit 127 at its set terminal (S) and sets a flag to 1 to indicate that an update is in progress. This flag is reset by the operation end signal 124. In line with the concept of the translation instruction described earlier, for each instruction that has become executable, the instruction address is sent to the instruction translation circuit 204 as the execution instruction address 301, the arithmetic unit number and target register number as the transfer destination physical resource number 302, and the operand register numbers and extension data as the transfer source physical resource number 303. Assuming that the instruction decoder 103 takes one pipeline stage, the operation of the instruction scheduling circuit described above takes at least two stages.

【００３３】（４）プロセッサ２００（４ａ）プログラム例以下の説明では図５に示したプログラム例を用いる。図
５のプログラム例は５種の命令から構成されており、ｌ
ｆｄ命令５０１は浮動小数点数値を６番レジスタに格納
されている数値とディスプレースメントである８で示さ
れるメモリ番地からロードし、浮動小数点レジスタ１に
格納する。ａｉ命令５０２は６番レジスタの値に４を加
算し、再び６番レジスタの値とする。ｆｍ命令５０３は
浮動小数点レジスタ１番と２番に格納された数値の積を
再び浮動小数点レジスタ１番に格納し、ｆｃｍｐ命令５
０４は浮動小数点レジスタ１番の数値が０番の数値より
も大きいか小さいかを比較し、その比較結果を条件レジ
スタ６番（ＣＲ６）に格納する。最後のｂｃ命令５０５
は分岐命令であり、ｆｃｍｐ命令５０４の結果、浮動小
数点レジスタ１番の数値が０番のレジスタの数値よりも
小さいとき、ラベル＿Ｌ１０に分岐する。(4) Processor 200. (4a) Program example. The following description uses the program example shown in FIG. 5. The program example of FIG. 5 consists of five kinds of instructions. The lfd instruction 501 loads a floating-point value from the memory address given by the value stored in register 6 and the displacement 8, and stores it in floating-point register 1. The ai instruction 502 adds 4 to the value of register 6 and writes the result back to register 6. The fm instruction 503 stores the product of the values in floating-point registers 1 and 2 back in floating-point register 1. The fcmp instruction 504 compares whether the value in floating-point register 1 is larger or smaller than the value in floating-point register 0 and stores the comparison result in condition register 6 (CR6). The final bc instruction 505 is a branch instruction: if the fcmp instruction 504 found that the value in floating-point register 1 is smaller than the value in register 0, it branches to the label _L10.

【００３４】この命令列の参照、更新するレジスタと命
令の関係をグラフ理論的に図示したものが図６である。
この図は命令実行過程を概念的に表すものであり、本実
施の形態の装置内の機構と直接対応するものではない。
図６からわかるように、グラフ節点６０１，６０２，６
０３で示すレジスタは先行命令で更新された後、ただち
に後続の命令で参照されており、もし先行命令が更新す
るレジスタと値を必要としている演算器の両方へ同時に
実行結果を転送可能であると、全体の実行時間を短縮で
きる。このような実行制御方法はフォワーディングと呼
ばれ、命令スケジューラ１０４が実行前の命令列を解読
することによって、転送路を決定する。つまり、スーパ
ースカラープロセッサ１００で図５の命令列を実行する
と、実行履歴は図７に示したグラフに対応することにな
る。FIG. 6 shows, in graph-theoretic form, the relationship between the instructions of this sequence and the registers they reference and update. The figure represents the instruction execution process conceptually and does not correspond directly to mechanisms inside the apparatus of this embodiment. As FIG. 6 shows, the registers indicated by graph nodes 601, 602, and 603 are updated by a preceding instruction and then immediately referenced by a subsequent instruction. If the preceding instruction can transfer its execution result simultaneously to both the register being updated and the arithmetic unit that needs the value, the overall execution time can be shortened. This execution control method is called forwarding, and the instruction scheduler 104 determines the transfer paths by decoding the instruction sequence before execution. That is, when the superscalar processor 100 executes the instruction sequence of FIG. 5, the execution history corresponds to the graph shown in FIG. 7.
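To make the forwarding opportunities concrete, the following Python sketch lists, for the five-instruction example, every case in which a register written by one instruction is read by a later one. It is an illustration only: the register spellings (`r6`, `f0`, `f1`, `f2`, `cr6`), the `PROGRAM` table, and the function name `forwarding_edges` are assumptions, and the read and write sets are taken from the prose description of the program rather than from FIG. 5 itself.

```python
# (mnemonic, registers read, registers written), in program order.
PROGRAM = [
    ("lfd",  {"r6"},        {"f1"}),
    ("ai",   {"r6"},        {"r6"}),
    ("fm",   {"f1", "f2"},  {"f1"}),
    ("fcmp", {"f1", "f0"},  {"cr6"}),
    ("bc",   {"cr6"},       set()),
]


def forwarding_edges(program):
    """Return (producer, consumer, register) for each value that could be forwarded."""
    last_writer = {}  # register -> mnemonic of the most recent writer
    edges = []
    for name, reads, writes in program:
        for reg in sorted(reads):
            if reg in last_writer:
                edges.append((last_writer[reg], name, reg))
        for reg in writes:
            last_writer[reg] = name
    return edges


edges = forwarding_edges(PROGRAM)
```

Under these assumptions the sketch finds three producer-consumer pairs, `lfd` to `fm`, `fm` to `fcmp`, and `fcmp` to `bc`, which are the kind of result-to-consumer transfers that forwarding shortens.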

【００３５】上記のフォワーディングにより、演算結果
は最短経路で転送するようにスケジューリングされてお
り、図７の上方および下方には図５の命令列に対する入
力値を格納したレジスタ番号（レジスタ名）と出力結果
が格納されたレジスタ番号（レジスタ名）が並ぶ。この
グラフで命令種を示す節点を演算器と読みかえると、グ
ラフの枝はレジスタと演算器または演算器と演算器のデ
ータ転送を表していることになる。As a result of this forwarding, operation results are scheduled to be transferred over the shortest paths. At the top and bottom of FIG. 7 are listed, respectively, the register numbers (register names) holding the input values of the instruction sequence of FIG. 5 and the register numbers (register names) in which the output results are stored. If the nodes indicating instruction types in this graph are read as arithmetic units, the edges of the graph represent data transfers between a register and an arithmetic unit or between two arithmetic units.

【００３６】（４ｂ）分岐ブロックの生成本実施の形態での翻訳命令は、この枝が表すデータ転送
の集合であり、枝の横に枝を識別するために付した丸数
字が翻訳命令の最小単位である。また、図７において波
線で囲んだブロックが示すように図５に示した一群の命
令列は、入力値となるブロック６７０、演算内容となる
ブロック６７１、演算結果となるブロック６７２に分類
することが可能である。このことはブロック６７１を一
つのマクロ命令として考えたときに、ブロック６７０が
その実行に対して初期化条件に、ブロック６７２が終了
条件に対応していると考えられる。以上の概念に基づ
き、命令翻訳回路２０４では次の手順に従って翻訳命令
を作成する。(4b) Generation of branch blocks. A translation instruction in this embodiment is a set of the data transfers represented by these edges, and each circled number placed beside an edge to identify it corresponds to the minimum unit of a translation instruction. Also, as the blocks enclosed by wavy lines in FIG. 7 show, the group of instructions shown in FIG. 5 can be divided into a block 670 of input values, a block 671 of operation contents, and a block 672 of operation results. This means that, when block 671 is regarded as a single macro instruction, block 670 corresponds to the initialization condition for its execution and block 672 to its termination condition. Based on this concept, the instruction translation circuit 204 creates translation instructions by the following procedure.

【００３７】入力信号は先に説明した実行命令アドレス
３０１，転送先物理資源番号３０２，転送元物理資源番
号３０３であり、出力は図１１に示した翻訳命令３種
類、すなわち初期化命令７０１，翻訳命令本体７０２，
終了命令７０３である。また、翻訳命令は一組の初期化
命令７０１と終了命令７０３と、それらに挟まれた１つ
または複数個の翻訳命令本体７０２で１単位を成す。最
初に、翻訳命令生成回路２０４は１命令実行毎に送られ
てくる入力信号を次の基準で分岐ブロックに分割する。
この分岐ブロックに含まれる複数の命令が翻訳命令の１
単位に対応する。The input signals are the execution instruction address 301, the transfer destination physical resource number 302, and the transfer source physical resource number 303 described above, and the outputs are the three types of translation instruction shown in FIG. 11: the initialization instruction 701, the translation instruction body 702, and the end instruction 703. One unit of translation instructions consists of one initialization instruction 701 and one end instruction 703, with one or more translation instruction bodies 702 between them. First, the translation instruction generation circuit 204 divides the input signals, which are sent each time an instruction is executed, into branch blocks according to the following criterion. The instructions contained in one branch block correspond to one unit of translation instructions.

【００３８】翻訳命令生成回路２０４は転送先物理資源
番号３０２を観測し、その転送先が分岐命令処理ユニッ
トであるとき分岐命令が実行されることを認識する。分
岐ブロックは分岐命令と分岐命令に挟まれる命令列であ
り、図５に示した例のように、非分岐命令から始まり、
分岐命令で終了する。ただし、ここでは命令実行結果に
基づく動的な命令実行順序であるので、例えばアセンブ
ラ言語として図５の命令５０１の一つ上が分岐命令であ
るか否かは無関係である。つまり、プログラム中ある分
岐命令によって実行アドレスがかわり、命令５０１にジ
ャンプした場合、命令５０１から命令５０５までが一つ
の分岐ブロックを形成する。また、仮に分岐命令５０５
の条件が成立せず命令５０１に分岐しなくても、分岐命
令５０５の次に実行される命令から新しい分岐ブロック
が開始する。したがって、この定義からプログラム中の
ある命令が複数の分岐ブロックに属する場合もあり得
る。さらにプログラム全体は複数の分岐ブロックにて形
成されているので，実行されたプログラム全体が順次翻
訳命令に変換されると考えられる。The translation instruction generation circuit 204 monitors the transfer destination physical resource number 302 and recognizes that a branch instruction is being executed when the transfer destination is the branch instruction processing unit. A branch block is the instruction sequence lying between one branch instruction and the next; as in the example of FIG. 5, it begins with a non-branch instruction and ends with a branch instruction. Because this is a dynamic instruction order based on the results of execution, it is irrelevant whether, for example, the instruction immediately above instruction 501 of FIG. 5 in the assembly-language listing is a branch instruction. That is, when some branch instruction in the program changes the execution address and jumps to instruction 501, instructions 501 through 505 form one branch block. Likewise, even if the condition of the branch instruction 505 is not satisfied and it does not branch to instruction 501, a new branch block begins with the instruction executed next after the branch instruction 505. It follows from this definition that a given instruction in a program may belong to more than one branch block. Furthermore, since the whole program is made up of branch blocks, the entire executed program can be regarded as being converted into translation instructions one after another.
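The dynamic definition of a branch block can be sketched as a simple pass over an execution trace. The sketch below is illustrative only: the addresses, the `is_branch` predicate (standing in for the check that the transfer destination is the branch instruction processing unit), and the function name `split_branch_blocks` are assumptions, not part of the embodiment.

```python
def split_branch_blocks(trace, is_branch):
    """Cut a dynamic trace of instruction addresses into branch blocks.

    A block closes at each executed branch instruction; the instruction
    executed next, whatever its address, opens the following block.
    """
    blocks, current = [], []
    for addr in trace:
        current.append(addr)
        if is_branch(addr):
            blocks.append(current)
            current = []
    return blocks, current  # 'current' holds a block not yet closed by a branch


# Hypothetical addresses: 0x100..0x110 stand for instructions 501..505,
# with the branch at 0x110. The second pass enters at 0x108 instead of 0x100.
BRANCH_ADDRESSES = {0x110}
trace = [0x100, 0x104, 0x108, 0x10C, 0x110, 0x108, 0x10C, 0x110]
blocks, unfinished = split_branch_blocks(trace, lambda a: a in BRANCH_ADDRESSES)
```

Here the instruction at 0x108 appears in two different blocks, one starting at 0x100 and one starting at 0x108, which is the situation the text describes when it notes that one instruction may belong to several branch blocks.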

【００３９】（４ｃ）翻訳命令翻訳命令生成の機能は二つの機能に大別される。一つは
翻訳命令本体７０２の生成である。図１１に示すよう
に、翻訳命令本体７０２のフィールドはデータ転送の転
送先と転送元から構成されており、スーパースカラープ
ロセッサ１００から出力される転送先物理資源番号３０
２を，転送先フィールド７０４に転送元物理資源番号３
０３を格納すればよい。翻訳命令本体７０２の命令長は
有限であり、その命令長を越える場合には、新しい翻訳
命令本体７０２を別途生成する。大別した機能の２番目
は初期化命令７０１および終了命令７０３の生成であ
る。命令翻訳回路２０４には、後に説明するように、図
１２に示したスコアボード７５１と意味的に等価なもの
が備えられている。スコアボードはスーパースカラープ
ロセッサ１００に内蔵されている個々のレジスタについ
て、そのレジスタに書き込み動作を実施した演算器の番
号と読み出し動作を実施した演算器番号が記録される。
なお、分岐ブロックが変わった際には、スコアボード全
体が０クリアされる。この番号の記録はスーパースカラ
ープロセッサ１００の命令実行順序に従うため、分岐ブ
ロックが終了した時点では最近に読み出し、あるいは書
き込みを実施した演算器の番号が記録されている。(4c) Translation instructions. Translation instruction generation divides broadly into two functions. The first is generation of the translation instruction body 702. As shown in FIG. 11, the fields of the translation instruction body 702 consist of transfer destinations and transfer sources of data transfers; it suffices to store the transfer destination physical resource number 302 output from the superscalar processor 100 in the transfer destination field 704 and the transfer source physical resource number 303 in the transfer source field. The length of a translation instruction body 702 is finite, and when that length would be exceeded, a new translation instruction body 702 is generated separately. The second function is generation of the initialization instruction 701 and the end instruction 703. As described later, the instruction translation circuit 204 is provided with something semantically equivalent to the scoreboard 751 shown in FIG. 12. For each register built into the superscalar processor 100, the scoreboard records the number of the arithmetic unit that performed a write to that register and the number of the arithmetic unit that performed a read from it. When the branch block changes, the entire scoreboard is cleared to 0. Because these numbers are recorded in the instruction execution order of the superscalar processor 100, at the end of a branch block the scoreboard holds the numbers of the arithmetic units that most recently read or wrote each register.

【００４０】本実施の形態では翻訳命令を実行するプロ
セッサがスーパースカラープロセッサ１００とは独立し
て存在し、レジスタとしては一時レジスタ２０６を用い
る。そのために、一時レジスタ２０６の初期化が必要で
ある。翻訳命令生成回路２０４はスコアボード７５１の
中で読み出しはされるが、書き込みが行われていないレ
ジスタを抽出し、初期化命令７０１を生成する。つま
り、上記の条件を満たすレジスタは図７のブロック６７
０のように、着目分岐ブロック外で値が設定されている
レジスタである。初期化命令７０１の転送先フィールド
はプロセッサ２００の内部に設けられた一時レジスタの
番号であり、転送元はレジスタＩＯ２０７の物理資源番
号である。レジスタＩＯ２０７はスーパースカラープロ
セッサ１００のレジスタファイル１０７と一時レジスタ
２０６間のデータ転送機能を有する。なお、初期化命令
７０１の命令開始アドレスには分岐ブロックの先頭命令
のアドレスが格納される。In this embodiment, the processor that executes translation instructions exists independently of the superscalar processor 100 and uses the temporary registers 206 as its registers. The temporary registers 206 therefore need to be initialized. The translation instruction generation circuit 204 extracts the registers that the scoreboard 751 shows as read but not written, and generates the initialization instruction 701. In other words, the registers satisfying this condition are those whose values were set outside the branch block of interest, as in block 670 of FIG. 7. The transfer destination field of the initialization instruction 701 is the number of a temporary register provided inside the processor 200, and the transfer source is the physical resource number of the register IO 207. The register IO 207 has the function of transferring data between the register file 107 of the superscalar processor 100 and the temporary registers 206. The instruction start address of the initialization instruction 701 holds the address of the first instruction of the branch block.

【００４１】終了命令７０３も同様にスコアボード７５
１を参照して生成される。終了命令７０３は分岐ブロッ
クが終了した際の結果を一時レジスタ２０６からレジス
タファイル１０７に格納するために存在する。終了命令
７０３の転送先フィールドはレジスタＩＯ２０７の物理
資源番号であり、転送先フィールドは分岐ブロックが終
了した時点でスコアボード７５１に書き込み実績が残っ
ているレジスタ番号である。以上のようにして生成され
た翻訳命令は分岐ブロック終了と同時に、翻訳命令メモ
リ２０１内の書き込み回路２１４へ転送される。以上が
翻訳命令の生成の概略説明である。The end instruction 703 is likewise generated by referring to the scoreboard 751. The end instruction 703 exists in order to store the results at the end of the branch block from the temporary registers 206 into the register file 107. The transfer destination field of the end instruction 703 is the physical resource number of the register IO 207, and the transfer source field is the number of a register for which a write record remains in the scoreboard 751 at the end of the branch block. The translation instructions generated in this way are transferred to the write circuit 214 in the translation instruction memory 201 as soon as the branch block ends. This concludes the outline of translation instruction generation.
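The choice of registers for the initialization and end instructions can be sketched as follows. This is an illustration under stated assumptions rather than the circuit of the embodiment. In particular, the sketch treats a register as needing initialization when it is read before any write inside the block, which is a slightly different test from the "read but not written" record stated in the text; the difference matters for a register such as register 6 in the example, which is read by lfd and later written by ai. The register spellings and the function name are hypothetical.

```python
def init_and_end_registers(block):
    """Return (registers to initialize, registers to write back) for one branch block.

    block: list of (registers read, registers written), in execution order.
    Assumption: a register needs initialization if it is read before being
    written inside the block; every register written needs a write-back.
    """
    to_initialize, written = [], []
    for reads, writes in block:
        for reg in reads:
            if reg not in written and reg not in to_initialize:
                to_initialize.append(reg)
        for reg in writes:
            if reg not in written:
                written.append(reg)
    return to_initialize, written


# Read and write sets of the five-instruction example (register names assumed).
BLOCK = [
    (("r6",), ("f1",)),        # lfd
    (("r6",), ("r6",)),        # ai
    (("f1", "f2"), ("f1",)),   # fm
    (("f1", "f0"), ("cr6",)),  # fcmp
    (("cr6",), ()),            # bc
]
init_regs, end_regs = init_and_end_registers(BLOCK)
```

With these inputs the block would load `r6`, `f2`, and `f0` into temporary registers before it runs and write `f1`, `r6`, and `cr6` back to the register file when it ends.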

【００４２】（４ｄ）命令翻訳回路２０４図８に示すように、命令翻訳回路２０４は、スコアボー
ド７５１ａ，７５１ｂと初期化命令バッファ１３０，翻
訳命令本体バッファ１３１，終了命令バッファ１３２等
から構成される。スコアボード７５１ａと７５１ｂは、
それぞれ図１２に示したスコアボードのＲｅａｄ部とＷ
ｒｉｔｅ部に記載された情報を保持するスコアボードで
あり、それぞれこれらのＲｅａｄ部とＷｒｉｔｅ部に保
持された演算器番号を保持する。分岐検出回路１３３は
演算器番号が分岐命令処理ユニットであるとき、リセッ
ト信号１３４と翻訳命令送出要求信号１３３ａを生成す
る。これにより、先に述べた分岐ブロックが認識でき
る。リセット信号１３４を受けたスコアボード７５１
ａ，７５１ｂは、それぞれの記憶内容を全て０クリアす
る。(4d) Instruction translation circuit 204. As shown in FIG. 8, the instruction translation circuit 204 consists of scoreboards 751a and 751b, an initialization instruction buffer 130, a translation instruction body buffer 131, an end instruction buffer 132, and other elements. The scoreboards 751a and 751b hold the information shown in the Read part and the Write part, respectively, of the scoreboard in FIG. 12; that is, each holds the arithmetic unit numbers recorded in the corresponding part. The branch detection circuit 133 generates a reset signal 134 and a translation instruction send request signal 133a when the arithmetic unit number is that of the branch instruction processing unit. The branch blocks described above can thereby be recognized. On receiving the reset signal 134, the scoreboards 751a and 751b clear all of their stored contents to 0.

【００４３】翻訳命令本体の生成は命令スケジューラ１
０４から実行命令アドレス３０１、転送先物理資源番号
３０２、転送元物理資源番号３０３が送られる毎に実施
される。つまり、図１２を用いて説明した概念のとお
り、転送元の物理リソースは、スコアボード７５１ｂに
演算器番号が格納されていれば、その演算器であるし、
演算器番号がスコアボード７５１ｂに格納されていなけ
ればオペランドレジスタが転送元となる。その選択は選
択回路１３５が実施する。また、転送先はターゲットレ
ジスタと演算器のいずれかになる。したがって、２オペ
ランド１ターゲットの命令からは最大３組の転送元と転
送先が生成され、それらは翻訳命令バッファ１３１に格
納する。初期化命令と終了命令はリセット信号１３４に
同期して生成される。リセット信号１３４は０検出回路
１３６ａと１３６ｂに入力され、それぞれのスコアボー
ド内の格納値が０であるレジスタ番号を出力する。スコ
アボード７５１ａの０検出回路１３６ａの出力は、終了
命令の転送先を指定し、終了命令バッファ１３２へ格納
される。スコアボード７５１ｂの０検出回路１３６ｂ
の出力は、条件判定回路１３７においてスコアボード７
５１ａの出力結果と合成される。条件判定回路１３７の
動作内容は、０検出回路１３６ｂから出力された書き込
み実績の無いレジスタ番号のうち、読み出し実績のある
レジスタ番号を選別することである。条件判定回路１３
７から出力されるレジスタ番号は初期化，すなわちレジ
スタファイル１０７の内容を一時レジスタ２０６へ転送
する必要の番号である。プロセッサ２００の初期化命令
はレジスタＩＯ２０７を通してレジスタファイル１０７
の内容が転送されるので，転送元がレジスタＩＯ２０
７，転送先が一時レジスタ２０６となる。この初期化命
令は初期化命令バッファ１３０へ格納される。最後に翻
訳命令送出回路１３８は初期化命令、翻訳命令本体、終
了命令をこの順で結合し、翻訳命令メモリ２０１へ出力
する。A translation instruction body is generated each time the execution instruction address 301, the transfer destination physical resource number 302, and the transfer source physical resource number 303 are sent from the instruction scheduler 104. In accordance with the concept described with reference to FIG. 12, the transfer-source physical resource is an arithmetic unit if that unit's number is stored in the scoreboard 751b for the register concerned, and is the operand register itself if no arithmetic unit number is stored in the scoreboard 751b. The selection circuit 135 makes this choice. The transfer destination is either the target register or an arithmetic unit. Accordingly, an instruction with two operands and one target yields at most three transfer source and destination pairs, which are stored in the translation instruction body buffer 131. The initialization instruction and the end instruction are generated in synchronization with the reset signal 134. The reset signal 134 is input to the 0 detection circuits 136a and 136b, which output the register numbers whose stored values in the respective scoreboards are 0. The output of the 0 detection circuit 136a of the scoreboard 751a specifies the transfer destinations of the end instruction and is stored in the end instruction buffer 132. The output of the 0 detection circuit 136b of the scoreboard 751b is combined with the output of the scoreboard 751a in the condition determination circuit 137. The condition determination circuit 137 selects, from among the register numbers with no write record output by the 0 detection circuit 136b, those that have a read record. The register numbers output from the condition determination circuit 137 are those that require initialization, that is, those for which the contents of the register file 107 must be transferred to the temporary registers 206. Because an initialization instruction of the processor 200 transfers the contents of the register file 107 through the register IO 207, its transfer source is the register IO 207 and its transfer destination is a temporary register 206. This initialization instruction is stored in the initialization instruction buffer 130. Finally, the translation instruction sending circuit 138 concatenates the initialization instruction, the translation instruction bodies, and the end instruction in this order and outputs them to the translation instruction memory 201.
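The source selection performed for each executed instruction can be sketched in Python as below. This is a minimal model under stated assumptions: `last_writer` stands in for the Write part of the scoreboard (751b), the unit names `LSU` and `FMUL` and the register names are hypothetical, and each pair is written as (transfer source, transfer destination).

```python
def transfer_pairs(block):
    """Generate (source, destination) transfer pairs for one branch block.

    block: list of (arithmetic unit, target register or None, operand registers),
    in execution order.
    """
    last_writer = {}  # register -> arithmetic unit that last wrote it in this block
    pairs = []
    for unit, target, operands in block:
        for reg in operands:
            # Forward from the producing unit if one is recorded,
            # otherwise read the operand register itself.
            source = last_writer.get(reg, reg)
            pairs.append((source, unit))
        if target is not None:
            pairs.append((unit, target))
            last_writer[target] = unit
    return pairs


# First and third instructions of the example, with hypothetical unit names.
pairs = transfer_pairs([
    ("LSU", "f1", ("r6",)),        # lfd: load into f1 using r6
    ("FMUL", "f1", ("f1", "f2")),  # fm: f1 = f1 * f2
])
```

The second instruction yields three pairs, the maximum for two operands and one target, and its first operand is taken directly from the load unit rather than from register `f1`.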

【００４４】（４ｅ）翻訳命令メモリ２０１図９を参照するに、翻訳命令メモリ２０１は、書き込み
回路２１４と、翻訳命令記憶領域２１１と、読み出し回
路２１３とからなる。なお、図には翻訳命令デコーダ２
０２の内部構造も併せて示す。翻訳命令記憶領域２１１
は、ＭＯＳトランジスタからなる複数のメモリセル２３
３および２３４と、付随する制御回路（Ａ）２３５と、
制御回路（Ｒ）２３６およびその他の回路とから構成さ
れるメモリアレイである。翻訳命令記憶領域２１１はよ
り多くのメモリセルが繰り返し存在し、全体としてのセ
ルアレイを構成しているが、それらのメモリセルは簡単
化のために図示していない。翻訳命令記憶領域２１１
は、翻訳命令に含まれる命令開始アドレスを格納する複
数のメモリセル（Ｍ１）２３３と、転送元となる物理資
源番号を格納するための複数のメモリセル（Ｍ）２３４
とから構成されている。それぞれのメモリセルは１ビッ
トのセルで、それらの情報を保持するには翻訳命令記憶
領域２１１に設けられた複数ビット分のメモリセルが使
用されるが、以下では、簡単化のため必要なビット数分
全てのメモリセル２３３，２３４は図示しない。また便
宜上、水平方向に並ぶメモリセル群をライン、垂直方向
に並ぶメモリセル群をカラムと呼ぶ。(4e) Translation instruction memory 201. Referring to FIG. 9, the translation instruction memory 201 consists of a write circuit 214, a translation instruction storage area 211, and a read circuit 213. The figure also shows the internal structure of the translation instruction decoder 202. The translation instruction storage area 211 is a memory array made up of a plurality of memory cells 233 and 234 formed from MOS transistors, the associated control circuit (A) 235, control circuit (R) 236, and other circuits. The translation instruction storage area 211 actually contains many more memory cells arranged repeatedly to form the cell array as a whole, but they are omitted from the figure for simplicity. The translation instruction storage area 211 consists of a plurality of memory cells (M1) 233 that store the instruction start address contained in a translation instruction, and a plurality of memory cells (M) 234 that store the physical resource numbers serving as transfer sources. Each memory cell holds one bit, and holding these items of information takes several bits' worth of memory cells in the translation instruction storage area 211; for simplicity, however, the memory cells 233 and 234 are not all shown for the required number of bits. For convenience, a group of memory cells arranged in the horizontal direction is called a line, and a group arranged in the vertical direction is called a column.

【００４５】書き込み回路２１４は、命令翻訳回路２０
４から線２１９を介して供給される翻訳命令がまだこの
翻訳命令メモリ２０１に記憶されていないときには、こ
の翻訳命令を翻訳命令記憶領域２１１に格納する。その
際、その翻訳命令が、図１１に示した初期化命令７０１
であるときには、その命令７０１は、翻訳命令記憶領域
２１１内のいずれか一つのラインに記憶される。この命
令内の各転送元は、転送先となり得る複数の物理資源に
一対一に対応し、その対応する物理資源に対して定めら
れた水平方向の記憶位置に記憶される。すなわち、その
翻訳命令７０１中の複数の転送先の各々に対応する水平
方向の位置に、その転送先と対をなす転送元を記憶す
る。このために、書き込み回路２１４では、カラムデコ
ーダ２７０が翻訳命令中の各転送先をデコードし、翻訳
命令記憶領域２１１のｘ方向位置を算出する。書き込み
増幅器２７１が、算出された位置にその転送先と対をな
す転送元データを書き込む。命令翻訳回路２０４から順
次与えられる複数の翻訳命令は、翻訳命令記憶領域２１
１内のまだ命令が記憶されていないカラムに順次書き込
まれる。このような書き込みは、書き込み制御回路２１
２の制御下で行われる。翻訳命令によっては同じ転送先
を複数含む場合がある。このような翻訳命令を翻訳命令
記憶領域２１１に格納するときには、転送元情報を複数
のラインに分けて翻訳命令記憶領域２１１に記憶する必
要がある。その結果、それらの翻訳命令では、図１３と
異なって、転送元が稠密に詰まってはいない状態として
記憶される。When a translation instruction supplied from the instruction translation circuit 204 over the line 219 is not yet stored in the translation instruction memory 201, the write circuit 214 stores it in the translation instruction storage area 211. If the translation instruction is the initialization instruction 701 shown in FIG. 11, the instruction 701 is stored in one of the lines of the translation instruction storage area 211. Each transfer source in the instruction is stored at the horizontal position assigned to the physical resource that is its transfer destination, the horizontal positions corresponding one-to-one to the physical resources that can be transfer destinations. That is, for each of the transfer destinations in the translation instruction 701, the transfer source paired with that destination is stored at the horizontal position corresponding to the destination. To this end, in the write circuit 214 the column decoder 270 decodes each transfer destination in the translation instruction and calculates the x-direction position in the translation instruction storage area 211, and the write amplifier 271 writes the transfer source data paired with that destination at the calculated position. The translation instructions supplied successively from the instruction translation circuit 204 are written in turn into columns of the translation instruction storage area 211 in which no instruction has yet been stored. This writing is performed under the control of the write control circuit 212. Some translation instructions contain the same transfer destination more than once. When such a translation instruction is stored in the translation instruction storage area 211, its transfer source information must be divided among several lines. As a result, unlike in FIG. 13, such translation instructions are stored with their transfer sources not densely packed.
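The destination-indexed layout described above can be sketched as follows. The sketch is illustrative only: each line is modeled as a list with one slot per possible transfer destination, an empty slot is `None`, and a new line is started whenever the slot for a destination is already occupied in the current line. Starting a new line rather than searching earlier lines is an assumption made here to preserve the order of the transfers; the names are hypothetical.

```python
def pack_into_lines(pairs, destinations):
    """Place each transfer source in the column assigned to its destination.

    pairs: list of (source, destination) in execution order.
    destinations: every possible destination, in column order.
    """
    column = {dst: index for index, dst in enumerate(destinations)}
    lines = []
    for source, dst in pairs:
        col = column[dst]
        # Open a new line if there is none yet or this destination's slot is taken.
        if not lines or lines[-1][col] is not None:
            lines.append([None] * len(destinations))
        lines[-1][col] = source
    return lines


# Hypothetical destinations A, B, C; destination A is used twice.
lines = pack_into_lines([("x", "A"), ("y", "B"), ("z", "A")], ["A", "B", "C"])
```

Because destination `A` occurs twice, the third transfer spills into a second line, leaving unused slots; this is the sparse storage that the text contrasts with the densely packed case.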

[0046] When the instruction given from the instruction translation circuit 204 is the initialization instruction 702, its transfer destinations are decoded and it is stored in the translation instruction storage area 211 as the translation instruction 702A. When the translation instruction given from the instruction translation circuit 204 is the end instruction 703, all 1s are written into the memory cells (M1) 233 of the line in which the last transfer source specified by that instruction was written in the translation instruction storage area 211. Whether an instruction given from the translation circuit 204 is already recorded in the translation memory 201 is reported to the write control circuit 212 by the hit signal 242, as described later. If the translation instruction does not hit, the hit signal 242 goes Low.

[0047] Each memory cell (M1) 233 has a content-match search function and compares the instruction address 300 generated by the instruction fetch circuit 102 with the data held in the cell. When the instruction start address of a line matches, that is, when the contents of all the memory cells (M1) 233 of that line match, the potential of the signal line 230 goes Low and the control circuit (A) 235 raises the read line selection signal 216 to High. When the read line selection signal 216 goes High, the memory cells of that line, both the memory cells (M1) 233 and the memory cells (M) 234, output the logic values they hold to the data output lines 218. Because no column selection is performed in this read operation, one full line of data reaches the read circuit 213. In other words, the translation memory 201 functions as a kind of CAM (Contents Associative Memory) that uses the instruction start address as tag information.
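The tag lookup described above behaves like a small content-addressable memory. The following is a minimal software sketch under stated assumptions: the `Line` class, its field names, and `cam_lookup` are hypothetical, and the sequential loop stands in for a comparison that the hardware performs on all lines at the same time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    tag: int                       # instruction start address held in the M1 cells
    sources: List[Optional[int]]   # one transfer-source slot per destination column


def cam_lookup(lines: List[Line], instruction_address: int) -> Tuple[bool, Optional[List[Optional[int]]]]:
    """Return (hit, line data) for the line whose tag equals the address.

    On a hit the whole line is returned, mirroring the fact that no column
    selection takes place during the read.
    """
    for line in lines:
        if line.tag == instruction_address:
            return True, line.sources
    return False, None


memory = [Line(0x100, [1, None]), Line(0x200, [None, 2])]
print(cam_lookup(memory, 0x200))
print(cam_lookup(memory, 0x300))
```

The boolean in the result plays the role of the hit signal: on a miss the caller would fall back to the conventional decode-and-schedule path and write the new translation instruction.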

[0048] As shown in FIG. 10(b), the most basic memory cell (M) 234 stores charge in the gate-to-gate capacitance between the transistor T1 and the transistor T2. When the write line selection signal 215 is High, the transistor T1 turns on and the value on the data input line 217 is stored in this capacitance. When the read line selection signal 216 goes High, the transistor T3 turns on and the stored information appears on the data output line 218 in negative logic. Before a memory cell is read, the data output line 218 is pulled up to the High level by the precharge transistor 238 (FIG. 9) provided for each column.

[0049] In the memory cell (M1) 233, as shown in FIG. 10(a), the transistors T4, T5, and T6 correspond to the transistors T1, T2, and T3 of the memory cell (M) 234, respectively, and play equivalent roles. The two inverters in the cell generate the inverted signals of the data input line 217 and of the instruction address 300, respectively. They are drawn inside the cell only to simplify the drawing; in practice, one such pair of inverters per column suffices. The connection between the transistors T7 and T4 and the connection between the transistors T8 and T9 store the positive-logic and negative-logic values of the written data, and the exclusive NOR of the data stored in this cell and the instruction address 300 is ultimately produced at the gate of the transistor T10. That is, when the instruction start address given as the query matches the information stored in this cell, the signal line 230 becomes conductive within this memory cell (M1) 233.
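The per-cell comparison can be restated as Boolean logic. The sketch below is an illustration only: `cell_matches` and `line_matches` are hypothetical names, bits are modeled as the integers 0 and 1, and the series connection of the cells along the signal line is modeled as a logical AND.

```python
def cell_matches(stored_bit: int, query_bit: int) -> bool:
    """Exclusive NOR of the stored bit and the query bit (the T10 gate value)."""
    return not (stored_bit ^ query_bit)


def line_matches(stored_bits, query_bits) -> bool:
    """True when every cell of the line conducts, i.e. all bits agree.

    In the circuit this is the condition under which the precharged
    match line is discharged to Low.
    """
    if len(stored_bits) != len(query_bits):
        raise ValueError("tag and query must have the same width")
    return all(cell_matches(s, q) for s, q in zip(stored_bits, query_bits))


print(line_matches([1, 0, 1], [1, 0, 1]))
print(line_matches([1, 0, 1], [1, 1, 1]))
```

A single mismatching bit is enough to keep the line from matching, which is why the tag comparison needs no separate comparator outside the cell array.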

[0050] The control circuit (A) 235 consists of the circuit shown in FIG. 10(d). Before the rising edge of the clock, the transistor T14 precharges the signal line 230 to the power supply potential. As described above, when the transistors T10 in all the memory cells (M1) 233 are conducting, the signal line 230 goes Low after the rising edge of the clock. The forced read request (instruction request) 237 issued by the read circuit 213 is at the Low level except during the operation described later. A match detected in the memory cells (M1) 233 therefore sets the read line selection signal 216 to High through the gate g1. In this way, the data of the line that matches the instruction address 300 appears on the data output lines 218 from the translation instruction storage area 211.

[0051] The control circuit (R) 236 refreshes the memory cells in cooperation with the refresh control 240 and the line decoder 239, and has the structure shown in FIG. 10(c). During a refresh, a normal read operation is performed while the refresh control 240 holds the refresh signal 232 at the High level. That is, the line decoder 239 sets the read line selection signal 216 of the line to be refreshed to High, and the data is held between the gates of the transistors T11 and T12 of the control circuit (R) 236. Next, the line decoder 239 raises the write line selection signal 215 to the High level and the refresh control 240 sets the refresh signal 232 to Low, so that the data held in the control circuit (R) 236 is written back to the original line through the precharge inverter. In addition, the OR gate 241 takes the logical OR of all the read line selection signals to generate the hit signal 242 for the instruction start address search described above.

[0052] Thus, the hit check is also performed for an instruction supplied from the instruction translation circuit 204 to the translation memory 201 for the first time, and when that instruction does not hit, it is written into the translation memory 201 as described above. When the instruction is executed again after it has been written into the translation memory 201, the hit check on the instruction address of the first instruction of the instruction block to which it belongs results in a hit on that first instruction. As a result, the translation instructions for the series of instructions starting from that instruction are read out of the translation memory 201 one after another.

[0053] The read circuit 213 has a plurality of FIFO queues 260, one for each of the physical resources that can be a transfer destination, and each FIFO queue 260 is connected through a sense amplifier to the data output line 218 provided for the corresponding physical resource. The analog signal read from the translation instruction storage area 211 onto the data output line 218 of a transfer destination is resolved into a logic signal by the sense amplifier and then written into the FIFO queue 260. This writing is repeated until all 1s are read into the latch 261, that is, until the end instruction is read. During this time, the line address increment is conveyed from the FIFO queue 260 to the line decoder 239.
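The read-out loop described above can be modeled as follows. This is a sketch under assumptions, not the circuit: `read_block`, the `(tag, sources)` tuple layout, the 32-bit value chosen for `END_TAG`, and the choice to skip empty slots are all made up for illustration.

```python
from collections import deque

END_TAG = 0xFFFFFFFF  # assumption: the "all 1s" end marker, modeled as a 32-bit tag


def read_block(lines, start_index, num_destinations):
    """Read consecutive lines into one FIFO queue per transfer destination.

    `lines` is a list of (tag, sources) tuples. Reading starts at the line
    that hit and continues, incrementing the line address, until the line
    carrying the all-1s marker has been read.
    """
    queues = [deque() for _ in range(num_destinations)]
    index = start_index
    while index < len(lines):
        tag, sources = lines[index]
        for dest, src in enumerate(sources):
            if src is not None:          # assumed: empty slots are not queued
                queues[dest].append(src)
        if tag == END_TAG:
            break                        # end instruction reached
        index += 1
    return queues


block = [(0x100, [3, None]), (0, [None, 5]), (END_TAG, [4, None]), (0x200, [9, 9])]
print([list(q) for q in read_block(block, 0, 2)])
```

Each destination ends up with its own ordered list of sources to wait for, which is what lets the later stages avoid decoding the destination again.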

[0054] (4f) Execution mode of translation instructions

In FIG. 9, the translation instruction decoder 202 consists of a plurality of partial decoders 202A, each of which decodes one transfer source field of a translation instruction, and an OR gate 202B, which generates the transfer request signal 202c to the FIFO queues 260 inside the instruction read circuit 213 from the instruction requests 808, the transfer requests issued by the instruction waiting units 802. For each partial decoder 202A, a plurality of instruction signals 801 are provided, one for every physical resource, that is, for each of the arithmetic units 205, the temporary registers 206, and the register IO 207. As described above, within the translation instruction memory 201 the number of the transfer destination physical resource required by an instruction is stored as position information in the translation instruction storage area 211, so the translation instruction decoder 202 needs to decode only the transfer source physical resource. That is, each partial decoder decodes its corresponding transfer source field in the translation instruction that was read, and selects and activates the one instruction signal that corresponds to that transfer source physical resource. The decode result indicates which transfer source physical resource generates the data that the transfer destination physical resource corresponding to that partial decoder (one of the arithmetic units 205, the temporary registers 206, or the register IO 207) must wait for. Each instruction signal 801 is a 1-bit data line.
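The role of the partial decoders can be illustrated with a one-hot decode. The names `decode_source` and `decode_line` are hypothetical, and representing "no transfer for this destination" as `None` is an assumption of this sketch.

```python
def decode_source(source, num_resources):
    """One-hot decode of a transfer source field.

    Returns a list of 1-bit instruction signals, one per physical resource.
    At most one signal is active: the one for the resource that will
    produce the awaited data.
    """
    signals = [0] * num_resources
    if source is not None:
        if not 0 <= source < num_resources:
            raise ValueError(f"unknown transfer source {source}")
        signals[source] = 1
    return signals


def decode_line(sources, num_resources):
    """Apply one partial decoder per destination column of a line."""
    return [decode_source(s, num_resources) for s in sources]


# Destination 0 waits for resource 2, destination 1 waits for nothing,
# destination 2 waits for resource 0.
print(decode_line([2, None, 0], 3))
```

Row `d`, column `s` of the result being 1 means "destination `d` waits for the output of source `s`", which corresponds to one crossing point in the transfer circuit.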

[0055] In FIG. 14, for each input terminal of every physical resource (the arithmetic units 205, the temporary registers 206, and the register IO 207), the transfer circuit 203 has an instruction waiting unit 802 at each intersection between one of the instruction signals 801 of the partial decoder corresponding to that physical resource and the output line of the physical resource to which that instruction signal 801 corresponds. The outputs of the instruction waiting units 802 provided for a given input terminal of a physical resource are wired-ORed and supplied to that input terminal.

[0056] FIG. 15 shows the internal configuration of the instruction waiting unit 802. The instruction signal 801 is the signal generated by the translation instruction decoder 202 described above, and indicates that a data transfer is required between the transfer source and the transfer destination associated with the instruction waiting unit 802 in question.

[0057] The instruction request 808 is a data transfer request from the instruction waiting unit 802 to the translation instruction decoder 202. These requests are ORed by the gate in the translation instruction decoder 202, and new transfer source data is taken out of the FIFO queue in the read circuit 213. The data-in signal 803 is a data line of a predetermined width, for example 32 bits, that carries data from the transfer source. The arrival of new data on the data-in signal 803 is indicated by the data-in-valid signal 805 becoming active. Similarly, the data-out signal 804 is the data transfer line to the transfer destination and, like the data-in signal 803, has a width such as 32 bits. The validity of the data-out signal 804 is indicated by the data-out-valid signal 806. The basic role of the instruction waiting unit 802 is to output the contents of the data-in signal 803 to the data-out signal 804 when the instruction signal 801 is active, that is, when there is an instruction that requires a data transfer. A destination arithmetic unit or register starts operating once all the data it needs to perform its function has arrived.

[0058] More specifically, this control is implemented by the transfer control circuit 807, whose circuit and function are as shown in the circuit diagram and time chart of FIG. 16. The pair of the latch 8000 and the gate 8002 and the pair of the latch 8001 and the gate 8003 hold the state in which the instruction signal 801 and the data-in-valid signal 805, respectively, have gone High until the data-out-valid signal 806 goes High. The gate 8004 detects that both of these signals have gone High and inverts the state of the latch 8005 for one cycle. The output of the latch 8005 is output as the data-out-valid signal 806 and as the instruction request 808. While the instruction signal 801 is active, the instruction waiting unit 802 is waiting for data from the transfer source; when data next arrives on the data-in signal 803, the unit outputs the same data on the data-out signal 804 in order to trigger the store operation of the destination arithmetic unit or register. Returning to FIG. 14, the data-in signal 803 and the data-in-valid signal 805 are connected to all the instruction waiting units 802 at the same horizontal position, so that the electrical signals propagate to all of them at once. This realizes the operation described above, in which an instruction waits for execution at the point where the wires connected to its transfer source and its transfer destination cross.
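The behavior of one instruction waiting unit can be approximated by a small cycle-level model. This is a behavioral sketch under assumptions, not a transcription of FIG. 16: the class and attribute names are hypothetical, the data value is assumed to be captured together with its valid flag, and the unit is assumed to fire in the same cycle in which the second condition arrives.

```python
class InstructionWaitingUnit:
    """Holds 'instruction pending' and 'data arrived' until both are true."""

    def __init__(self):
        self.instruction_pending = False  # models the latch for the instruction signal
        self.data_pending = False         # models the latch for data-in-valid
        self.data = None

    def step(self, instruction, data_in_valid, data_in=None):
        """Advance one clock cycle and return (data_out_valid, data_out).

        When data_out_valid is True, the same pulse also serves as the
        request for the next transfer source from the FIFO queue.
        """
        if instruction:
            self.instruction_pending = True
        if data_in_valid:
            self.data_pending = True
            self.data = data_in
        if self.instruction_pending and self.data_pending:
            out = self.data
            # Firing clears both held conditions, so the pulse lasts one cycle.
            self.instruction_pending = False
            self.data_pending = False
            self.data = None
            return True, out
        return False, None


unit = InstructionWaitingUnit()
print(unit.step(True, False))       # instruction arrives, no data yet
print(unit.step(False, True, 42))   # data arrives, transfer fires
print(unit.step(False, False))      # idle again
```

The only state an instruction contributes while it waits is a single pending bit; which source and which destination it concerns is implied by where the unit sits in the grid.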

[0059] Next, to give a visual picture of how translation instructions are executed, FIGS. 17 to 20 show an example of executing the program illustrated in FIG. 5. The circled numbers in FIGS. 17 to 20 correspond to the circled numbers in FIG. 7 and represent the data transfers of the translation instructions. Circled number 0 corresponds to the initialization instruction of the translation instructions, and circled number 12 to the end instruction. The region 900 enclosed at the top of FIG. 17 represents the translation instructions stored in the translation instruction memory 201, and the region 901 below it corresponds to the region of FIG. 14 in which the instruction waiting units 802 are provided. The items arranged below the region 901 correspond to the arithmetic units 205 through the register IO 207 of FIG. 1; for convenience of explanation, the correspondence is shown using the same operation names and register names as the program being executed. Although only one register IO 207 is shown in FIGS. 17 to 20, this is merely for convenience of drawing; there are as many register IOs as there are registers that require data transfer. The black circles below circled number 12 in FIG. 17 exist to start the initialization instruction of the translation instructions; there is one for each initialization instruction and for each register IO 207. In addition, by the principle of the present invention, the physical resource number of a temporary register 206 must match the register number in the register file 107. When an initialization instruction transfers the initial value of register data to a temporary register, the physical resource number of the temporary register 206 is assigned at the same time, and that physical resource number is used in subsequent instruction execution.

[0060] When the instruction signals 801 are free, a translation instruction is transferred immediately from the region 900 to the instruction waiting units 802 in the region 901. That is, execution of the translation instructions moves immediately from the state of FIG. 17 to the state of FIG. 18. The data transfer instructions shown as black circles in FIG. 17 start the register IO 207, which reads the desired data from the register file 107 (as can be seen from FIG. 7, the initial contents of fp0, r6, and fp2 in this program example) and presents it as the output of the register IO. The instructions of circled number 0 are waiting, in instruction waiting units as shown in FIG. 15, on the output line 902 of the register IO (FIG. 18). When the contents of the register file 107 are output to the signal line 803 via the register IO, the data is transferred to each arithmetic unit located directly below a circled number 0, and the operations are started. Here, the signal line 803 is the same as the data-in signal 803 of the instruction waiting units 802, and the outputs of each arithmetic unit and temporary register are sent to the instruction waiting units 802 all at once.

[0061] Subsequent processing proceeds as a chain reaction, like billiard balls striking one another. For example, when the contents of r6 (register number 6) are read by circled number 0, then, as shown in FIG. 18, the r6 data is transferred to the arithmetic units that perform ai (addition) and lfd (load of floating-point data), corresponding to circled numbers 1 and 10 waiting on that output line. The same applies to the circled-number-0 data transfer instructions for fp2 (floating-point register number 2) and fp0 (floating-point register number 0), and the state of FIG. 19 results. From the state of FIG. 19, circled number 11 causes 12 to be executed, the contents of r6 after the addition are written back to the register file, and the state of FIG. 20 is reached. Finally, the data transfers indicated by circled number 12 are performed, and, as shown by the block 672 in FIG. 7, the values of r6, the condition register CR6, and the floating-point register fp1 are written into the register file 107.
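The chain-reaction style of execution is essentially data-driven firing: an operation runs as soon as all of its inputs exist. The sketch below illustrates that idea on a made-up two-operation graph; it does not reproduce the program of FIG. 5, and `run_dataflow`, the operation names, and the constant `four` are all hypothetical.

```python
def run_dataflow(operations, initial_values):
    """Fire each operation once all of its inputs are available.

    `operations` maps a result name to (function, list of input names).
    Returns the final values and the order in which operations fired.
    """
    values = dict(initial_values)
    pending = dict(operations)
    fired_order = []
    progress = True
    while pending and progress:
        progress = False
        for name, (func, inputs) in list(pending.items()):
            if all(i in values for i in inputs):
                values[name] = func(*(values[i] for i in inputs))
                fired_order.append(name)
                del pending[name]
                progress = True
    return values, fired_order


# "double" is listed first but cannot fire until "sum" has produced its result.
ops = {
    "double": (lambda s: s * 2, ["sum"]),
    "sum": (lambda a, b: a + b, ["r6", "four"]),
}
values, order = run_dataflow(ops, {"r6": 10, "four": 4})
print(values["sum"], values["double"], order)
```

The firing order is determined by data availability rather than by the order in which the operations were listed, which is the property the waiting units provide in hardware without any central matching step.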

[0062] In the translation instruction execution described above, an instruction waiting in an instruction waiting unit waits only for the data it truly needs. The matching of every completed result against every waiting instruction, which the instruction scheduler 104 performs in the prior art, is therefore unnecessary, and fast instruction waiting and execution can be achieved. That this process is faster than the prior art also means that, for an instruction sequence with the same execution content, execution of the sequence stored in the translation instruction memory 201 is faster than execution through the conventional instruction decoder 103 and instruction scheduler 104.

[0063] According to the embodiment described above, the allocation and determination of the physical resources required for instruction execution are carried out, the first time an instruction is executed, by the same means as in the prior art, so there is no improvement in processing time. For the second and subsequent executions of the same instruction, however, the data transfer destination and data transfer source information stored in the translation instruction memory 201 is used, so the processing by the instruction decoder 103 and by the instruction scheduler 104 in the superscalar processor 100 can be omitted, and execution is faster by that amount. The practical benefit depends on the proportion of instruction sequences that are reused, that is, on the hit rate of the translation memory. As is known from research on instruction caches and the like, executed instructions exhibit high locality; in particular, when the translation memory 201 is implemented with DRAM or the like on the same chip, a hit rate of nearly 100% can be achieved.
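The dependence on the hit rate can be written as a simple weighted average. This is an illustrative model rather than a formula from the specification; the symbols $h$ (hit rate of the translation memory), $T_{\text{hit}}$ (time to execute a block from stored translation instructions), and $T_{\text{miss}}$ (time to execute it through the decoder and scheduler) are assumptions introduced here.

```latex
T_{\text{avg}} = h \, T_{\text{hit}} + (1 - h) \, T_{\text{miss}},
\qquad
S = \frac{T_{\text{miss}}}{T_{\text{avg}}}
```

Under this model the speedup $S$ over the conventional path approaches $T_{\text{miss}} / T_{\text{hit}}$ as $h$ approaches 1, and there is no gain at $h = 0$, consistent with the statement that the first execution of an instruction is not improved.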

[0064] Furthermore, in this embodiment an instruction to be executed waits for the data it needs on the path along which that data is transferred, so, unlike a reservation station, no executability check is performed on instructions that are clearly unrelated. Indeed, in the prior art, because of this redundant function, an instruction awaiting execution manages the data it needs by number and must be compared against the number announced when an operation completes. For this reason, the number of instructions that could wait in one reservation station was limited to about four, and the comparison itself took time. In this embodiment, the physical resource information of the transfer destination and transfer source is used as position information within the transfer circuit 203, and an instruction that ultimately waits in the transfer circuit 203 is nothing more than one bit of information for triggering execution. Consequently, determining whether the execution condition is satisfied is fast, and because the structure is simple, many instructions can be kept waiting for execution. Moreover, because an instruction waits at the place corresponding to the destination of its data, no attempt is made to execute more instructions than there are arithmetic units, and no arbitration circuit for them is needed.

[0065] <Modifications> The present invention is not limited to the embodiment described above, and can be implemented in the modifications described below and in various other modifications.

[0066] (1) In Embodiment 1, the arithmetic units 205 and the temporary registers 206 for the processor 200 are provided separately from the arithmetic units 105 and the register file 107 of the superscalar processor 100. It is also possible, however, to use the arithmetic units and registers of the superscalar processor 100 in place of those of the processor 200. To do so, the transfer circuit 203 of Embodiment 1 is connected to the arithmetic units 105 in the superscalar processor 100 and to the registers in the register file 107 so that it transfers data between them. In this case it is realistic to drive the translation memory 201, the translation instruction decoder 202, the transfer circuit 203, and the other circuits in the processor 200 with the same clock as the superscalar processor 100. The high-speed operation described in Embodiment 1 therefore cannot be achieved, but because translation instructions can be executed without using the instruction scheduler 104, operation faster than in the prior art can still be expected in this respect.

[0067] (2) In the embodiment above, the transfer destinations of the translation instruction shown in FIG. 11 are decoded by the instruction translation circuit 204, and only the transfer source information is recorded in the translation instruction memory 201, as in FIG. 13. This is because associating the data in the translation instruction memory with the relevant registers and arithmetic units, as shown in FIG. 14, shortens the time from fetching a translation instruction to executing it. However, this method does not necessarily use the translation instruction memory 201 efficiently, so it is also possible to store translation instructions in the translation instruction memory 201 in the form shown in FIG. 11 as is. In that case, the transfer destinations must be decoded when a translation instruction is read, so the processing time before instruction execution, which is the main focus of the present invention, is slightly sacrificed.

[0068] (3) The translation instruction memory is assumed to be a semiconductor memory such as SRAM or DRAM, but the translation instructions may also be stored in an external storage device 201A such as a hard disk, as shown in FIG. 21. Like the read circuit 213, the translation instruction fetch circuit 201B reads the transfer source information of translation instructions from the means that stores them (the external storage device 201A in this modification). The register file 107A is logically and physically the same register file as the register file 107. In this case, however, unlike the embodiment above, the translation instructions are not executed concurrently with the operation of the scalar processor 100 but are executed separately on the processor 200.

[0069]

[Effects of the Invention] According to the present invention, after the instructions of a program written for sequential execution are scheduled and executed in parallel, the result of the scheduling is stored in a memory. When the program is executed again, the scheduling therefore need not be repeated, and the program can be re-executed at higher speed.

[Brief description of the drawings]

【図１】本発明に係る情報処理装置の概略ブロック図。FIG. 1 is a schematic block diagram of an information processing apparatus according to the present invention.

【図２】図１の装置における命令実行のフローチャー
ト。FIG. 2 is a flowchart of instruction execution in the apparatus of FIG. 1;

【図３】図１の装置に使用されるスカラープロセッサ１
００の概略構成図。FIG. 3 is a scalar processor 1 used in the apparatus of FIG.
FIG.

【図４】図１の装置に用いる命令のフォーマットを示す
図。FIG. 4 is a diagram showing a format of an instruction used in the apparatus shown in FIG. 1;

【図５】図１の装置で実行されるプログラムの例を示す
図。FIG. 5 is a view showing an example of a program executed by the apparatus shown in FIG. 1;

【図６】図５のプログラムのグラフ表現を示す図。FIG. 6 is a diagram showing a graph representation of the program of FIG. 5.

【図７】図５のプログラムのデータ転送を表すグラフ表
現を示す図。FIG. 7 is a diagram showing a graph representation of the data transfers in the program of FIG. 5.

【図８】図１の装置に使用される命令翻訳回路の概略ブ
ロック図。FIG. 8 is a schematic block diagram of the instruction translation circuit used in the apparatus of FIG. 1.

【図９】図１の装置に使用する翻訳メモリの概略ブロッ
ク図。FIG. 9 is a schematic block diagram of the translation memory used in the apparatus of FIG. 1.

【図１０】図９の装置に使用される複数種のメモリセル
と複数の制御回路の回路図。FIG. 10 is a circuit diagram of the several types of memory cells and the control circuits used in the device of FIG. 9.

【図１１】図８の装置により生成される翻訳命令のフォ
ーマットを示す図。FIG. 11 is a diagram showing the format of a translation instruction generated by the device of FIG. 8.

【図１２】図８の装置内に含まれた二つのスコアボード
に保持された内容と同じ内容を保持する等価なスコアボ
ードを示す図。FIG. 12 is a diagram showing an equivalent scoreboard that holds the same content as the two scoreboards included in the apparatus of FIG. 8.

【図１３】図９の装置に記憶された翻訳命令のフォーマ
ットを示す図。FIG. 13 is a diagram showing the format of a translation instruction stored in the apparatus of FIG. 9.

【図１４】図１の装置に含まれた翻訳命令デコーダと転
送回路の概略構成図。FIG. 14 is a schematic configuration diagram of the translation instruction decoder and the transfer circuit included in the device of FIG. 1.

【図１５】図１４の装置に使用される命令待機ユニット
のブロック図。FIG. 15 is a block diagram of the instruction waiting unit used in the apparatus of FIG. 14.

【図１６】図１５の装置の回路図とタイムチャート。FIG. 16 is a circuit diagram and a time chart of the device of FIG. 15.

【図１７】図１の装置における翻訳命令の第１の実行例
を示す図。FIG. 17 is a diagram showing a first execution example of translation instructions in the device of FIG. 1.

【図１８】図１の装置における翻訳命令の第２の実行例
を示す図。FIG. 18 is a diagram showing a second execution example of translation instructions in the apparatus of FIG. 1.

【図１９】図１の装置における翻訳命令の第３の実行例
を示す図。FIG. 19 is a diagram showing a third execution example of translation instructions in the device of FIG. 1.

【図２０】図１の装置における翻訳命令の第４の実行例
を示す図。FIG. 20 is a diagram showing a fourth execution example of translation instructions in the apparatus of FIG. 1.

【図２１】本発明に係る他の情報処理装置の概略ブロッ
ク図。FIG. 21 is a schematic block diagram of another information processing apparatus according to the present invention.

Claims

[Claims]

1. An information processing apparatus comprising: a plurality of first-type physical resources including a plurality of first-type arithmetic units and a plurality of first-type registers; an instruction scheduler that selects, from a plurality of first-type instructions included in a program to be executed, a plurality of first-type instructions that can be executed in parallel, allocates to each of the selected first-type instructions a plurality of first-type physical resources to be used for its execution, and controls execution of the selected first-type instructions so that they are executed in parallel using the physical resources allocated to each of them; a memory that, each time a plurality of first-type instructions are executed in parallel by the instruction scheduler, stores at least one second-type instruction from which the plurality of physical resources allocated by the instruction scheduler to each of those first-type instructions can be identified; a plurality of second-type physical resources including a plurality of second-type arithmetic units and a plurality of second-type registers; and an instruction execution circuit that executes, instead of the plurality of first-type instructions included in the program, the at least one second-type instruction stored in the memory in correspondence with those first-type instructions, wherein the instruction execution circuit executes the second-type instruction using a plurality of second-type physical resources corresponding to the plurality of physical resources that can be identified from the second-type instruction as allocated to each of the plurality of first-type instructions corresponding to that second-type instruction.

2. The information processing apparatus according to claim 1, wherein the plurality of second-type arithmetic units are provided separately from the plurality of first-type arithmetic units, and the plurality of second-type registers are provided separately from the plurality of first-type registers.

3. The information processing apparatus according to claim 1, wherein the plurality of first-type arithmetic units are used as the plurality of second-type arithmetic units, and the plurality of first-type registers are used as the plurality of second-type registers.

4. The information processing apparatus according to claim 1, wherein the plurality of second-type arithmetic units comprise the same number of arithmetic units as the plurality of first-type arithmetic units, and the number of the plurality of second-type registers is equal to the number of the plurality of first-type registers.

5. The information processing apparatus according to claim 1, wherein the plurality of second-type arithmetic units comprise a different number of arithmetic units from the plurality of first-type arithmetic units, and the number of the plurality of second-type registers differs from the number of the plurality of first-type registers.

6. An information processing apparatus comprising: a plurality of physical resources including a plurality of arithmetic units and a plurality of registers; a memory that stores a plurality of second-type instructions, each corresponding to a set of a plurality of first-type instructions executable in parallel; and an instruction execution circuit that sequentially executes the plurality of second-type instructions stored in the memory, wherein each of the plurality of second-type instructions is stored in the memory such that the plurality of physical resources used to execute the operation required by each of the corresponding first-type instructions can be identified, and the instruction execution circuit executes each second-type instruction using the plurality of physical resources that can be identified from that second-type instruction as allocated to each of the plurality of first-type instructions corresponding to it.

7. An information processing apparatus comprising: a plurality of arithmetic units; a plurality of registers; an instruction scheduler that determines the arithmetic unit and the registers to be used by each instruction, monitors updates of the data in the registers that serve as inputs to instruction execution, and sequentially issues instructions that have become executable to the arithmetic units; a memory that stores, for each instruction issued to an arithmetic unit, the instruction word storage address of the instruction together with information on the arithmetic unit and the registers it uses; and an instruction execution circuit that, when an instruction at an instruction word storage address already stored in the memory is executed again, executes the instruction based on the information on the arithmetic unit and the registers stored in the memory.

8. An information processing apparatus comprising: a plurality of arithmetic units; a plurality of registers; a transfer circuit that determines the arithmetic unit and the registers to be used by each instruction and determines a transfer path from an arithmetic unit to a register, or from one arithmetic unit to another, by monitoring updates of the data in the registers that serve as inputs to instruction execution and the arithmetic units that generate the data for those registers; and a memory that stores the instruction word storage address of each instruction issued to an arithmetic unit together with transfer path information for the data needed to execute the instruction, wherein the transfer circuit includes a circuit that, when an instruction at an instruction word storage address already stored in the memory is executed again, executes the instruction by switching the data transfer paths between the arithmetic units and the registers based on the transfer path information stored in the memory.

9. An information processing apparatus comprising: a plurality of arithmetic units; a plurality of registers; a plurality of instruction waiting circuits, provided at the respective intersections of the output lines and input lines that enable data transfer among the plurality of arithmetic units and the plurality of registers, each for holding an instruction waiting to be executed; and a circuit that issues an instruction waiting to be executed, which has as data transfer information the identification information of the arithmetic unit that generates the data required for its execution or of the register that holds that data, and of the arithmetic unit that executes the operation, to the one of the plurality of instruction waiting circuits located at the intersection of the input and output lines of the arithmetic units or registers between which data must be transferred to execute the instruction.