JP3112861B2

JP3112861B2 - Microprocessor

Info

Publication number: JP3112861B2
Application number: JP09128875A
Authority: JP
Inventors: 広樹高橋; 聡多賀谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-05-19
Filing date: 1997-05-19
Publication date: 2000-11-27
Anticipated expiration: 2017-05-19
Also published as: JPH10320196A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a microprocessor that executes a program consisting of a plurality of instructions in a pipelined manner and that, by referring to the data dependencies of the instructions in the program, executes them in an order other than the order in which they are written. More specifically, it relates to a method and configuration for determining the execution order in a processor that performs superscalar execution of a plurality of instructions whose data dependencies are resolved simultaneously or at nearly the same time.

[0002]

[Prior Art] General-purpose microprocessors that execute instructions in parallel while extracting instruction-level parallelism commonly use out-of-order execution: the processor checks whether the input data each instruction requires is available and, regardless of static program order, dynamically decides the execution order by running the instructions whose dependencies have been resolved. When more instructions are executable at a given time than there are resources to run them, the widely used selection policy is to "select the oldest instructions first".

[0003] For example, suppose that a program contains instructions I1 to I10 as shown in FIG. 2, and that their static order, from oldest to newest, is I1, I2, I3, I4, I5, ..., I10. Data dependencies exist in the directions indicated by the arrows in FIG. 2: an instruction pointed to by an arrow becomes executable once the instruction at the tail of that arrow has finished executing, and an instruction pointed to by several arrows becomes executable only after all of the instructions pointing to it have finished. Assume further that instruction I9 in FIG. 2 takes 20 clocks to complete and that every other instruction takes only 1 clock. The microprocessor that executes this program is a two-way superscalar microprocessor, able to execute two instructions simultaneously.

[0004] In this example, when instruction I1 finishes executing, instructions I2, I3, and I9 become executable. Under the "select the oldest instructions first" policy, I2 and I3 are selected to execute next. When I2 and I3 finish, the executable instructions are I4, I5, and I9; selecting the oldest first again, I4 and I5 are executed. When I4 and I5 finish, I6, I7, and I9 are executable, and I6 and I7 are executed. When I6 and I7 finish, I8 and I9 are executable, and both are executed.

[0005] In this case, instruction I10 cannot start until both I8 and I9 have finished. Because I9 takes 20 clocks, the start of I10 is also delayed by 20 clocks. In other words, the large latency of I9 is exposed in the program's execution time and significantly degrades performance. Concretely, executing I1 through I10 takes 25 clocks, and the 20-clock latency of I9 accounts for most of that time.
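The 25-clock figure can be reproduced with a small scheduling simulation. The following is only an illustrative sketch: FIG. 2 is not reproduced in the text, so the dependency edges in `DEPS` are an assumption chosen to match the readiness order described above, and the function name and the clock-accounting convention are hypothetical.

```python
# Assumed dependency edges (instruction -> instructions it waits for).
# FIG. 2 is not available here; these edges merely reproduce the order
# in which the text says instructions become executable.
DEPS = {
    1: [], 2: [1], 3: [1], 9: [1],
    4: [2], 5: [3], 6: [4], 7: [5],
    8: [6, 7], 10: [8, 9],
}
LATENCY = {i: 1 for i in DEPS}
LATENCY[9] = 20          # I9 takes 20 clocks, all others take 1
WIDTH = 2                # two-way superscalar


def simulate_oldest_first(deps, latency, width=2):
    """Issue up to `width` ready instructions per clock, oldest first."""
    done = {}    # instruction -> last clock in which it occupies a unit
    order = []   # (clock, instructions issued in that clock)
    clock = 0
    while len(done) < len(deps):
        clock += 1
        busy = sum(1 for end in done.values() if end >= clock)
        ready = sorted(
            i for i in deps
            if i not in done
            and all(d in done and done[d] < clock for d in deps[i])
        )
        issued = ready[:width - busy]
        for i in issued:
            done[i] = clock + latency[i] - 1
        if issued:
            order.append((clock, issued))
    return max(done.values()), order


total, order = simulate_oldest_first(DEPS, LATENCY, WIDTH)
print(total)   # 25 under the assumed edges
print(order)
```

Under these assumptions I9 is not issued until clock 5, alongside I8, so its 20 clocks run almost entirely alone before I10 can start.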

[0006] As this example shows, when several instructions are executable at a given time, the common policy of selecting the oldest first cannot always hide the execution time of a long-latency instruction, and performance suffers.

[0007] To address this problem, Japanese Patent Application Laid-Open No. Hei 4-252336 proposes a method of optimizing the execution order of a program for execution by a pipelined processor. It reduces the instruction wait time that can arise depending on execution order, the so-called pipeline break, by inserting suitable instructions and changing the execution order, thereby optimizing execution time.

[0008] That apparatus statically analyzes the dependencies of an instruction sequence, assigns weights to instruction processing, and reconstructs the instruction sequence. Because it relies purely on static analysis, however, it cannot handle latency that arises dynamically during execution, such as a memory access instruction that misses in the cache and triggers a fill, or an arithmetic instruction whose processing time varies with its operand values. Moreover, for processors that are software compatible but not hardware compatible, the processing weights differ from processor to processor, so the program must be optimized separately for each one, which is inefficient.

[0009]

[Problems to Be Solved by the Invention] The first problem is that, in a microprocessor that performs out-of-order execution, when several instructions have their data dependencies resolved (that is, are judged executable) at the same point during program execution, the conventional policy of "selecting the oldest first" can incur a large overhead and degrade performance. The reason is that once a program is complete, its inherent data dependencies are fixed; when it runs on a typical microprocessor that executes out of order while extracting parallelism from those dependencies, several execution orders are possible, and the program's execution time varies with the order chosen.

[0010] The second problem is that solving the first problem by fully specifying the instruction execution order is inefficient and difficult: even if the dependencies in a program can be analyzed statically, at compile time or otherwise, it is hard to optimize the execution order completely. One reason is that microprocessors that run the same software but have different architectures generally have different instruction execution times, so a single program must be statically analyzed and optimized for each microprocessor, which is inefficient. Another is that, even with a single target microprocessor, the execution time of some instructions changes dynamically at run time (for example, the cache fill that follows a cache miss by a memory access instruction), which makes static analysis extremely difficult.

[0011] The present invention was made to eliminate these drawbacks of the prior art. Its object is to provide a microprocessor with a mechanism that, in order to hide the latency of long-latency instructions, keeps a history of instruction execution and uses that history to determine the instruction execution order dynamically at run time. Because this mechanism determines the execution order dynamically at run time, no prior static analysis or optimization of the program is required.

[0012]

[Means for Solving the Problems] The microprocessor of the present invention comprises an instruction fetch mechanism (F0 in FIG. 1) that fetches instructions, an instruction decode mechanism (D0 in FIG. 1) that decodes the fetched instructions, an instruction execution mechanism (E0 in FIG. 1) that actually executes instructions, and a post-execution processing unit (W0 in FIG. 1) that performs post-processing after execution. In or near the microprocessor it further comprises a performance monitoring mechanism (P0 in FIG. 1) that detects the performance degradation state described above; a history management memory (H0 in FIG. 1), a storage device that holds information about long-latency instructions that may cause performance degradation; and an instruction execution control unit (A0 in FIG. 1) having selection means that determines the instruction execution order from the instruction history stored in that storage device.

[0013]

[Embodiments of the Invention] An embodiment of the present invention is described next with reference to the drawings. FIG. 1 is a block diagram showing the configuration of one embodiment. In the figure, F0 is the instruction fetch mechanism and D0 is the instruction decode mechanism, which simultaneously decodes a plurality of instructions predicted to be executed. E0 is the instruction execution mechanism, which executes a plurality of instructions simultaneously. W0 is the post-execution processing unit. A0 is the instruction execution control unit, which selects, from among the instructions whose dependencies have been resolved, the instructions that E0 executes next. The performance monitoring mechanism P0 monitors the utilization of the microprocessor. The history management memory H0 stores information about instructions that have a history of lowering processor utilization through long latency. The instruction fetch mechanism F0, instruction decode mechanism D0, post-execution processing unit W0, instruction execution control unit A0, performance monitoring mechanism P0, and history management memory H0 all operate in a pipelined manner.

[0014] The following example shows the program of FIG. 2 executed on the configuration of FIG. 1. First, the instruction fetch mechanism F0 of the microprocessor MP fetches the instructions to be executed, here instruction I1, from the memory M0. The memory M0 may also be located inside the microprocessor MP. The instruction decode mechanism D0 decodes the fetched instruction into signals that the microprocessor can execute. In addition, if a low-utilization clock count was registered in the past for the instruction being decoded, it is read from the history management memory H0.

[0015] The instruction execution control unit A0 consults the contents of the history management memory H0 read at decode time and checks whether the decoded instruction I1 has previously lowered processor utilization through its own latency. If such a history exists, that instruction is executed with priority. In this example I1 has no such history and is not registered in H0, so it is selected by the usual "oldest first" policy.

[0016] An instruction issued (selected) by the instruction execution control unit A0 is executed in the instruction execution mechanism E0. The performance monitoring mechanism P0 holds the address of the executing instruction I1 and, while that instruction is present in E0, counts the number of instructions present in E0 (when E0 is executing only instruction Ix, that number is 1) and counts the clocks during which this number is at or below a preset threshold T0. The low-utilization clock count observed while the instruction at address Ix is present in E0 is defined as C(Ix). Utilization of E0 is measured by the number of instructions in E0 at that time, and T0 = 1 here. If the count exceeds a predetermined number of clocks T1, the instruction address and its low-utilization clock count are stored in the history management memory H0. In this example T1 = 5. For instruction I1, C(I1) = 1, which is smaller than T1, so nothing is registered in H0.
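The counting rule for C(Ix) can be sketched in a few lines. This is a software model only, assuming the contents of E0 are available as one set of instruction labels per clock; the function name and the `occupancy` representation are hypothetical, and the timeline is the first (oldest-first) pass described for FIG. 2.

```python
from collections import Counter

T0 = 1   # occupancy threshold used in the example
T1 = 5   # registration threshold used in the example


def low_utilization_clocks(occupancy, t0=T0):
    """Return C(Ix): clocks each instruction spent in E0 while E0 held <= t0 instructions."""
    counts = Counter()
    for in_flight in occupancy:          # one set of instructions per clock
        if len(in_flight) <= t0:
            for addr in in_flight:
                counts[addr] += 1
    return counts


# First pass over FIG. 2 as described in the text: 25 clocks in total.
occupancy = (
    [{"I1"}, {"I2", "I3"}, {"I4", "I5"}, {"I6", "I7"}, {"I8", "I9"}]
    + [{"I9"}] * 19
    + [{"I10"}]
)
counts = low_utilization_clocks(occupancy)
to_register = {addr: c for addr, c in counts.items() if c > T1}
print(counts["I1"], counts["I9"])   # 1 19
print(to_register)                  # {'I9': 19}
```

Only I9 crosses T1 in this timeline, which is why it is the only instruction that ends up in the history.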

[0017] The history management memory H0 is organized as an associative memory looked up with the instruction address Ix as the key, and it stores the low-utilization clock count C(Ix). In FIG. 2, instructions I1 through I7 each finish in 1 clock, so the performance monitoring mechanism P0 detects no low-utilization state for them, and no information about I1 through I7 is written to H0.

[0018] When instructions I8 and I9 are executed, however, the situation differs. In the first clock I8 and I9 execute together, so the utilization of E0 is 2; in the next clock I8 has already finished, and I9 alone remains in E0 for the remaining 19 clocks. During this time the number of active instructions is 1, so P0 counts C(I9) up to 19. Because C(I9) is larger than T1, the address of I9 and C(I9) = 19 are written to the history management memory H0.

[0019] In the process described so far, executing I1 through I10 takes 25 clocks. Later, as the program continues and instruction I1 is executed again, the executable instructions are I2, I3, and I9. When control passes from the instruction decode mechanism D0 to the instruction execution control unit A0, A0 consults the history management memory H0 and detects that I9 previously caused a 19-clock low-utilization state. On detecting this, A0 selects I9 with priority. For the other slot it selects the older instruction, I2. Instructions I2 and I9 then begin executing in E0.

[0020] While I9 is executing, instructions I2, I3, I4, I5, I6, I7, and I8 are executed in E0 one at a time, in that order. After that, I9 remains the only active instruction for 13 clocks, and P0 counts C(I9) up to 13. This value is larger than T1 but smaller than the previous C(I9) = 19, so the value in the history management memory H0 is not changed. Had C(I9) exceeded the previous value, C(I9) and the address would be registered in H0. When I9 finishes, I10 becomes executable for the first time and is executed in E0.

[0021] In this second pass, executing I1 through I10 takes 22 clocks, an improvement of 3 clocks. As shown above, this embodiment makes it possible to learn dynamically at run time, by consulting the history of the executable instructions, whether any of them is likely to incur latency. Using this information to select which instructions to issue reduces the number of clocks in which instruction latency lowers execution-unit utilization, and thus improves instruction processing efficiency.
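The difference between the two passes can be checked with a short simulation. As before, this is a sketch under stated assumptions: the dependency edges stand in for FIG. 2, which is not reproduced in the text; instruction numbers double as addresses and ages; and the selection key (larger recorded count first, then older instruction) is one plausible reading of "execute with priority", not a statement of the patented circuit.

```python
# Assumed dependency edges standing in for FIG. 2 (not given in the text).
DEPS = {
    1: [], 2: [1], 3: [1], 9: [1],
    4: [2], 5: [3], 6: [4], 7: [5],
    8: [6, 7], 10: [8, 9],
}
LATENCY = {i: 1 for i in DEPS}
LATENCY[9] = 20


def simulate_with_history(deps, latency, history, width=2):
    """Total clocks when ready instructions are ranked by recorded count, then age."""
    done = {}    # instruction -> last clock in which it occupies a unit
    clock = 0
    while len(done) < len(deps):
        clock += 1
        busy = sum(1 for end in done.values() if end >= clock)
        ready = [
            i for i in deps
            if i not in done
            and all(d in done and done[d] < clock for d in deps[i])
        ]
        # Instructions with a larger low-utilization count go first;
        # instructions without history (count 0) fall back to oldest first.
        ready.sort(key=lambda i: (-history.get(i, 0), i))
        for i in ready[:width - busy]:
            done[i] = clock + latency[i] - 1
    return max(done.values())


first_pass = simulate_with_history(DEPS, LATENCY, history={})
second_pass = simulate_with_history(DEPS, LATENCY, history={9: 19})
print(first_pass, second_pass)   # 25 22
```

With an empty history the function degenerates to plain oldest-first selection, so the same code covers both passes.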

[0022] Next, an embodiment of the microprocessor of the present invention is described with reference to FIG. 3. Reference numerals 1 and 2 in FIG. 3 are the addresses of the two instructions decoded by the instruction decode mechanism D0. H0 is the history management memory. In this configuration it has a CAM structure with 256 entries, although set-associative and direct-mapped organizations are also possible in general. A larger number of entries is preferable. In this embodiment H0 is built as a memory with two read ports so that the values for the two simultaneously decoded instructions can be looked up by address at the same time. Each entry holds 8 bits of data representing the low-utilization clock count described above.
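A behavioral model of such a memory might look like the sketch below. The class and method names are hypothetical, and three details are assumptions the text does not specify: a miss is reported as a count of 0, counts saturate at the 8-bit maximum, and a full table rejects new addresses instead of evicting an entry.

```python
class HistoryMemory:
    """Toy model of H0: instruction address -> low-utilization clock count."""

    def __init__(self, entries=256, bits=8):
        self.entries = entries
        self.max_count = (1 << bits) - 1   # 255 for 8-bit data
        self.table = {}

    def lookup2(self, addr_a, addr_b):
        # Two read ports: both decoded addresses are looked up together.
        # Assumption: an address with no entry reads as count 0.
        return self.table.get(addr_a, 0), self.table.get(addr_b, 0)

    def register(self, addr, count):
        # Assumption: no eviction; a full table refuses new addresses.
        if addr not in self.table and len(self.table) >= self.entries:
            return False
        # Assumption: counts larger than the field width saturate.
        self.table[addr] = min(count, self.max_count)
        return True


h0 = HistoryMemory()
h0.register(0x1020, 19)
print(h0.lookup2(0x1020, 0x1024))   # (19, 0)
```

A dictionary hides the CAM, set-associative, and direct-mapped distinction entirely; a hardware-accurate model would need an explicit replacement policy.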

[0023] The data 10 and 11 read from the history management memory H0 by addresses 1 and 2 are sent to the instruction synchronization mechanism 20. This is a synchronization mechanism that checks the dependencies of decoded instructions, similar to those found in ordinary microprocessors. Instructions whose dependencies the instruction synchronization mechanism 20 judges to be resolved are written to the memory 14. The memory 14 stores the information for each such instruction together with its low-utilization clock count. In this example it provides 6 entries, each holding an 8-bit low-utilization clock count plus the information for the instruction to be issued.

[0024] The comparator 15 selects, from the six instructions stored in the memory 14, the instructions with the largest and second-largest low-utilization clock counts. These two instructions are executed by the instruction execution mechanism E0. The performance monitoring mechanism P0 monitors the state of E0 and, when the low-utilization clock count is larger than T1 and also larger than the past value obtained from the memory 14 at decode time, writes that clock count to the memory 14.
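The selection performed by the comparator 15 amounts to a top-two pick over the ready entries. In the sketch below the `ReadyEntry` type and `select_two` function are hypothetical, and breaking ties in favor of the older instruction is an assumption, chosen to agree with the earlier example in which I9 and the older instruction I2 are selected.

```python
from typing import NamedTuple


class ReadyEntry(NamedTuple):
    age: int               # smaller means older in program order
    instr: str
    low_util_clocks: int   # count read from the history, 0 if none


def select_two(ready):
    """Pick the two entries with the largest counts; ties go to the older entry."""
    ranked = sorted(ready, key=lambda e: (-e.low_util_clocks, e.age))
    return [e.instr for e in ranked[:2]]


ready = [
    ReadyEntry(0, "I2", 0),
    ReadyEntry(1, "I3", 0),
    ReadyEntry(2, "I9", 19),
]
picked = select_two(ready)
print(picked)   # ['I9', 'I2']
```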

[0025] Next, an embodiment of the monitoring and registration portion of the performance monitoring mechanism (P0 in FIG. 1) is described with reference to FIG. 4. E0 in FIG. 4 is the instruction execution mechanism, which in this embodiment has two instruction execution units, E01 and E02, so that two instructions can be executed simultaneously. Everything in this figure other than E0 forms part of the performance monitoring mechanism (P0 in FIG. 1).

[0026] The counters A13 and A14 observe the instruction execution start signals 3 and 4 for the two instructions given to the execution units, the instruction execution end signals 111 and 112 that the execution units E01 and E02 issue when an instruction finishes, and the number 100 of instructions currently in the instruction execution mechanism. For each instruction, they count the clocks, from its start to its end, during which the number of instructions in the instruction execution mechanism is at or below T0 (1 in this embodiment). The counted value is the low-utilization clock count for that instruction.

[0027] When counting ends, the detectors A15 and A16 check, for each instruction, for either of the following conditions: the instruction has not previously been registered in the history management memory (H0 in FIG. 1) and the low-utilization clock count just measured for it is T1 (5 in this embodiment) or more; or the count just measured is larger than the value previously registered in the history management memory. When this is detected for an instruction, an instruction to register the low-utilization clock count in the history management memory is issued on signal line 18 or 19. In response, the performance monitoring mechanism (P0 in FIG. 1) performs the registration in the history management memory.
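The detector condition reduces to a small predicate. The sketch below uses a hypothetical function name and represents "not previously registered" as `None`. It follows the wording of this paragraph, "T1 or more", for unregistered instructions; the earlier walk-through words the same threshold as "larger than T1", and the two readings differ only when the count equals T1 exactly.

```python
def should_register(count, previous, t1=5):
    """Return True when the measured count should be written to the history.

    `previous` is None when the instruction has no entry yet (an encoding
    assumed for this sketch).
    """
    if previous is None:
        return count >= t1       # first registration: at least T1
    return count > previous      # update only when the new count is larger


print(should_register(19, None))   # True:  first pass, C(I9) = 19
print(should_register(1, None))    # False: C(I1) = 1 is below T1
print(should_register(13, 19))     # False: second pass, 13 < 19
```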

[0028] An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment; design changes and the like that do not depart from the gist of the invention are also included in the invention.

[0029]

[Effects of the Invention] The first effect is that instruction execution can be scheduled optimally during program execution, which improves the program execution performance of the microprocessor. The reason is that once an overhead has been detected, scheduling from the second time onward uses a better-suited order. The second effect is that this speedup is obtained without static analysis or prior optimization of the program, so no redundant program conversion is needed. The reason is that the history is registered dynamically at run time and consulted dynamically while scheduling.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing an outline of the present invention.

FIG. 2 is an example of a program whose overhead grows depending on the order of instruction execution.

FIG. 3 is a configuration diagram showing an embodiment of the present invention.

FIG. 4 shows the instruction execution mechanism and part of the performance monitoring mechanism in an embodiment of the present invention.

[Explanation of symbols]

M0: memory; MP: microprocessor; F0: instruction fetch mechanism; D0: instruction decode mechanism; A0: instruction execution control unit; E0: instruction execution mechanism; W0: post-execution processing unit; P0: performance monitoring mechanism; H0: history management memory

Continuation of the front page

(56) References: JP-A-6-95862 (JP, A); JP-A-9-120359 (JP, A); JP-A-4-252336 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06F 9/38; G06F 11/28 - 11/34

Claims

(57) [Claims]

1. A microprocessor that fetches instructions of a program from a memory, decodes a plurality of instructions simultaneously, can execute a plurality of instructions simultaneously, and can execute instructions in an order other than their static order in the program, the microprocessor comprising means for detecting a state in which one of its plurality of execution units is not in use and the specific instruction that caused that state.

2. The microprocessor according to claim 1, comprising a performance monitoring mechanism that counts the number of clocks during which one of the plurality of execution units is not in use.

3. The microprocessor according to claim 2, comprising: a history management memory, which is a storage device that registers the instruction that caused the state in association with the number of clocks of the state; means for registering into the history management memory; and means for extracting from the storage device the information registered for an instruction specified by the address of the instruction in the program.

4. The microprocessor according to claim 3, comprising: means for determining, for a group of instructions judged executable at a given point during execution of the program, the order in which those instructions are executed, based on the information extracted from the storage device for each instruction of the group; and instruction execution means for executing the instructions in that order.