JP2760273B2

JP2760273B2 - Arithmetic device and control method thereof

Info

Publication number: JP2760273B2
Application number: JP5326658A
Authority: JP
Inventors: 淳鳥居
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1993-12-24
Filing date: 1993-12-24
Publication date: 1998-05-28
Anticipated expiration: 2013-05-28
Also published as: JPH07182168A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a processor used in an information processing apparatus, and more particularly to a high-performance processor capable of processing a plurality of instruction streams (programs) in parallel.

[0002]

[Prior Art] As a technique for speeding up processors, a scheme has been put to practical use in which a plurality of function execution devices are provided and instructions are issued to them simultaneously by exploiting instruction-level parallelism, thereby improving processing speed. With this scheme it is ideally possible to process instructions at a rate higher than the operating clock frequency, that is, more than one instruction per clock. To perform such processing, a logic circuit that guarantees the correctness of instruction execution results is required in addition to the extra function execution devices. This scheme is generally called the superscalar scheme and is adopted in the Alpha architecture announced by Digital Equipment Corporation of the United States, the Power architecture of IBM, Motorola, and Apple of the United States, the PA-RISC architecture of Hewlett-Packard, and others.

[0003] However, there is a limit to exploiting instruction-level parallelism, and it is said that the performance improvement of this scheme is in practice held to about three to four times even if the number of function execution devices were increased without limit. The factors that limit this parallelism are dependencies between instructions and disruption of the instruction stream caused by branches. Various schemes for removing these constraints have been proposed, but even at the cost of extreme hardware complexity the performance improvement is said to be limited to about 40 times. These limits are described in a paper published in 1992 by Monica S. Lam et al. (Monica S. Lam and Robert P. Wilson, "Limits of Control Flow on Parallelism", The 19th International Symposium on Computer Architecture, IEEE Computer Society Press, 1992, pp. 46-57).

[0004] On the other hand, a method has also been proposed that raises the utilization of the function execution devices, and thus processing speed, by executing instructions from a plurality of instruction streams in parallel rather than relying on instruction-level parallelism. Because there are no dependencies between instruction streams, this scheme improves performance more easily than the instruction-level parallelism described above.

[0005] Examples of this scheme include the one proposed in a paper published in 1993 by Hirata et al. (Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Teiji Nishizawa, Takayuki Sagishima, "An Elementary Processor Architecture Using Multiple Threads and Multiple Instruction Issue", Transactions of the Information Processing Society of Japan, 1993, Vol. 34, No. 4, pp. 595-605). FIG. 2 shows the schematic configuration of this example. Hereinafter, this example is described as an example of the prior art.

[0006] In FIG. 2, reference numeral 11 denotes an instruction acquisition device, 12 an instruction decoding device, 13 a function execution device, and 21 an inter-instruction dependency analysis device. Reference numeral 22 denotes an instruction arbitration device that schedules the function execution devices 13. The instruction decoding device 12 issues an instruction when the instruction arbitration device 22 is able to accept an instruction and the dependency analysis device 21 has indicated that the instruction can be issued. The instructions issued from the respective instruction decoding devices 12 are assigned by the instruction arbitration device 22 to the function execution devices 13 they require, and are then actually executed. The instruction arbitration device 22 allows the function execution devices 13 to be shared among the instructions, which improves their utilization. The instruction arbitration device can also be simplified by distributing it, one per function execution device 13.

[0007]

[Problems to Be Solved by the Invention] The conventional mechanism for sharing function execution devices described above has the following problems. First, adding instruction arbitration to the instruction execution pipeline increases instruction execution latency, which lowers processing speed when executing instructions that do not fit the pipeline structure, such as branch instructions, and during exception handling. Second, as the number of instruction streams and therefore of instruction decoding devices grows, the instruction arbitration device must arbitrate more instructions and its arbitration logic becomes more complicated, so its structure becomes complex and hinders raising the operating frequency. Third, as the number of instruction streams and instruction decoding devices grows, the connections between the function execution devices and the temporary storage devices (registers) and buffer storage device (cache memory) needed for execution must be implemented with complex switches, so delay increases and this too hinders raising the operating frequency. Because of these problems, both the number of instruction streams that the above mechanism can execute simultaneously and its operating frequency are limited.

[0008] An object of the present invention is to provide a processor control scheme that, at a high operating frequency and without increasing instruction execution latency, raises the utilization of the function execution devices within a practical range and shares the function execution devices between instruction streams.

[0009]

[Means for Solving the Problems] To solve the above problems, the present invention adds a resource allocation control device to a processor in which a plurality of instruction streams operate in parallel, that is, a processor having a plurality of instruction acquisition devices, a plurality of instruction decoding devices, a plurality of function execution devices, a plurality of temporary storage devices (registers), and a buffer device (cache).

[0010] The resource allocation control device is a mechanism that determines, for every clock cycle, which function execution devices each instruction stream may use, based on instruction execution information from the other devices and on internal setting information. Based on the use-permission information determined by the resource allocation control device, an instruction decoding device can issue an instruction to a function execution device that is able to execute it. In other words, the resource allocation control device decides, for every function execution device in the processor, whether each instruction stream may use it. The instruction decoding device selects, from among the usable function execution devices, the one that will execute the instruction indicated by the program counter of its instruction stream, and issues the instruction. It also notifies the resource allocation control device of this instruction issue information, which is used to select the temporary storage device and buffer storage device that store the result produced by the function execution device.
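The division of labour described in [0010] can be illustrated with a minimal sketch. The names `Unit`, `try_issue`, and the `permissions` dictionary are hypothetical and chosen only for illustration; the patent specifies hardware, not software, so this only models the decision that the instruction decoding device makes from the permission information it is given.

```python
from enum import Enum


class Unit(Enum):
    """Kinds of function execution device (illustrative names)."""
    INT = "integer"
    FP = "floating-point"
    LS = "load/store"


def try_issue(needed_unit, permitted_units):
    """Decoder-side check: issue only if the unit the instruction needs
    is in the permission set granted to this stream for this cycle.
    Returns the chosen unit, or None when the decoder must interlock."""
    return needed_unit if needed_unit in permitted_units else None


# Permission information for one cycle, as the resource allocation
# control device might hand it to the two decoders (assumed values).
permissions = {"A": {Unit.INT, Unit.FP}, "B": {Unit.LS}}

issued_a = try_issue(Unit.FP, permissions["A"])   # stream A may use FP
issued_b = try_issue(Unit.INT, permissions["B"])  # stream B may not use INT
```

Because the permission set is available when the instruction is decoded, no separate arbitration step is needed after issue in this model.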

[0011] Either a single resource allocation control device is provided in the system, or as many as there are function execution devices are provided and distributed one per function execution device. Each instruction decoding device either issues one instruction at a time, or, with a plurality of instructions delivered simultaneously from the instruction acquisition device to the instruction decoding device, issues instructions out of order according to the usable function execution devices, or uses superscalar techniques to issue a plurality of instructions simultaneously according to the usable function execution devices.

[0012] The resource allocation control device either allocates resources equally to each instruction stream, or decides the allocation of function execution devices to each instruction stream from usage information that software, for example a compiler, has determined in advance for each instruction stream and passed to the resource allocation control device. Alternatively, the resource allocation control device monitors, for each instruction stream, the instruction issue information sent from the instruction decoding device to the function execution devices, dynamically detects the issue tendency of each instruction stream, and allocates resources accordingly.
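As a rough illustration of the policies in [0012], the sketch below gives each unit to the stream with the highest recorded demand for it. The function `allocate_by_demand` and the `demand` table are assumptions made for this example. Under this model the compiler-supplied and dynamically monitored policies differ only in where the counts come from: a static hint loaded once, or issue counts updated as instructions are issued.

```python
from collections import Counter


def allocate_by_demand(units, streams, demand, cycle=0):
    """Assign each unit to the stream with the highest demand for it.

    demand[stream][unit] is a count: either a compiler-provided hint
    (static policy) or an observed issue count (dynamic policy).
    Ties go to the stream that comes first in an order rotated by
    `cycle`, so no stream is permanently favoured (an assumption).
    """
    n = len(streams)
    order = [streams[(cycle + k) % n] for k in range(n)]
    alloc = {s: set() for s in streams}
    for u in units:
        # max() returns the first maximal element, so earlier streams
        # in `order` win ties.
        owner = max(order, key=lambda s: demand[s][u])
        alloc[owner].add(u)
    return alloc


# Hypothetical demand: stream A is compute-heavy, stream B memory-heavy.
demand = {
    "A": Counter({"int": 5, "fp": 3}),
    "B": Counter({"ls": 4, "int": 1}),
}
alloc = allocate_by_demand(["int", "fp", "ls"], ["A", "B"], demand)
```

A real controller would also have to bound how long a low-demand stream can be starved of a unit; that is outside this sketch.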

[0013] Whether the logic circuits are implemented so that every instruction stream can use every function execution device, or whether the function execution devices available to each instruction stream are restricted so as to simplify the connections between the temporary storage devices, the buffer storage device, and the function execution devices, depends on the design of the processor. In the latter case, the resource allocation control device can also be distributed to match, which simplifies its logic.

[0014]

[Operation] In the processor of the invention according to claim 1, the instruction decoding device issues instructions to the function execution devices in accordance with the permission of the resource allocation control device. The use-permission information for the function execution devices is obtained at the same time as the instruction enters the instruction decoding device, and the function execution device to be used is decided when the instruction is issued. Sharing of the function execution devices between instruction streams is therefore achieved without adding a pipeline stage that arbitrates the function execution devices after issue, and the function execution devices can be used efficiently. In addition, function execution units that are only rarely used can be limited to one per processor, or to fewer than the number of simultaneously running instruction streams, so that on-chip resources are used effectively. The resource allocation control device determines the function execution devices usable by each instruction stream in each clock cycle from information supplied by the other devices and from internal information. Because resource allocation is decided from information up to the preceding instruction execution, utilization as high as in the conventional scheme cannot be expected, but the simpler structure and the shorter instruction execution latency that results from the shorter pipeline allow the clock frequency and processing speed to be raised. Furthermore, in the invention according to claim 2, the resource allocation control device is distributed, which simplifies the structure of each individual resource allocation control device.

[0015] To raise the utilization of the function execution devices and improve efficiency, the invention according to claim 3 delivers a plurality of instructions simultaneously from the instruction acquisition device to the instruction decoding device and issues instructions out of order according to the usable function execution devices. The invention according to claim 4 likewise delivers a plurality of instructions simultaneously from the instruction acquisition device to the instruction decoding device, and in addition uses superscalar techniques to issue instructions simultaneously according to the usable function execution devices. With these techniques a single instruction stream can make efficient use of a plurality of function execution devices, which is particularly effective when there are more function execution devices than simultaneously running instruction streams.

[0016] In the invention according to claim 5, the usage of the function execution devices by each instruction stream is determined in advance by software, for example a compiler. Passing this information to the resource allocation control device optimizes the allocation of function execution devices to each instruction stream, so that execution is efficient and processing is faster. In the invention according to claim 6, the resource allocation control device monitors, for each instruction stream, the instruction issue information sent from the instruction decoding device to the function execution devices, dynamically detects the issue tendency of each instruction stream, and allocates resources accordingly, which likewise yields efficient execution and faster processing. Combining the two makes it possible to allocate resources in a way that suits the code and to assign function execution devices preferentially to code whose execution is lagging.

[0017] When an event occurs in a function execution device that forces execution to be suspended, for example when required data cannot be obtained because of a cache miss, the function execution device cannot be used until the hardware has completed the necessary handling. In the invention according to claim 7, when such an event occurs, the data needed to resume execution is saved in a storage device attached to the function execution device so that the function execution device can be used to execute other instructions, which improves its utilization.
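The save-and-resume behaviour of [0017] can be sketched as a small state machine. The class `FunctionUnit`, its `parked` list, and the method names are hypothetical; the patent says only that the data needed to resume is saved in storage attached to the function execution device, and does not specify how resumption is scheduled (oldest-first is assumed here).

```python
class FunctionUnit:
    """Toy model of a function execution device with attached storage."""

    def __init__(self):
        self.current = None  # operation currently occupying the unit
        self.parked = []     # attached storage for suspended operations

    def start(self, op):
        if self.current is not None:
            raise RuntimeError("unit busy")
        self.current = op

    def suspend(self):
        """On a stall (e.g. a cache miss), save the in-flight operation
        and free the unit for other instructions."""
        self.parked.append(self.current)
        self.current = None

    def finish(self):
        """Complete the current operation and free the unit."""
        done, self.current = self.current, None
        return done

    def resume(self):
        """Restart the oldest parked operation if the unit is free."""
        if self.current is None and self.parked:
            self.current = self.parked.pop(0)
        return self.current


lsu = FunctionUnit()
lsu.start("load r1 (cache miss)")
lsu.suspend()                 # the miss no longer blocks the unit
lsu.start("load r2")          # another instruction uses it meanwhile
first_done = lsu.finish()
resumed = lsu.resume()        # the missed load continues later
```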

[0018] When a plurality of function execution devices are shared among a plurality of instructions, an increase in the number of instruction streams and instruction decoding devices means that the connections between the function execution devices and the temporary storage devices (registers) and buffer storage device (cache memory) needed for execution must be implemented with complex switches, so the number of logic circuits required increases. Instead of implementing the logic circuits so that every instruction stream can use every function execution device, the function execution devices available to each instruction stream can be restricted, which simplifies the connections between the temporary storage devices, the buffer storage device, and the function execution devices.

[0019] Furthermore, restricting the function execution devices that each instruction stream can use also allows the resource allocation control device to be distributed and its logic to be simplified.

[0020] The present invention is described below in more detail with reference to the drawings. The following disclosure is merely one embodiment of the present invention and does not limit its technical scope in any way.

[0021]

[Embodiments] The invention according to claim 1 is described with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 1 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. Whether the function execution devices 13 have the same function or different functions does not affect the essence of the invention; here, for convenience, they are taken to be an integer unit, a floating-point unit, and a load/store unit. This configuration aims at efficient use of the arithmetic units rather than at faster processing. A processor that performs computation also contains temporary storage devices (registers), a buffer storage device (cache memory), and so on, but these are not directly related to the inventions of claims 1 to 7 and are omitted. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B.

[0022] Instructions are sent from the instruction acquisition devices 11 to the instruction decoding devices 12, and decoding results are sent from the instruction decoding devices 12 to the function execution devices 13. The resource allocation control device 14 sends the instruction decoding devices 12 information on whether each function execution device 13 may be used. The instruction acquisition devices 11, the instruction decoding devices 12, and the function execution devices 13 each send their execution state information to the resource allocation control device 14.

[0023] The instruction acquisition device 11 acquires instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter values of instruction streams A and B. Next, the instruction decoding device 12 receives the instruction from the instruction acquisition device 11 and receives from the resource allocation control device 14 availability information for the individual function execution devices 13, for example that instruction stream A may use the integer unit and the floating-point unit and that instruction stream B may use the load/store unit. If decoding shows that the instruction of stream A is an integer or floating-point operation, it is issued unconditionally to the integer unit or the floating-point unit. If it is a load/store instruction, an interlock occurs in the instruction decoding device 12 and no instruction is issued until the next cycle. For instruction stream B the reverse holds: an integer or floating-point operation interlocks, and a load/store instruction is issued. In the next cycle, if an instruction was issued in the previous cycle, the instruction decoding device 12 receives a new instruction from the instruction acquisition device 11. If no instruction was issued in the previous cycle, it receives no new instruction and only the availability information for the function execution devices 13. For example, in this cycle instruction stream A may use the load/store unit and instruction stream B may use the integer unit and the floating-point unit. As in the previous cycle, the instruction is compared with the usable function execution devices 13 to decide whether to issue it. The resource allocation control device 14 is a mechanism that allocates resources equally between the instruction streams. This is shown in Table 1. It also receives execution state information from the other devices and uses it to determine the next availability of the function execution devices 13.
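The cycle-by-cycle behaviour of [0023] can be traced with a small simulation. This is a sketch under stated assumptions: the grants simply swap every cycle (one possible form of equal allocation), each stream issues at most one instruction per cycle in program order, and an instruction is modelled only by the kind of unit it needs. The names `permissions` and `run` are hypothetical.

```python
INT, FP, LS = "int", "fp", "ls"


def permissions(cycle):
    """Equal allocation assumed here: the two grants swap every cycle."""
    even = {"A": {INT, FP}, "B": {LS}}
    odd = {"A": {LS}, "B": {INT, FP}}
    return even if cycle % 2 == 0 else odd


def run(programs, max_cycles=100):
    """programs maps a stream name to the unit kinds its instructions
    need, in program order. Returns one list of (stream, unit) issues
    per simulated cycle; an empty list means every stream interlocked."""
    pc = {s: 0 for s in programs}
    trace = []
    for cycle in range(max_cycles):
        if all(pc[s] >= len(p) for s, p in programs.items()):
            break
        grant = permissions(cycle)
        issued = []
        for s, prog in programs.items():
            if pc[s] < len(prog) and prog[pc[s]] in grant[s]:
                issued.append((s, prog[pc[s]]))
                pc[s] += 1  # a new instruction is fetched next cycle
            # otherwise the decoder interlocks and retries next cycle
        trace.append(issued)
    return trace


trace = run({"A": [INT, LS, LS], "B": [LS, INT]})
```

In this trace, stream A's second load/store interlocks for one cycle because the load/store unit is granted to stream B in that cycle, which is the utilization cost that [0014] acknowledges.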

[0024]

[Table 1]

[0025] This allows a plurality of function execution devices 13 to be shared among a plurality of instruction streams, so that the function execution devices 13 are used effectively.

[0026] Next, the invention according to claim 2 is described with reference to the drawings. FIG. 3 shows an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 3 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and three distributed resource allocation control devices 31, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment is characterized in that the resource allocation control device 14 of the processor according to claim 1 is distributed, one per function execution device 13. As a result, the resource allocation control device can be simplified.
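A distributed arrangement like that of [0026] might be modelled as one small controller per function execution device, each deciding only which stream owns its own unit in a given cycle. The class `UnitController`, the round-robin rule, and the `offset` parameter are assumptions for illustration; the patent does not state the policy each distributed controller applies.

```python
class UnitController:
    """Hypothetical per-unit controller: picks the owning stream of
    a single function execution device for each cycle."""

    def __init__(self, streams, offset=0):
        self.streams = streams
        self.offset = offset  # staggers ownership between units

    def owner(self, cycle):
        return self.streams[(cycle + self.offset) % len(self.streams)]


# One controller per unit; the load/store unit is staggered by one cycle.
controllers = {
    "int": UnitController(["A", "B"], offset=0),
    "fp": UnitController(["A", "B"], offset=0),
    "ls": UnitController(["A", "B"], offset=1),
}


def grants(cycle):
    """Collect the per-unit decisions into per-stream permission sets."""
    g = {"A": set(), "B": set()}
    for unit, ctrl in controllers.items():
        g[ctrl.owner(cycle)].add(unit)
    return g
```

Each controller needs only its own unit's state, which is the kind of simplification the paragraph describes.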

[0027] Next, the invention according to claim 3 is described with reference to the drawings. FIG. 4 shows an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 4 comprises two instruction acquisition devices/instruction queues 41, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. In this embodiment, the instruction acquisition device 11 of the processor according to claim 1 is extended with a queue so that it can hold a plurality of instructions. In addition, the instruction decoding device 12 is extended so that it receives a plurality of instructions from the instruction acquisition device and examines their dependencies, and performs out-of-order execution when there is no dependency between the earlier and later instructions (or the dependency can be resolved) and the use-permission status obtained from the resource allocation control device 14 is such that the first instruction cannot be executed while the second or a later instruction can.

[0028] As in the invention according to claim 1, the function execution units 13 are an integer unit, a floating-point unit, and a load/store unit. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B, and the instruction decoding device 12 receives two instructions at a time from the instruction acquisition device/instruction queue 41. The instruction acquisition device/instruction queue 41 acquires instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter values of instruction streams A and B. Because there is an instruction queue, it is assumed here that a plurality of instructions have already been acquired. Next, the instruction decoding device 12 receives two instructions from the instruction acquisition device 11 and receives from the resource allocation control device 14 availability information for the individual function execution devices 13, for example that instruction stream A may use the integer unit and the floating-point unit and that instruction stream B may use the load/store unit. Decoding is performed on the plurality of received instructions, and the dependencies between them are also examined. If the first instruction of stream A decodes as an integer or floating-point operation, it is issued unconditionally to the integer unit or the floating-point unit. If the first instruction is a load/store instruction, the second instruction is an integer or floating-point instruction, and there is no dependency between the first and second instructions, the second instruction is issued to the integer unit or the floating-point unit. If these conditions are not satisfied, an interlock occurs in the instruction decoding device 12 and no instruction is issued until the next cycle. The same applies to instruction stream B, and the operation is repeated in the same way in subsequent cycles.
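The issue rule of [0028] for a two-instruction window can be written as a short decision function. The name `select_ooo` and the boolean `independent` (standing in for the result of the dependency check) are assumptions for this sketch.

```python
def select_ooo(window, permitted, independent):
    """Pick which of two queued instructions to issue this cycle.

    window      -- [first, second]: unit kind each instruction needs
    permitted   -- set of unit kinds granted to this stream this cycle
    independent -- True if the second instruction has no unresolved
                   dependency on the first
    Returns the index to issue (0 or 1), or None for an interlock.
    """
    first, second = window
    if first in permitted:
        return 0  # the first instruction issues unconditionally
    if second in permitted and independent:
        return 1  # out-of-order issue of the second instruction
    return None   # interlock until the next cycle


# Stream A is granted the integer and floating-point units this cycle.
granted = {"int", "fp"}
choice = select_ooo(["ls", "int"], granted, independent=True)
```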

[0029] In the processor embodiment of claim 1, the instruction decoding device interlocked unconditionally whenever the first instruction could not be executed on a function execution device it had been permitted to use. In the processor embodiment of this section, out-of-order issue of instructions is allowed, so the probability of an interlock falls, and throughput and program execution speed improve.

[0030] Next, the invention of claim 4 will be described with reference to the drawings. FIG. 5 shows an embodiment of a processor to which the control method of the present invention is applied. The processor shown in FIG. 5 comprises two instruction acquisition devices/instruction queues 41, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment extends the processor of claim 1 in two ways. First, a queue is added to the instruction acquisition device 11 so that it can hold a plurality of instructions. Second, the instruction decoding device 12 receives a plurality of instructions from the instruction acquisition device 11 and examines their dependencies; when there is no dependency between earlier and later instructions, or the dependency can be resolved, and the use-permission status obtained from the resource allocation control device 14 makes a plurality of function execution devices 13 available, it uses superscalar techniques to issue instructions simultaneously to a plurality of function execution devices 13.

[0031] As in the invention of claim 1, the function execution units 13 are an integer operation unit, a floating-point unit, and a load/store unit. The two instruction streams executed simultaneously are instruction stream A and instruction stream B, and the instruction decoding device 12 receives two instructions at a time from the instruction acquisition device/instruction queue 41. The instruction acquisition device/instruction queue 41 fetches instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter values of instruction streams A and B. Because an instruction queue is present, it is assumed here that a plurality of instructions have already been fetched. Next, the instruction decoding device 12 receives the contents of two instructions from the instruction acquisition device 11 and receives from the resource allocation control device 14 the availability information for the individual function execution devices 13, for example that instruction stream A may use the integer operation unit and the floating-point unit and that instruction stream B may use the load/store unit. The received instructions are decoded, and the dependencies between them are also examined. If the decoding result for instruction stream A shows that the first instruction is an integer operation and the second is a floating-point operation, and there is no dependency between the two, the first instruction is issued to the integer operation unit and the second to the floating-point unit. Likewise, if the first instruction is a floating-point operation and the second is an integer operation, and there is no dependency between them, the first instruction is issued to the floating-point unit and the second to the integer operation unit. If these conditions are not satisfied, then, as in claim 1, either only a single instruction is issued or the decoder interlocks. The same applies to instruction stream B, and the same operation is repeated in the next cycle.
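The simultaneous-issue decision can be sketched in a few lines. This is an illustrative model under stated assumptions: the names `Instr`, `independent`, and `issue_simultaneously` are hypothetical, each unit accepts one instruction per cycle, and a dependency is modeled as a register hazard.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Instr(NamedTuple):
    unit: str                 # "int", "fp" or "ls"
    dest: Optional[str]       # register written (None if none)
    srcs: FrozenSet[str]      # registers read


def independent(first: Instr, second: Instr) -> bool:
    """True when there is no RAW, WAW or WAR hazard between the two instructions."""
    if first.dest is not None and (first.dest in second.srcs or first.dest == second.dest):
        return False
    return second.dest is None or second.dest not in first.srcs


def issue_simultaneously(first: Instr, second: Instr, permitted: FrozenSet[str]) -> List[int]:
    """Return the window indices issued this cycle for one instruction stream.

    []      -> interlock (the first instruction has no permitted unit)
    [0]     -> single issue, as in the basic embodiment
    [0, 1]  -> both instructions issued to two different permitted units
    """
    if first.unit not in permitted:
        return []
    if (second.unit in permitted
            and second.unit != first.unit          # assumed: one instruction per unit per cycle
            and independent(first, second)):
        return [0, 1]
    return [0]
```

For a stream permitted to use the integer and floating-point units, an integer instruction followed by an independent floating-point instruction returns `[0, 1]`; if the second instruction reads the first one's result, only `[0]` is returned.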

[0032] In the processor embodiment of claim 1, even when the resource allocation control device 14 granted permission to use a plurality of function execution devices 13, only one of them was actually used. In the processor embodiment of this section, the instruction decoding device 12 is extended so that instructions can be issued simultaneously using superscalar techniques, so throughput and program execution speed improve. In particular, by providing frequently used function execution devices 13 in a number at least equal to the number of instruction streams processed in parallel, and providing only a very small number of rarely used function execution devices 13 in the processor, the utilization of the function execution devices across the whole system rises and throughput can be improved further. The inventions of claim 3 and claim 4 can also be applied together to improve performance further.

[0033] Next, the invention of claim 5 will be described with reference to the drawings. FIG. 6 shows an embodiment of a processor to which the control method of the present invention is applied. The processor shown in FIG. 6 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a software-priority-directed resource allocation control device 51, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment adds a function to the resource allocation control device 14 of the processor of claim 1: software supplies the nature and priority of each instruction stream, and the controller allocates the function execution devices 13 suited to that stream. This priority control is realized, for example, by embedding priority control instructions in the instruction stream.

[0034] As in the invention of claim 1, the function execution units 13 are an integer operation unit, a floating-point unit, and a load/store unit, and the two instruction streams executed simultaneously are instruction stream A and instruction stream B. If instruction stream A is code that performs no floating-point operations, the compiler places at the head of the instruction stream a resource allocation directive stating that the floating-point unit is not used. When this instruction is decoded, the resource allocation control device 51 is informed that instruction stream A does not need permission to use the floating-point unit. From then on, unless the setting is changed, the resource allocation control device 51 grants the floating-point unit only to instruction stream B. As an option, the design may allow the instruction decoding device 12 to send a resource allocation request to the resource allocation control device 51 if a floating-point instruction does appear in the instruction stream while the floating-point unit is not allocated to it. In addition, when instruction stream B is a high-priority process, system software such as the OS issues to the resource allocation control device 51 an instruction to allocate resources preferentially to instruction stream B. In this case the resource allocation control device performs control so that, of instruction streams A and B, instruction stream B receives more permissions to use the function execution devices 13.
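A software-directed allocator of this kind can be sketched as below. All names (`SoftwareDirectedAllocator`, `declare_unused`, `request_unit`, `set_priority`, `grant`) are hypothetical, and the tie-breaking policy is an assumption made for the example: each unit goes to the eligible stream that currently holds the fewest units after its priority is subtracted, so a higher-priority stream ends up with more grants.

```python
class SoftwareDirectedAllocator:
    """Illustrative model of a resource allocation controller steered by software."""

    def __init__(self, streams, units):
        self.streams = list(streams)
        self.units = list(units)
        self.unused = {s: set() for s in self.streams}   # units a stream declared it does not need
        self.priority = {s: 0 for s in self.streams}      # set by system software such as the OS

    def declare_unused(self, stream, unit):
        """Effect of a compiler-inserted directive at the head of the stream."""
        self.unused[stream].add(unit)

    def request_unit(self, stream, unit):
        """Optional path: the decoder asks for a unit it had declared unused."""
        self.unused[stream].discard(unit)

    def set_priority(self, stream, level):
        self.priority[stream] = level

    def grant(self):
        """Return {stream: set of units} for the next cycle; each unit goes to one stream."""
        grants = {s: set() for s in self.streams}
        for unit in self.units:
            candidates = [s for s in self.streams if unit not in self.unused[s]]
            if not candidates:
                continue
            # Assumed policy: fewest units held, biased by priority, then stream name.
            winner = min(candidates, key=lambda s: (len(grants[s]) - self.priority[s], s))
            grants[winner].add(unit)
        return grants
```

For two streams and three units, declaring the floating-point unit unused for stream A leaves it with stream B, and raising the priority of stream B shifts a second unit to it.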

[0035] With this extension, resources can be allocated according to the nature of each instruction stream and the processing status, so the utilization of the function execution devices rises and throughput and program execution speed can be improved.

[0036] Next, the invention of claim 6 will be described with reference to the drawings. FIG. 7 shows an embodiment of a processor to which the control method of the present invention is applied. The processor shown in FIG. 7 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a dynamic-priority-variable resource allocation control device 61, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment is characterized in that the resource allocation control device 14 of the processor of claim 1 is made to monitor the instruction execution status and the nature of the code in hardware, so that the allocation of the function execution devices 13 suits each instruction stream.

[0037] As in the invention of claim 1, the function execution units 13 are an integer operation unit, a floating-point unit, and a load/store unit, and the two instruction streams executed simultaneously are instruction stream A and instruction stream B. At the start of execution, as in the processor shown in the specific example of claim 1, the resource allocation control device 61 grants permission to use the function execution devices 13 equally to instruction streams A and B. While doing so, it monitors, for each of instruction streams A and B, the rate at which instructions are issued to each function execution device 13 and the rate at which interlocks occur, and it uses this information to allocate to each instruction stream, as far as possible, the function execution devices 13 that suit it. For example, if instruction stream A is code that hardly uses the floating-point unit and relies mainly on the integer operation unit, the floating-point unit is allocated to it only when such an instruction actually has to be executed, in response to an allocation request from the instruction decoding device 12 to the resource allocation control device 61. If instruction stream B is code that relies mainly on the load/store unit and the floating-point unit, control is performed so that these units are allocated mainly to instruction stream B.
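The monitoring-based allocation can be illustrated with a small model. It is a sketch only: `DynamicAllocator`, `record_issue`, and `grant` are hypothetical names, only issue counts are tracked (the text also mentions interlock rates, which are omitted here), and each unit is simply granted to the stream that has used it most.

```python
from collections import Counter


class DynamicAllocator:
    """Illustrative controller that adapts grants to observed per-stream issue counts."""

    def __init__(self, streams, units):
        self.streams = list(streams)
        self.units = list(units)
        self.issued = {s: Counter() for s in self.streams}   # issues per stream and unit

    def record_issue(self, stream, unit):
        """Called whenever a decoder issues an instruction of `stream` to `unit`."""
        self.issued[stream][unit] += 1

    def grant(self):
        """Return {stream: set of units}; each unit goes to its heaviest observed user."""
        grants = {s: set() for s in self.streams}
        for unit in self.units:
            winner = max(
                self.streams,
                key=lambda s: (
                    self.issued[s][unit],        # observed demand for this unit
                    -len(grants[s]),             # tie: prefer the stream holding fewer units
                    -self.streams.index(s),      # tie: earlier stream first
                ),
            )
            grants[winner].add(unit)
        return grants
```

Before any history exists the units are spread across the streams; once stream A has issued mostly integer instructions and stream B mostly load/store and floating-point instructions, the integer unit goes to A and the other two units go to B.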

[0038] With this extension, resources can be allocated according to the nature of each instruction stream and the processing status, so the utilization of the function execution devices rises and throughput and program execution speed can be improved. Efficiency can be improved further by combining this with the software-based allocation priority control of claim 5.

[0039] Next, the invention of claim 7 will be described with reference to the drawings. FIG. 8 shows an embodiment of a processor to which the control method of the present invention is applied. The processor shown in FIG. 8 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, a resource allocation control device 14, and execution information save storage devices 71, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment is characterized in that an execution information save storage device 71 is added to each function execution device 13 of the processor of claim 1, so that execution information can be saved and another instruction can be executed first.

[0040] As in the invention of claim 1, the function execution units 13 are an integer operation unit, a floating-point unit, and a load/store unit, and the two instruction streams executed simultaneously are instruction stream A and instruction stream B. Suppose that, in instruction stream A, a load instruction is issued from the instruction decoding device 12 to the load/store unit. If the desired data is not present in the buffer device (cache memory), it must be transferred from the main storage device, which has a long latency, so the instruction interlocks in the load/store unit. In the invention of claim 1, once this happens the load/store unit cannot be used until the interlock is cleared. In the invention of this section, when such a locked state occurs, the execution information is saved to the execution information save storage device 71, which makes it possible to permit instruction stream B to use the load/store unit.
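The effect of the save storage can be shown with a toy load/store unit. This is a sketch under assumptions: the class `LoadStoreUnit`, its methods, and the `save_slots` parameter are hypothetical, the cache is modeled as a plain set of addresses, and "execution information" is reduced to a (stream, address) pair.

```python
class LoadStoreUnit:
    """Toy load/store unit with an optional execution-information save store."""

    def __init__(self, save_slots):
        self.save_slots = save_slots   # 0 models the basic embodiment without a save store
        self.stalled = None            # load blocked on a miss and still occupying the unit
        self.saved = []                # loads parked in the save store

    def available(self):
        """True if the unit can be granted to an instruction stream."""
        return self.stalled is None

    def issue_load(self, stream, addr, cache):
        """Issue a load; returns 'hit', 'parked' or 'stalled'."""
        if not self.available():
            raise RuntimeError("load/store unit is interlocked")
        if addr in cache:
            return "hit"
        if len(self.saved) < self.save_slots:
            self.saved.append((stream, addr))   # save execution information, free the unit
            return "parked"
        self.stalled = (stream, addr)           # no free slot: the unit stays locked
        return "stalled"

    def memory_reply(self, addr, cache):
        """Main memory delivers `addr`: fill the cache and release waiting loads."""
        cache.add(addr)
        self.saved = [(s, a) for (s, a) in self.saved if a != addr]
        if self.stalled is not None and self.stalled[1] == addr:
            self.stalled = None
```

With `save_slots=0`, a miss by stream A leaves the unit unavailable until memory replies; with `save_slots=1`, the miss is parked and a load from stream B can use the unit in the meantime.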

[0041] With this extension, a function execution device can be allocated to another instruction stream even when an interlock has occurred, so the utilization of the function execution devices rises and throughput and program execution speed can be improved.

[0042] Next, it is explained that restricting the function execution devices 13 that each instruction stream may use simplifies the connections between the function execution devices 13 and the temporary storage devices (registers) 87 or buffer storage devices that are provided for each instruction stream for function execution.

[0043] First, FIG. 9 shows an embodiment of claim 1. It differs from FIG. 1 in that the temporary storage devices (registers) 87 are drawn, and in that the number of simultaneously operating instruction streams is increased to three and the number of function execution devices 13 to six. Next, FIG. 10 shows an embodiment of a processor to which the control scheme of the present invention is applied. Compared with FIG. 9, it is characterized by a restriction on resource allocation. In FIG. 10, with the three instruction streams called instruction stream A, instruction stream B, and instruction stream C, function execution devices 81 and 86 execute only instructions of instruction streams A and C, function execution devices 82 and 83 execute only instructions of instruction streams A and B, and function execution devices 84 and 85 execute only instructions of instruction streams B and C. This simplifies the connections between the instruction decoding devices 12 and the function execution devices 13 and between the function execution devices 13 and the temporary storage devices 87. Restricting the allocation also simplifies the internal structure of the resource allocation control device 14. Furthermore, limiting the function execution devices 13 that each instruction stream may use to those shared with adjacent instruction streams reduces the number of crossings between connections, which is advantageous for implementation.
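The restricted assignment of FIG. 10 can be written down as a table and compared with unrestricted sharing. The unit-to-stream mapping below is taken from the text; the helper names `units_for` and `link_count`, and the idea of counting one link per permitted (unit, stream) pair, are assumptions made for illustration.

```python
STREAMS = ("A", "B", "C")

# Function execution devices 81 to 86 and the streams allowed to use each (per FIG. 10).
RESTRICTED = {
    81: {"A", "C"},
    82: {"A", "B"},
    83: {"A", "B"},
    84: {"B", "C"},
    85: {"B", "C"},
    86: {"A", "C"},
}

# Unrestricted sharing, as in FIG. 9: every stream may use every device.
FULL = {unit: set(STREAMS) for unit in RESTRICTED}


def units_for(stream, allowed):
    """Devices that `stream` may be granted under the mapping `allowed`."""
    return sorted(unit for unit, streams in allowed.items() if stream in streams)


def link_count(allowed):
    """Number of (device, stream) connections the mapping requires."""
    return sum(len(streams) for streams in allowed.values())
```

Under this counting, the unrestricted mapping needs 18 device-to-stream links and the restricted one needs 12, while each stream can still reach four of the six devices.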

[0044] This example is especially effective in a processor that has a plurality of function execution devices 13 with the same function. As processor integration density increases and the number of simultaneously operating instruction streams and function execution devices 13 in a processor grows, the restricted allocation mechanism of this section simplifies implementation and permits a higher operating frequency, which can fully compensate for the drop in utilization caused by restricting the allocation of function execution devices. Efficiency can be improved further by combining this mechanism with the inventions of claim 3 and claim 4.

[0045]

[Effects of the Invention] In the present invention, in an arithmetic device that runs a plurality of instruction streams in parallel, the function execution devices that are the core of computation are shared among the instruction streams, so they can be used effectively. In the conventional example, the function execution devices are scheduled after the instructions have been decoded, which increases instruction execution delay and complicates control. In the scheme of the present invention, the usable function execution devices are assigned to each instruction stream in advance, before instruction decoding. This lowers the utilization of the function execution devices slightly, but the resulting simpler control and higher operating frequency more than compensate for the loss. In addition, by performing scheduling matched to the nature of the instructions in software or in hardware, the drop in utilization can be kept to a minimum. Furthermore, when a plurality of instruction streams are executed simultaneously, limiting in advance the function execution devices each may use simplifies the resource allocation control device. For these reasons, the scheme of the present invention for an arithmetic device that runs a plurality of instruction streams in parallel is very effective for high-speed instruction processing and cost reduction.

[0046]

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an embodiment of a processor, illustrating the processor control method according to the present invention.

FIG. 2 is a configuration diagram of an embodiment of a conventional scheme for sharing function execution devices among instruction streams.

FIG. 3 is a configuration diagram of an embodiment using a distributed resource allocation control device.

FIG. 4 is a configuration diagram of an embodiment using an instruction decoding device that performs out-of-order instruction issue.

FIG. 5 is a configuration diagram of an embodiment using an instruction decoding device that performs simultaneous instruction issue.

FIG. 6 is a configuration diagram of an embodiment using a software-priority-directed resource allocation control device.

FIG. 7 is a configuration diagram of an embodiment using a dynamic-priority-variable resource allocation control device.

FIG. 8 is a configuration diagram of an embodiment in which an execution information save device is added to the function execution devices.

FIG. 9 is a configuration diagram of an embodiment equivalent to that of FIG. 1, additionally showing the temporary storage devices (registers).

FIG. 10 is a configuration diagram of an embodiment in which the usable function execution devices are restricted for each instruction stream.

[Explanation of symbols]

11 instruction acquisition device
12 instruction decoding device
13 function execution device
14 resource allocation control device
21 instruction dependency analysis device
22 instruction arbitration device
31 distributed resource allocation control device
41 instruction acquisition device/instruction queue
51 software-priority-directed resource allocation control device
61 dynamic-priority-variable resource allocation control device
71 execution information save storage device
81 to 86 function execution devices
87 temporary storage device (register)

Claims

(57) [Claims]

1. An arithmetic device that simultaneously executes information processing for a plurality of instruction streams, comprising: a plurality of instruction acquisition devices that fetch instructions from a storage device or the like; a plurality of instruction decoding devices that decode the instructions fetched by the instruction acquisition devices so that they are processed by a function execution device for which use permission has been given, and that issue the instructions to that function execution device; a plurality of function execution devices that execute the required functions in accordance with the instructions issued from the instruction decoding devices; and a resource allocation control device that receives the current instruction execution status from the instruction acquisition devices, the instruction decoding devices, and the function execution devices, and that controls, for the instruction decoding devices, permission to use the function execution devices before the instructions are decoded.

2. The arithmetic device according to claim 1, wherein the resource allocation control device is distributed, one part being provided for each function execution device.

3. The arithmetic device according to claim 1 or 2, wherein the instruction decoding device issues instructions out of order when there is no dependency between earlier and later instructions, or the dependency can be resolved, and the use-permission status of the function execution devices obtained from the resource allocation control device is such that the first instruction cannot be executed but a second or later instruction can be executed.

4. The arithmetic device according to claim 1 or 2, wherein the instruction decoding device issues instructions simultaneously when there is no dependency between earlier and later instructions, or the dependency can be resolved, and use permission for the function execution devices has been obtained.

5. The arithmetic device according to claim 1, wherein the resource allocation control device optimizes execution of each instruction stream by a setting of the priority with which function execution devices are granted to each instruction stream.

6. The arithmetic device according to claim 1 or 2, wherein the resource allocation control device examines information on the issue of instructions from the instruction decoding devices to the function execution devices and, based on that information, optimizes subsequent resource allocation to each instruction stream.

7. The arithmetic device according to claim 1 or 2, wherein a storage device for saving one or more pieces of execution information is added to each function execution device, and, when an event that suspends execution occurs, the information being processed is saved in the storage device and the save is reported to the resource allocation control device, so that the function execution device in which the suspending event occurred is used for information processing by instructions of another instruction stream.