JPH07182168A - Arithmetic unit and its control method

Info

Publication number: JPH07182168A
Application number: JP32665893A
Authority: JP
Inventors: Atsushi Torii; 淳鳥居
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-12-24
Filing date: 1993-12-24
Publication date: 1995-07-21
Anticipated expiration: 2013-05-28
Also published as: JP2760273B2

Abstract

PURPOSE: To allow function execution devices to be shared among instruction streams by having a resource allocation control device grant the instruction decoding device that will issue the next instruction permission to use a function execution device, based on the current instruction execution state. CONSTITUTION: In an arithmetic unit that processes a plurality of instruction streams simultaneously, an instruction acquisition device 11 fetches instructions from storage or the like. An instruction decoding device 12 decodes the instruction obtained from the device 11 and starts processing on the required function execution device 13. The device 13 performs the function required by the instruction. A resource allocation control device 14 checks the current instruction execution state using information from the devices 11, 12, and 13, and permits the device 12 that will issue the next instruction to use the device 13. The utilization of the device 13 is thereby improved within a practical range, and the device 13 can be shared among a plurality of instruction streams (programs).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a processor used in an information processing device, and more particularly to a high-performance processor capable of processing a plurality of instruction streams (programs) in parallel.

[0002]

[Prior Art] As a technique for speeding up processors, a scheme has been put to practical use in which a plurality of function execution devices are provided and instructions are issued to them simultaneously by exploiting instruction-level parallelism, thereby increasing processing speed. With this scheme it is ideally possible to process instructions at a rate higher than the operating clock frequency. Such processing requires, in addition to the extra function execution devices, logic circuits that guarantee the correctness of instruction execution results. This scheme is generally called the superscalar scheme, and it is adopted in the Alpha architecture announced by Digital Equipment Corporation of the United States, the Power architecture of IBM, Motorola, and Apple of the United States, the PA-RISC architecture of Hewlett-Packard, and others.

[0003] However, there is a limit to how much instruction-level parallelism can be exploited, and it is said that the performance gain of this scheme is in practice held to about 3 to 4 times even if the number of function execution devices is increased without bound. The factors limiting this parallelism are dependencies between instructions and disruption of the instruction stream caused by branches. Various schemes have been proposed to remove these constraints, but even at the cost of extreme hardware complexity the performance gain is considered to be limited to about 40 times. These limits are described in a paper published in 1992 by Monica S. Lam et al. (Monica S. Lam and Robert P. Wilson, "Limits of Control Flow on Parallelism", The 19th International Symposium on Computer Architecture, IEEE Computer Society Press, 1992, pp. 46-57).

[0004] On the other hand, methods have also been proposed that raise the utilization efficiency of the function execution devices, and thus processing speed, by executing instructions from a plurality of instruction streams in parallel rather than relying on instruction-level parallelism. Because there are no dependencies between instruction streams in this scheme, performance is easier to improve than with the instruction-level parallelism described above.

[0005] Examples of this scheme include the one proposed in a paper published in 1993 by Hirata et al. (Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Teiji Nishizawa, Takayuki Sagishima, "An Elementary Processor Architecture Using Multithreading and Multiple Instruction Issue", Transactions of the Information Processing Society of Japan, 1993, Vol. 34, No. 4, pp. 595-605). FIG. 2 shows the outline configuration of that implementation, which is described below as an example of the prior art.

[0006] In FIG. 2, 11 is an instruction acquisition device, 12 is an instruction decoding device, 13 is a function execution device, and 21 is an inter-instruction dependency analysis device. 22 is an instruction arbitration device that schedules the function execution devices 13. An instruction decoding device 12 issues an instruction when the instruction arbitration device 22 is able to accept it and the dependency analysis device 21 has indicated that the instruction can be issued. The instructions issued from the respective instruction decoding devices 12 are assigned by the instruction arbitration device 22 to the required function execution devices 13 and are then actually executed. The instruction arbitration device 22 thus allows the function execution devices 13 to be shared among the instructions, improving their utilization efficiency. In addition, distributing the instruction arbitration device 22 across the individual function execution devices 13 makes it possible to simplify the arbitration device.

[0007]

[Problems to Be Solved by the Invention] The conventional mechanism for sharing function execution devices described above has the following problems. First, adding instruction arbitration to the instruction execution pipeline increases instruction execution latency, which slows processing when executing instructions that do not fit the pipeline structure, such as branch instructions, and when handling exceptions. Second, as the number of instruction streams, and hence of instruction decoding devices, grows, the instruction arbitration device must arbitrate more instructions with more complicated arbitration logic, so its structure becomes complex and hinders increases in operating frequency. Third, as the number of instruction streams and instruction decoding devices grows, the connections between the function execution devices and the temporary storage devices (registers) and buffer storage devices (cache memories) needed for execution must be implemented with complex switches, so delay increases and again hinders increases in operating frequency. Because of these problems, both the number of instruction streams that the above mechanism can execute concurrently and its operating frequency are limited.

[0008] An object of the present invention is to provide a processor control scheme that, at a high operating frequency and without increasing instruction execution latency, raises the utilization of the function execution devices within a practical range and shares the function execution devices among instruction streams.

[0009]

[Means for Solving the Problems] To solve the above problems, the present invention adds a resource allocation control device to a processor in which a plurality of instruction streams operate in parallel, that is, a processor having a plurality of instruction acquisition devices, a plurality of instruction decoding devices, a plurality of function execution devices, a plurality of temporary storage devices (registers), and a buffer device (cache).

[0010] The resource allocation control device is a mechanism that determines, at every clock cycle and from the instruction execution information supplied by each device together with internal setting information, which function execution devices each instruction stream may use. Based on the function execution device use permission information determined by the resource allocation control device, an instruction decoding device can issue an instruction to a function execution device that is able to execute it. In other words, the resource allocation control device decides, for every function execution device in the processor, whether each instruction stream may use it. The instruction decoding device then chooses, from among the usable function execution devices, one that will execute the instruction designated by the program counter of its instruction stream, and issues the instruction. The instruction issue information is also reported to the resource allocation control device, and on that basis the temporary storage device or buffer storage device that will hold the result produced by the function execution device is selected.
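As an illustration only, the permission check described in this paragraph can be sketched in Python. The `Unit` enumeration, the stream names, the `permissions` table, and the `try_issue` function are all assumptions made for this sketch and do not appear in the patent; the example allocation is borrowed from the embodiment described later in the text.

```python
from enum import Enum


class Unit(Enum):
    """Hypothetical names for three kinds of function execution device."""
    INT = "integer"
    FP = "floating-point"
    LS = "load/store"


def try_issue(stream, needed_unit, permissions):
    """Model one decoder's issue decision for one cycle.

    permissions maps a stream name to the set of units that the resource
    allocation controller has granted to that stream for this cycle.
    Returns the unit to issue to, or None if the decoder must wait.
    """
    allowed = permissions.get(stream, frozenset())
    return needed_unit if needed_unit in allowed else None


# Example grant for one cycle: stream A may use INT and FP, stream B may use LS.
permissions = {
    "A": frozenset({Unit.INT, Unit.FP}),
    "B": frozenset({Unit.LS}),
}
```

The point of the sketch is that the decoder only reads a grant that was computed beforehand, so no arbitration step is needed after the instruction has been issued.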

[0011] Either a single resource allocation control device is provided in the system, or as many are provided as there are function execution devices and they are distributed one per function execution device. Each instruction decoding device either issues one instruction at a time, or receives a plurality of instructions simultaneously from its instruction acquisition device and then either issues instructions out of order according to the usable function execution devices or, using superscalar techniques, issues a plurality of instructions simultaneously according to the usable function execution devices.

[0012] The resource allocation control device either allocates resources equally to every instruction stream, or decides the per-stream allocation of function execution devices from usage information that software, for example a compiler, has determined in advance for each instruction stream and passed to the resource allocation control device. Alternatively, the resource allocation control device monitors, for each instruction stream, the instruction issue information sent from the instruction decoding device to the function execution devices, dynamically detects the instruction issue tendencies of the stream, and allocates resources accordingly.
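The dynamic option in this paragraph can be illustrated with a short Python sketch. The patent does not specify an allocation policy, so everything here is an assumption: the `allocate` function, the `(stream, unit)` history format, and the rule of granting each unit to the stream that issued to it most often, with ties broken by a rotating priority.

```python
from collections import Counter


def allocate(units, streams, issue_history):
    """Grant each unit to the stream with the highest observed demand.

    units:         list of unit names, e.g. ["int", "fp", "ls"]
    streams:       list of stream names, e.g. ["A", "B"]
    issue_history: list of (stream, unit) pairs for recently issued
                   instructions, as monitored by the controller
    Returns a dict mapping each stream to the set of units granted to it.
    """
    demand = {s: Counter() for s in streams}
    for stream, unit in issue_history:
        demand[stream][unit] += 1

    grants = {s: set() for s in streams}
    for i, unit in enumerate(units):
        # Rotate the priority order per unit so that ties (including an
        # empty history) spread the units across the streams.
        start = i % len(streams)
        order = streams[start:] + streams[:start]
        # max() returns the first maximal element, so ties go to the
        # stream that comes first in the rotated order.
        winner = max(order, key=lambda s: demand[s][unit])
        grants[winner].add(unit)
    return grants
```

One limitation of this simple policy is that a stream can only build up demand for a unit it has already been allowed to use, so a practical controller would probably also need to consider stalled instructions; that refinement is left out of the sketch.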

[0013] Whether to implement the logic so that every instruction stream can use all function execution devices, or to restrict the function execution devices available to each instruction stream and thereby simplify the connections between the temporary storage devices, the buffer storage devices, and the function execution devices, depends on the processor design. In the latter case the resource allocation control device can also be distributed to match, which simplifies its logic.

[0014]

[Operation] In the processor of the invention according to claim 1, the instruction decoding device issues instructions to the function execution devices in accordance with the permission given by the resource allocation control device. Because the use permission information for the function execution devices is available at the same time as the instruction enters the instruction decoding device, and the function execution device to be used is fixed at issue time, no pipeline stage for arbitrating function execution devices after issue needs to be added. The function execution devices are thus shared among instruction streams and can be used efficiently. In addition, function execution units that are used only rarely can be provided singly, or in a number smaller than the number of concurrently running instruction streams, so that on-chip resources are used effectively. The resource allocation control device determines the usable function execution devices for each instruction stream at every clock cycle from the information supplied by each device and from internal information. Because the allocation is determined from information up to the preceding instruction execution, the utilization of the function execution devices cannot be expected to match that of the conventional scheme, but the simpler structure and the shorter instruction execution latency that results from the shorter pipeline allow the clock frequency and processing speed to be increased. Further, in the invention according to claim 2, the resource allocation control device is distributed, which simplifies the structure of each individual resource allocation control device.

[0015] To raise the utilization of the function execution devices and improve efficiency, the invention according to claim 3 passes a plurality of instructions simultaneously from the instruction acquisition device to the instruction decoding device and issues instructions out of order according to the usable function execution devices. The invention according to claim 4 likewise passes a plurality of instructions simultaneously from the instruction acquisition device to the instruction decoding device and, using superscalar techniques, issues instructions simultaneously according to the usable function execution devices. With these techniques a single instruction stream can make efficient use of a plurality of function execution devices, which is particularly effective when there are more function execution devices than concurrently running instruction streams.

[0016] In the invention according to claim 5, the function execution device usage of each instruction stream is determined in advance by software, for example a compiler. Passing this information to the resource allocation control device optimizes the per-stream allocation of function execution devices, giving efficient execution and faster processing. In the invention according to claim 6, the resource allocation control device monitors, for each instruction stream, the instruction issue information sent from the instruction decoding device to the function execution devices, dynamically detects the instruction issue tendencies of the stream, and allocates resources accordingly, again giving efficient execution and faster processing. Combining the two makes it possible to allocate resources to suit the code and to give priority in function execution device allocation to code whose execution is falling behind.

[0017] When an event occurs in a function execution device that forces execution to be suspended, for example a cache miss that leaves required data unavailable, the function execution device cannot be used until the hardware has taken the necessary action. In the invention according to claim 7, when such an event occurs, the data needed to resume execution is saved to a storage device attached to the function execution device so that the function execution device can be used to execute other instructions, which improves its utilization.

[0018] When a plurality of function execution devices are shared among a plurality of instruction streams, increasing the number of instruction streams and instruction decoding devices means that the connections between the function execution devices and the temporary storage devices (registers) and buffer storage devices (cache memories) needed for execution must be implemented with complex switches, so the number of logic circuits required increases. In the invention according to claim 8, rather than implementing the logic so that every instruction stream can use all function execution devices, the function execution devices available to each instruction stream are restricted, which simplifies the connections between the temporary storage devices, the buffer storage devices, and the function execution devices.

[0019] Furthermore, restricting the function execution devices available to each instruction stream allows the resource allocation control device to be distributed and its logic to be simplified.

[0020] The present invention is described more specifically below with reference to the drawings. The following disclosure is merely one embodiment of the invention and in no way limits its technical scope.

[0021]

[Embodiments] The invention according to claim 1 is described with reference to the drawings. FIG. 1 is a block diagram of an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 1 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. Whether the function execution devices 13 have the same function or different functions does not affect the essence of the invention; here, for convenience, they are an integer arithmetic unit, a floating-point unit, and a load/store unit. This configuration aims at efficient use of the arithmetic units rather than at faster processing. A processor that performs operations also contains temporary storage devices (registers), buffer storage devices (cache memories), and so on, but these are omitted because they are not directly related to the inventions of claims 1 to 7. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B.

[0022] Instructions are sent from the instruction acquisition devices 11 to the instruction decoding devices 12, and instruction decoding results are sent from the instruction decoding devices 12 to the function execution devices 13. The resource allocation control device 14 sends each instruction decoding device 12 information on whether each function execution device 13 may be used. The instruction acquisition devices 11, the instruction decoding devices 12, and the function execution devices 13 each send their execution state information to the resource allocation control device 14.

[0023] Each instruction acquisition device 11 fetches instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter value of instruction stream A or B. Next, each instruction decoding device 12 receives the instruction from its instruction acquisition device 11 and receives from the resource allocation control device 14 the availability information for the individual function execution devices 13, for example that instruction stream A may use the integer arithmetic unit and the floating-point unit and instruction stream B may use the load/store unit. If the decoded instruction of stream A is an integer or floating-point operation, it is issued unconditionally to the integer arithmetic unit or the floating-point unit. If it is a load/store instruction, an interlock occurs in the instruction decoding device 12 and no instruction is issued until the next cycle. For stream B the reverse holds: an integer or floating-point operation interlocks, and a load/store instruction is issued. In the next cycle, an instruction decoding device 12 that issued an instruction in the previous cycle receives a new instruction from the instruction acquisition device 11. One that did not issue receives no new instruction, only the availability information for the function execution devices 13. In this cycle, for example, stream A may use the load/store unit and stream B may use the integer arithmetic unit and the floating-point unit. As in the previous cycle, the instruction is compared with the usable function execution devices 13 to decide whether to issue it. The resource allocation control device 14 is a mechanism that allocates resources equally among the instruction streams; this is shown in Table 1. It also receives the execution state information from the other devices and uses it to decide the next availability of the function execution devices 13.

【００２４】[0024]

【表１】 [Table 1]

[0025] In this way, the plurality of function execution devices 13 can be shared among the plurality of instruction streams, and the function execution devices 13 are used effectively.
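The issue behavior described for the FIG. 1 embodiment can be sketched as a small cycle-by-cycle simulation in Python. This is a simplified model under stated assumptions, not the patented circuit: it assumes the two example allocations from the text simply alternate every cycle, that every instruction completes without stalling its unit, and that a program is just a list of the unit kinds its instructions need. The names `permissions_for` and `simulate` are hypothetical.

```python
INT, FP, LS = "int", "fp", "ls"


def permissions_for(cycle):
    """Assumed policy: alternate the two example allocations every cycle."""
    if cycle % 2 == 0:
        return {"A": {INT, FP}, "B": {LS}}
    return {"A": {LS}, "B": {INT, FP}}


def simulate(programs, max_cycles=100):
    """Run the streams until every program has been issued completely.

    programs maps a stream name to a list of unit kinds, in program order.
    Returns one dict per cycle mapping each stream to the unit it issued
    to, "stall" if its decoder interlocked, or None if it had finished.
    """
    pc = {s: 0 for s in programs}
    trace = []
    for cycle in range(max_cycles):
        if all(pc[s] >= len(prog) for s, prog in programs.items()):
            break
        allowed = permissions_for(cycle)
        row = {}
        for s, prog in programs.items():
            if pc[s] >= len(prog):
                row[s] = None              # stream already finished
            elif prog[pc[s]] in allowed[s]:
                row[s] = prog[pc[s]]       # permitted unit: issue in order
                pc[s] += 1
            else:
                row[s] = "stall"           # interlock, retry next cycle
        trace.append(row)
    return trace
```

For example, `simulate({"A": [INT, LS, FP], "B": [LS, LS]})` models stream B interlocking in the second cycle, because its second load/store instruction arrives while the load/store unit is granted to stream A.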

[0026] Next, the invention according to claim 2 is described with reference to the drawings. FIG. 3 shows an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 3 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and three distributed resource allocation control devices 31, and executes two instruction streams simultaneously on the three shared function execution devices 13. The feature of this embodiment is that the resource allocation control device 14 of the processor according to claim 1 is distributed, one per function execution device 13. As a result, the resource allocation control device can be simplified.

[0027] Next, the invention according to claim 3 is described with reference to the drawings. FIG. 4 shows an embodiment of a processor to which the control scheme of the present invention is applied. The processor shown in FIG. 4 comprises two instruction acquisition devices/instruction queues 41, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. In this embodiment, the instruction acquisition device 11 of the processor according to claim 1 is extended with a queue so that it can hold a plurality of instructions. The instruction decoding device 12 is extended so that it receives a plurality of instructions from the instruction acquisition device, examines their dependencies, and performs out-of-order execution when there is no dependency between the earlier and later instructions (or the dependency can be resolved) and the use permission obtained from the resource allocation control device 14 is such that the first instruction cannot be executed while the second or a later instruction can.

【００２８】[0028]

As in the invention of claim 1, the three function execution devices 13 are an integer arithmetic unit, a floating-point unit, and a load/store unit, respectively. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B, and each instruction decoding device 12 receives two instructions at a time from its instruction acquisition device / instruction queue 41. The instruction acquisition device / instruction queue 41 fetches instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter values of instruction streams A and B. Because an instruction queue is present, it is assumed here that several instructions have already been fetched. Next, the instruction decoding device 12 receives the contents of two instructions from the instruction acquisition device 11 and receives from the resource allocation control device 14 the availability information for each function execution device 13, for example, that instruction stream A may use the integer arithmetic unit and the floating-point unit and that instruction stream B may use the load/store unit. The received instructions are decoded, and the dependencies between them are also examined. If decoding shows that the first instruction of instruction stream A is an integer or floating-point operation, it is issued unconditionally to the integer arithmetic unit or the floating-point unit. If the first instruction is a load/store instruction, the second instruction is an integer or floating-point operation, and there is no dependency between the first and second instructions, the second instruction is issued to the integer arithmetic unit or the floating-point unit. If neither condition is satisfied, an interlock occurs in the instruction decoding device 12 and no instruction is issued until the next cycle. Instruction stream B is handled in the same way, and the same operation is repeated in subsequent cycles.
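The out-of-order issue decision for one instruction stream in one cycle can be sketched as follows. This is only an illustrative model under stated assumptions: the function name `select_issue`, the unit labels `"int"`, `"fp"`, and `"ls"`, and the representation of the permission information as a set are hypothetical and do not appear in the specification.

```python
from typing import Optional, Set


def select_issue(first_unit: str, second_unit: str,
                 dependent: bool, permitted: Set[str]) -> Optional[int]:
    """Pick which of two decoded instructions to issue this cycle.

    first_unit / second_unit: unit each instruction needs ("int", "fp", "ls").
    dependent: True if the second instruction depends on the first.
    permitted: units the resource allocator has granted to this stream.
    Returns 0 or 1 (index of the issued instruction) or None for an interlock.
    """
    # The first instruction is issued unconditionally if its unit is permitted.
    if first_unit in permitted:
        return 0
    # Otherwise the second instruction may overtake it, but only if its unit
    # is permitted and it does not depend on the first instruction.
    if second_unit in permitted and not dependent:
        return 1
    # Neither instruction can go: interlock until the next cycle.
    return None


# Stream A has been granted the integer and floating-point units.
grant_a = {"int", "fp"}
print(select_issue("ls", "int", False, grant_a))  # second instruction issues
```

With in-order issue only, the last call would have interlocked because the load/store unit is not granted to the stream in this cycle.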

【００２９】[0029]

In the processor embodiment of claim 1, an interlock occurred unconditionally whenever the first instruction could not be executed on a function execution device that the stream was permitted to use. In the processor embodiment of this section, out-of-order issue of instructions is allowed, so the probability of an interlock falls and throughput and program execution speed improve.

【００３０】[0030]

Next, the invention of claim 4 is described with reference to the drawings. FIG. 5 shows an embodiment of a processor to which the control method according to the present invention is applied. The processor shown in FIG. 5 comprises two instruction acquisition devices / instruction queues 41, two instruction decoding devices 12, three function execution devices 13, and a resource allocation control device 14, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment extends the instruction acquisition device 11 of the processor of claim 1 by adding a queue so that it can hold multiple instructions. In addition, the instruction decoding device 12 is extended so that it receives multiple instructions from the instruction acquisition device 11, examines their dependencies, and, when there is no dependency between the instructions (or the dependency can be resolved) and the permission status obtained from the resource allocation control device 14 allows the use of multiple function execution devices 13, issues instructions to multiple function execution devices 13 simultaneously using superscalar techniques.

【００３１】[0031]

As in the invention of claim 1, the three function execution devices 13 are an integer arithmetic unit, a floating-point unit, and a load/store unit, respectively. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B, and each instruction decoding device 12 receives two instructions at a time from its instruction acquisition device / instruction queue 41. The instruction acquisition device / instruction queue 41 fetches instructions from the main storage device (main memory) or the buffer device (cache memory) according to the program counter values of instruction streams A and B. Because an instruction queue is present, it is assumed here that several instructions have already been fetched. Next, the instruction decoding device 12 receives the contents of two instructions from the instruction acquisition device 11 and receives from the resource allocation control device 14 the availability information for each function execution device 13, for example, that instruction stream A may use the integer arithmetic unit and the floating-point unit and that instruction stream B may use the load/store unit. The received instructions are decoded, and the dependencies between them are also examined. If decoding of instruction stream A shows that the first instruction is an integer operation and the second is a floating-point operation, and there is no dependency between them, the first instruction is issued to the integer arithmetic unit and the second to the floating-point unit. Likewise, if the first is a floating-point operation and the second is an integer operation, and there is no dependency between them, the first instruction is issued to the floating-point unit and the second to the integer arithmetic unit. If these conditions are not satisfied, then, as in claim 1, either only a single instruction is issued or an interlock results. Instruction stream B is handled in the same way, and the same operation is repeated in subsequent cycles.
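The simultaneous-issue rule can be sketched in the same style. The function name `select_dual_issue` and the unit labels are hypothetical, and the sketch assumes exactly one unit of each kind, so two instructions that need the same unit cannot be issued together.

```python
from typing import List, Set


def select_dual_issue(first_unit: str, second_unit: str,
                      dependent: bool, permitted: Set[str]) -> List[int]:
    """Return the indices of the instructions issued this cycle.

    [0, 1] means both instructions issue together, [0] means only the first
    issues (the in-order behaviour of claim 1), [] means an interlock.
    """
    both_permitted = first_unit in permitted and second_unit in permitted
    # Dual issue needs two different permitted units and no dependency.
    if both_permitted and first_unit != second_unit and not dependent:
        return [0, 1]
    # Fall back to issuing only the first instruction if its unit is granted.
    if first_unit in permitted:
        return [0]
    return []


# Stream A holds permission for the integer and floating-point units.
grant_a = {"int", "fp"}
print(select_dual_issue("int", "fp", False, grant_a))  # both issue
print(select_dual_issue("int", "fp", True, grant_a))   # only the first issues
```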

【００３２】[0032]

In the processor embodiment of claim 1, even when the resource allocation control device 14 granted permission to use multiple function execution devices 13, only one of them was used. In the processor embodiment of this section, the instruction decoding device 12 is extended so that instructions can be issued simultaneously using superscalar techniques, so throughput and program execution speed improve. In particular, by providing frequently used function execution devices 13 in a number equal to or greater than the number of instruction streams processed in parallel, and providing only a very small number of rarely used function execution devices 13 in the processor, the utilization of the function execution devices across the whole system improves and throughput can be raised further. The inventions of claims 3 and 4 can also be applied together to improve performance further.

【００３３】[0033]

Next, the invention of claim 5 is described with reference to the drawings. FIG. 6 shows an embodiment of a processor to which the control method according to the present invention is applied. The processor shown in FIG. 6 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a software-priority-directed resource allocation control device 51, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment adds to the resource allocation control device 14 of the processor of claim 1 a function by which software supplies the nature and priority of each instruction stream, so that suitable function execution devices 13 are allocated to it. This priority control is realized, for example, by embedding priority control instructions in the instruction stream.

【００３４】[0034]

As in the invention of claim 1, the three function execution devices 13 are an integer arithmetic unit, a floating-point unit, and a load/store unit, respectively. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B. If instruction stream A is code that performs no floating-point operations, the compiler places at the head of the instruction stream a resource allocation directive stating that the floating-point unit is not used. When this instruction is decoded, the resource allocation control device 51 is informed that instruction stream A does not need permission to use the floating-point unit. From then on, unless the setting is changed, the resource allocation control device 51 grants the floating-point unit only to instruction stream B. As an option, however, the instruction decoding device 12 may be structured so that it can send a resource allocation request to the resource allocation control device 51 if a floating-point instruction does appear in the instruction stream while the unit is not being allocated to it. In addition, if instruction stream B is a high-priority process, system software such as the OS issues to the resource allocation control device 51 an instruction to allocate resources preferentially to instruction stream B. In that case, the resource allocation control device grants permission to use the function execution devices 13 to instruction stream B more often than to instruction stream A.
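A software-directed allocator of this kind can be modeled roughly as below. The specification does not give an arbitration policy, so the weighted choice used here (grant each unit to the eligible stream with the fewest grants per unit of weight) is an assumption made for illustration, as are the class and method names.

```python
from typing import Dict, List, Set


class SoftwareDirectedAllocator:
    """Toy model of a resource allocator steered by software hints."""

    def __init__(self, streams: List[str], units: List[str]) -> None:
        self.streams = streams
        self.units = units
        self.unused: Dict[str, Set[str]] = {s: set() for s in streams}
        self.weight: Dict[str, int] = {s: 1 for s in streams}  # must stay > 0
        self.grants: Dict[str, int] = {s: 0 for s in streams}

    def declare_unused(self, stream: str, unit: str) -> None:
        # Compiler-inserted directive: this stream never needs this unit.
        self.unused[stream].add(unit)

    def request(self, stream: str, unit: str) -> None:
        # Optional path: the decoder asks for a unit it had declared unused.
        self.unused[stream].discard(unit)

    def set_weight(self, stream: str, weight: int) -> None:
        # System software (for example the OS) raises a stream's priority.
        self.weight[stream] = weight

    def grant(self) -> Dict[str, str]:
        """Decide, for one cycle, which stream may use each unit."""
        result: Dict[str, str] = {}
        for unit in self.units:
            eligible = [s for s in self.streams if unit not in self.unused[s]]
            if not eligible:
                continue
            # Assumed policy: fewest grants relative to weight wins.
            chosen = min(eligible, key=lambda s: self.grants[s] / self.weight[s])
            self.grants[chosen] += 1
            result[unit] = chosen
        return result


alloc = SoftwareDirectedAllocator(["A", "B"], ["int", "fp", "ls"])
alloc.declare_unused("A", "fp")   # stream A contains no floating-point code
print(alloc.grant()["fp"])        # the floating-point unit goes to B
```

Raising the weight of stream B with `set_weight` makes it receive proportionally more grants over time, which corresponds to the OS-directed priority case.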

【００３５】[0035]

This extension makes it possible to allocate resources according to the nature and processing status of each instruction stream, which raises the utilization of the function execution devices and improves throughput and program execution speed.

【００３６】[0036]

Next, the invention of claim 6 is described with reference to the drawings. FIG. 7 shows an embodiment of a processor to which the control method according to the present invention is applied. The processor shown in FIG. 7 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, and a dynamic-priority-variable resource allocation control device 61, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment is characterized in that the resource allocation control device 14 of the processor of claim 1 monitors, in hardware, the instruction execution status and the nature of the code so that the allocation of the function execution devices 13 suits each instruction stream.

【００３７】[0037]

As in the invention of claim 1, the three function execution devices 13 are an integer arithmetic unit, a floating-point unit, and a load/store unit, respectively. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B. At the start of execution, as in the processor shown in the concrete example for claim 1, the resource allocation control device 61 grants permission to use the function execution devices 13 equally to instruction streams A and B. While doing so, it monitors the instruction issue rate of each stream to each function execution device 13 and the interlock rate, and on that basis allocates to each stream, as far as possible, the function execution devices 13 that suit it. For example, if instruction stream A is code that hardly uses the floating-point unit and relies mainly on the integer arithmetic unit, the floating-point unit is allocated to it only when such an instruction actually has to be executed, by having the instruction decoding device 12 send an allocation request to the resource allocation control device 61. If instruction stream B is code that relies mainly on the load/store unit and the floating-point unit, the control allocates mainly those units to it.
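The monitoring-based allocation can be illustrated with a small model. It is a sketch under simplifying assumptions: only issue counts are tracked (the interlock rate mentioned in the text is left out), a unit simply goes to the stream that has used it most, and all names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class DynamicAllocator:
    """Toy model of an allocator that adapts grants to observed usage."""

    def __init__(self, streams: List[str], units: List[str]) -> None:
        self.streams = streams
        self.units = units
        # (stream, unit) -> number of instructions issued so far
        self.issued: DefaultDict[Tuple[str, str], int] = defaultdict(int)
        self.cycle = 0

    def record_issue(self, stream: str, unit: str) -> None:
        # Called whenever a decoder issues an instruction to a unit.
        self.issued[(stream, unit)] += 1

    def grant(self) -> Dict[str, str]:
        """Decide, for one cycle, which stream may use each unit."""
        result: Dict[str, str] = {}
        for i, unit in enumerate(self.units):
            counts = [self.issued[(s, unit)] for s in self.streams]
            if max(counts) == min(counts):
                # No preference observed yet: share the unit equally by rotation.
                result[unit] = self.streams[(self.cycle + i) % len(self.streams)]
            else:
                # Assumed policy: favour the stream that used the unit most.
                result[unit] = self.streams[counts.index(max(counts))]
        self.cycle += 1
        return result


alloc = DynamicAllocator(["A", "B"], ["int", "fp", "ls"])
for _ in range(5):
    alloc.record_issue("A", "int")   # stream A is integer-heavy
    alloc.record_issue("B", "fp")    # stream B uses floating point
    alloc.record_issue("B", "ls")    # and load/store
print(alloc.grant())
```

A stream that only rarely needs a unit would, as the text describes, obtain it through an explicit allocation request; that path is omitted here.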

【００３８】[0038]

This extension makes it possible to allocate resources according to the nature and processing status of each instruction stream, which raises the utilization of the function execution devices and improves throughput and program execution speed. Efficiency can be improved further by using it together with the software-based allocation priority control of claim 5.

【００３９】[0039]

Next, the invention of claim 7 is described with reference to the drawings. FIG. 8 shows an embodiment of a processor to which the control method according to the present invention is applied. The processor shown in FIG. 8 comprises two instruction acquisition devices 11, two instruction decoding devices 12, three function execution devices 13, a resource allocation control device 14, and execution information save storage devices 71, and executes two instruction streams simultaneously on the three shared function execution devices 13. This embodiment is characterized in that an execution information save storage device 71 is added to each function execution device 13 of the processor of claim 1 so that execution information can be saved and another instruction executed first.

【００４０】[0040]

As in the invention of claim 1, the three function execution devices 13 are an integer arithmetic unit, a floating-point unit, and a load/store unit, respectively. The two instruction streams executed simultaneously are called instruction stream A and instruction stream B. Suppose that, in instruction stream A, a load instruction is issued from the instruction decoding device 12 to the load/store unit. If the desired data is not in the buffer device (cache memory), it must be transferred from the main storage device, which has a long latency, so the instruction interlocks in the load/store unit. In the invention of claim 1, the load/store unit then cannot be used until the interlock is cleared. In the invention of this section, when such a locked state occurs, the execution information is saved to the execution information save storage device 71 so that instruction stream B can be permitted to use the load/store unit.
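The effect of the save storage can be illustrated with a toy load/store unit. This is a sketch, not the patented circuit: the `saved` list stands in for the execution information save storage device 71, the cache is reduced to a set of addresses, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PendingLoad:
    stream: str
    address: int


class LoadStoreUnit:
    """Toy load/store unit that parks a missed load instead of blocking."""

    def __init__(self, cached: Set[int]) -> None:
        self.cached = set(cached)            # addresses present in the cache
        self.saved: List[PendingLoad] = []   # stands in for save storage 71

    def issue_load(self, stream: str, address: int) -> bool:
        """Return True if the load hits; on a miss, save it and return False."""
        if address in self.cached:
            return True
        # Cache miss: save the execution information so the unit stays usable.
        self.saved.append(PendingLoad(stream, address))
        return False

    def memory_returned(self, address: int) -> List[PendingLoad]:
        """Main memory delivered `address`; return the loads that can resume."""
        self.cached.add(address)
        done = [p for p in self.saved if p.address == address]
        self.saved = [p for p in self.saved if p.address != address]
        return done


lsu = LoadStoreUnit(cached={0x10})
print(lsu.issue_load("A", 0x20))   # miss: stream A's load is saved
print(lsu.issue_load("B", 0x10))   # the unit is still usable by stream B
print(lsu.memory_returned(0x20))   # stream A's load can now complete
```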

【００４１】[0041]

With this extension, a function execution device can be allocated to another instruction stream even when an interlock has occurred, which raises the utilization of the function execution devices and improves throughput and program execution speed.

【００４２】[0042]

Next, the invention of claim 8 is described with reference to the drawings. This invention is characterized in that the function execution devices 13 usable by each instruction stream are restricted, which simplifies the connections between the function execution devices 13 and the temporary storage devices (registers) 87 or buffer storage devices that are provided for each instruction stream for function execution.

【００４３】[0043]

First, FIG. 9 shows an embodiment of claim 1. It differs from FIG. 1 in that the temporary storage devices (registers) 87 are drawn, the number of simultaneously operating instruction streams is increased to three, and the number of function execution devices 13 is increased to six. Next, FIG. 10 shows an embodiment of a processor to which the control method according to the present invention is applied. Compared with FIG. 9, it is characterized in that resource allocation is restricted. In FIG. 10, with the three instruction streams called A, B, and C, function execution devices 81 and 86 execute instructions of streams A and C only, function execution devices 82 and 83 execute instructions of streams A and B only, and function execution devices 84 and 85 execute instructions of streams B and C only. This simplifies the connections between the instruction decoding devices 12 and the function execution devices 13 and between the function execution devices 13 and the temporary storage devices 87. Restricting the allocation also simplifies the internal structure of the resource allocation control device 14. Furthermore, limiting the function execution devices 13 that each instruction stream can use to those shared with neighboring streams reduces the number of crossings between connections, which is advantageous for implementation.
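The restriction described for FIG. 10 can be written down as a small table to show how much wiring it removes. The table contents follow the text; the helper names `units_for` and `connection_count` are hypothetical, and the count is only of decoder-to-unit links, not a statement about the actual layout.

```python
from typing import Dict, List, Set

# Which instruction streams may use each function execution device (FIG. 10).
ALLOWED: Dict[int, Set[str]] = {
    81: {"A", "C"}, 86: {"A", "C"},
    82: {"A", "B"}, 83: {"A", "B"},
    84: {"B", "C"}, 85: {"B", "C"},
}


def units_for(stream: str) -> List[int]:
    """Function execution devices that the given stream is allowed to use."""
    return sorted(unit for unit, streams in ALLOWED.items() if stream in streams)


def connection_count() -> int:
    """Number of stream-to-unit links needed under the restriction."""
    return sum(len(streams) for streams in ALLOWED.values())


print(units_for("A"))        # [81, 82, 83, 86]
print(connection_count())    # 12, versus 3 * 6 = 18 without the restriction
```

Each stream still reaches four of the six devices, while the number of links falls from 18 to 12.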

【００４４】[0044]

This invention is particularly effective in processors that have multiple function execution devices 13 with the same function. As processor integration density rises and the number of simultaneously operating instruction streams and function execution devices 13 in a processor grows, the restricted allocation mechanism of this section simplifies implementation and allows a higher operating frequency, which can fully compensate for the loss of utilization caused by restricting the allocation of function execution devices. Efficiency can be improved further by combining it with the inventions of claims 3 and 4.

【００４５】[0045]

[Effects of the Invention]

In an arithmetic unit that runs multiple instruction streams in parallel, the present invention shares the function execution devices, which are the core of the computation, among the instruction streams, so the function execution devices are used effectively. In the conventional example, the function execution devices are scheduled after the instructions have been decoded, which increases instruction execution delay and complicates control. In the method of the present invention, the usable function execution devices are assigned to each instruction stream before instruction decoding. This lowers the utilization of the function execution devices slightly, but the advantage of simpler control and a higher operating frequency more than compensates for that loss. In addition, by performing scheduling matched to the nature of the instructions in software or in hardware, the loss of utilization can be kept to a minimum. Furthermore, when multiple instruction streams are executed simultaneously, restricting in advance the function execution devices that each can use allows the resource allocation control device to be simplified. For these reasons, the method of the present invention for an arithmetic unit that runs multiple instruction streams in parallel is very effective for high-speed instruction execution and cost reduction.

【００４６】[0046]

[Brief description of drawings]

FIG. 1 is a configuration diagram of a processor embodiment illustrating the processor control method according to the present invention.

FIG. 2 is a configuration diagram of an example of the conventional scheme for sharing function execution devices among instruction streams.

FIG. 3 is a configuration diagram of an embodiment using distributed resource allocation control devices.

FIG. 4 is a configuration diagram of an embodiment using instruction decoding devices that issue instructions out of order.

FIG. 5 is a configuration diagram of an embodiment using instruction decoding devices that issue instructions simultaneously.

FIG. 6 is a configuration diagram of an embodiment using a software-priority-directed resource allocation control device.

FIG. 7 is a configuration diagram of an embodiment using a dynamic-priority-variable resource allocation control device.

FIG. 8 is a configuration diagram of an embodiment in which execution information save devices are added to the function execution devices.

FIG. 9 is a configuration diagram of an embodiment equivalent to FIG. 1, additionally showing the temporary storage devices (registers).

FIG. 10 is a configuration diagram of an embodiment in which the usable function execution devices are restricted for each instruction stream.

[Explanation of symbols]

11: instruction acquisition device
12: instruction decoding device
13: function execution device
14: resource allocation control device
21: instruction dependence analysis device
22: instruction arbitration device
31: distributed resource allocation control device
41: instruction acquisition device / instruction queue
51: software-priority-directed resource allocation control device
61: dynamic-priority-variable resource allocation control device
71: execution information save storage device
81 to 86: function execution devices
87: temporary storage device (register)

Claims

[Claims]

1. A method for controlling an arithmetic unit that has a plurality of instruction acquisition devices, a plurality of instruction decoding devices, and a plurality of function execution devices and that processes a plurality of instruction streams simultaneously, wherein each instruction acquisition device fetches instructions from a storage device or the like; each instruction decoding device decodes the instruction obtained from the instruction acquisition device and starts processing on the required function execution device; each function execution device executes the function required by the instruction; and a resource allocation control device checks the current instruction execution status from the instruction acquisition devices, the instruction decoding devices, and the function execution devices and grants the instruction decoding device that will issue the next instruction permission to use individual function execution devices.

2. The method for controlling an arithmetic unit according to claim 1, wherein the resource allocation control device of claim 1 is distributed among the function execution devices.

3. The method for controlling an arithmetic unit according to claim 1, wherein the instruction decoding device of claim 1 is extended so as to issue instructions out of order when, with respect to the usable function execution devices, there is no dependency between the instructions or the dependency can be resolved, and the permission status of the function execution devices obtained from the resource allocation control device is such that the first instruction cannot be executed while the second or a later instruction can.

4. The method for controlling an arithmetic unit according to claim 1, wherein the instruction decoding device of claim 1 is extended so as to issue instructions simultaneously when, with respect to the usable function execution devices, there is no dependency between the instructions or the dependency can be resolved, and permission to use a plurality of function execution devices has been obtained from the resource allocation control device.

5. The method for controlling an arithmetic unit according to claim 1, wherein software instructs the resource allocation control device of claim 1 to assign priorities to the use permissions of the function execution devices for each instruction stream, or to withhold use permission for function execution devices that an instruction stream does not use, thereby optimizing the execution of each instruction stream.

6. The method for controlling an arithmetic unit according to claim 1, wherein the resource allocation control device of claim 1 examines information on instruction issue from the instruction decoding devices to the function execution devices and, based on that information, optimizes subsequent resource allocation for each instruction stream.

7. The method for controlling an arithmetic unit according to claim 1, wherein a storage device for saving one or more sets of execution information is added to the function execution devices of the arithmetic unit of claim 1, and, when an event that interrupts execution occurs, the processing information is saved to the storage device and the save is reported to the resource allocation control device, so that the function execution device in which the event occurred can be used for processing other instructions.

8. An arithmetic unit having a plurality of instruction acquisition devices, a plurality of instruction decoding devices, and a plurality of function execution devices and processing a plurality of instruction streams simultaneously, wherein the function execution devices usable by each instruction stream are restricted, thereby simplifying the connections between the function execution devices and the temporary storage devices or buffer storage devices provided for each instruction stream for function execution, and the connections between the function execution devices and the instruction decoding devices.