JP3451921B2

JP3451921B2 - Processor

Info

Publication number: JP3451921B2
Application number: JP08336898A
Authority: JP
Inventors: 岳人瓶子; 哲也田中; 信生桧垣; 秀一高山; 謙介小谷
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-03-30
Filing date: 1998-03-30
Publication date: 2003-09-29
Anticipated expiration: 2018-03-30
Also published as: JPH11282674A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a processor, and more particularly to a technique for reducing the number of execution cycles and improving code efficiency through parallel processing.

[0002]

[Description of the Related Art] As microprocessor-based products have become more capable and faster in recent years, microprocessors with high processing performance (hereinafter simply "processors") are in demand. One technique for achieving this is to execute multiple instructions simultaneously in a single cycle, and the VLIW (Very Long Instruction Word) processor is one such technique.

[0003] In a VLIW processor, a compiler or similar tool statically analyzes the dependencies between instructions when the executable code is generated and reorders the instruction code to produce an instruction stream that executes efficiently. Compared with a superscalar processor, which analyzes inter-instruction dependencies dynamically, this approach allows simpler hardware and therefore makes it easier to raise the operating frequency.

[0004] However, because the VLIW approach generally treats instructions as fixed-length, it has the following problem.

[0005] The number of bits needed to specify an instruction varies from instruction to instruction: an instruction that handles a long constant needs many bits, whereas a register-to-register operation instruction needs far fewer. Because the VLIW approach treats instructions as fixed-length, instructions that need only a few bits must still be encoded with more bits than necessary, and the code size grows.

[0006] One way to solve this problem is to make the instruction length variable.

[0007] FIG. 13 is a block diagram showing the configuration around the instruction registers of a processor in which each instruction consists of one or two instruction components (called "units" here) and which can execute three instructions simultaneously. Broken lines in the figure represent control signals. The unit queue 50 in FIG. 13 is a sequence of units, which are transferred to the instruction registers in the order in which they were supplied from the instruction memory or the like.

[0008] In this configuration, instruction registers A 52a and B 52b, instruction registers C 52c and D 52d, and instruction registers E 52e and F 52f each form a pair. An instruction is always stored with its first unit in one of the three registers A 52a, C 52c, or E 52e, and a unit is transferred to the other register of the pair only when two units are concatenated to form one instruction. Consequently, when the unit transferred to instruction register A 52a forms an instruction by itself, no unit is transferred to instruction register B 52b.

[0009] As FIG. 13 shows, in this configuration it is not uniquely determined which instruction register each unit of the unit queue 50 is transferred to, nor which unit of the unit queue 50 is transferred to each instruction register. The selectors 51a to 51d are therefore controlled to select the units to transfer. Moreover, the control of these selectors cannot all be determined at once. The control of selectors 51a and 51b is determined first; once the unit to be transferred to instruction register C has been determined, the instruction-length information in that unit is consulted to determine the control of selectors 51c and 51d, as shown by the broken lines in the figure.

[0010]

[Problems to Be Solved by the Invention] However, the conventional processor described above has the problem that the delay in transferring units from the unit queue to the instruction registers becomes very large. This is because the selector control for a given instruction register cannot be determined without consulting the instruction-length information in the unit transferred to the preceding instruction register. Furthermore, as the degree of parallelism increases, the number of instruction registers to be filled increases, so this delay grows even larger.
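
The serial dependence described above can be illustrated with a minimal Python sketch. It models the unit queue only as a list of flags saying whether an instruction that starts at a given unit is two units long; the function name and this list representation are assumptions made for illustration and do not appear in the patent. The start position of each instruction can be computed only after the length of the preceding instruction is known, which is why the selector controls have to be resolved one after another.

```python
def instruction_starts(two_unit_flags, max_instructions=3):
    """Return the queue index at which each instruction starts.

    two_unit_flags[i] is True when an instruction starting at unit i
    occupies two units (a hypothetical stand-in for the length
    information carried inside the unit).
    """
    starts = []
    pos = 0
    for _ in range(max_instructions):
        if pos >= len(two_unit_flags):
            break
        starts.append(pos)
        # The next start depends on the length of the instruction just
        # located, so the iterations cannot be evaluated independently.
        pos += 2 if two_unit_flags[pos] else 1
    return starts


# Three one-unit instructions start at units 0, 1 and 2.
print(instruction_starts([False] * 6))  # [0, 1, 2]
# A two-unit first instruction pushes every later start back.
print(instruction_starts([True, False, False, True, False, False]))  # [0, 2, 3]
```

Each additional parallel instruction adds one more step to this chain, which matches the observation that the delay grows with the degree of parallelism.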

[0011] On the other hand, FIG. 14 shows a technique that makes the correspondence between units in the unit queue and instruction registers one-to-one and thereby eliminates the selector delay in the transfer from the unit queue to the instruction registers that was a problem in the processor of FIG. 13. This processor decodes in advance every combination of units that could form an instruction, and then selects the decoding results to use according to the instruction-length information output by the preceding instruction decoder. Specifically, as shown by the broken lines in the figure, the control of selector 51e is determined by the information output by the first instruction decoder 53d, and the control of selector 51f is determined by that information together with the information output by the second instruction decoder 53e or the third instruction decoder 53f.

[0012] However, executing three instructions simultaneously in this way requires as many as five decoders that each decode a two-unit-long instruction, so the hardware becomes very large.
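
One way to see where the figure of five decoders may come from is to enumerate the unit positions at which an instruction could begin. The sketch below is a hedged reading of the FIG. 14 arrangement rather than a description taken from the patent: it assumes every instruction is one or two units long and that a two-unit decoder is placed at every possible start position. The function name is hypothetical.

```python
from itertools import product


def possible_start_positions(num_instructions=3, lengths=(1, 2)):
    """Collect every unit index at which some instruction may start."""
    positions = set()
    for combo in product(lengths, repeat=num_instructions):
        pos = 0
        for length in combo:
            positions.add(pos)
            pos += length
    return sorted(positions)


starts = possible_start_positions()
print(starts)       # [0, 1, 2, 3, 4]
print(len(starts))  # 5 candidate positions, one decoder each
```

Under these assumptions the first instruction starts at unit 0, the second at unit 1 or 2, and the third at unit 2, 3, or 4, giving five distinct positions.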

[0013] The present invention has been made in view of these problems, and its object is to provide a processor that, in instruction-level parallel execution, achieves both higher performance and better code efficiency while overcoming the problem of hardware complexity.

[0014]

[Means for Solving the Problems] To achieve the above object, the processor according to claim 1 is a processor capable of executing in parallel variable-length unit instructions contained in a variable-length execution unit, comprising: an instruction supply and issue section that sequentially fetches and outputs instructions; a decoding section that has a plurality of decoding means and decodes the instructions output from the instruction supply and issue section; and an execution section that executes the instructions decoded by the decoding section, wherein every unit instruction other than a unit instruction of the minimum bit length has a portion that is decoded by the decoding means and a portion that is not decoded by the decoding means, and the maximum bit length of the execution unit is greater than the total bit length of the plurality of decoding means.

[0015] The processor according to claim 2 is the processor according to claim 1, wherein, when the execution unit has the maximum bit length, the last unit instruction of the execution unit is a unit instruction of the maximum bit length.

[0016] The processor according to claim 3 is the processor according to claim 1 or 2, wherein the unit instruction of the maximum bit length consists of a word element that contains an operation and has the same bit length as a unit instruction of the minimum bit length, and a word element that consists only of a constant, and wherein, regardless of its bit length, each unit instruction has its minimum-bit-length portion decoded by a single decoding means.

[0017] The processor according to claim 4 is the processor according to any one of claims 2 to 3, further comprising instruction registers that store the instructions to be decoded by the decoding section, wherein the instruction registers and the decoding means are in one-to-one correspondence.

[0018] The processor according to claim 5 is the processor according to claim 4, wherein, when a unit instruction of the maximum bit length is decoded, the decoding means corresponding to the instruction register that stores the constant is invalidated.

[0019] The processor according to claim 6 is the processor according to any one of claims 1 to 5, wherein each unit instruction is composed of one or more word elements.

[0020] The processor according to claim 7 is the processor according to claim 6, wherein the instruction supply and issue section outputs instructions to the decoding section in units of a predetermined number of word elements, and which of the plurality of decoding means each unit instruction is input to is uniquely determined by its position within the predetermined number of word elements.

[0021] The processor according to claim 8 is the processor according to any one of claims 1 to 7, wherein information on the degree of parallelism is explicitly attached to the execution unit.

[0022] The processor according to claim 9 is the processor according to claim 8, wherein the information on the degree of parallelism is information on the boundary of the execution unit.

[0023] The processor according to claim 10 is the processor according to any one of claims 1 to 9, wherein information on the length of a unit instruction is explicitly attached within each unit instruction.

[0024] The processor according to claim 11 is the processor according to any one of claims 1 to 10, further comprising an instruction issue control section that controls the length of the decoding results issued by the decoding section.

[0025] The processor according to claim 12 is the processor according to any one of claims 1 to 11, further comprising an instruction issue control section that determines whether the decoding result of each of the decoding means is valid or invalid.

[0026] The processor according to claim 13 is the processor according to any one of claims 1 to 12, wherein the instruction sequence executed by the processor is statically scheduled into the execution units.

[0027] The processor according to claim 14 is a processor capable of executing in parallel up to N (N being an integer of 2 or more) variable-length unit instructions contained in a variable-length execution unit, wherein the bit length of the execution unit is variable and is not limited to the instruction length used for instruction fetch, and the processor decodes only execution units whose bit length does not exceed a predetermined bit length that is shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

[0028] The processor according to claim 15 is the processor according to claim 14, which is capable of decoding an execution unit longer than the instruction length used for instruction fetch.

[0029] The processor according to claim 16 is a processor capable of executing in parallel up to N (N being an integer of 2 or more) variable-length unit instructions contained in a variable-length execution unit, wherein information on the degree of parallelism is explicitly attached to the execution unit, and the processor decodes only execution units whose bit length does not exceed a predetermined bit length that is shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

[0030] The processor according to claim 17 is the processor according to any one of claims 14 to 16, wherein the processor has a decoding section that decodes instructions, and the decoding section is supplied with instructions of a bit length shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

[0031] The processor according to claim 18 is a processor capable of executing in parallel up to N variable-length unit instructions contained in a variable-length execution unit, wherein the unit instructions come in a plurality of lengths up to a maximum of M bits (M being an integer of 2 or more), the processor comprising: an instruction supply and issue section that fetches instructions in units of a first fixed bit length and outputs them in units of a second fixed bit length; and a decoding section that issues decoding results for a variable bit length out of the second fixed bit length output from the instruction supply and issue section, wherein the second fixed length is limited to a length shorter than M × N bits.

[0032] The processor according to claim 19 is the processor according to claim 18, wherein the second fixed length is longer than the first fixed length.

[0033] The processor according to claim 20 is the processor according to claim 19, wherein the execution units are statically scheduled so that the combination of bit lengths of the unit instructions that the processor executes in parallel satisfies a predetermined restriction.

[0034] The processor according to claim 21 is the processor according to claim 20, wherein the predetermined restriction is that, when the whole of the second fixed bit length is issued, a unit instruction whose bit length is M bits is placed at the tail end of the second fixed bit length.

[0035] The processor according to claim 22 is the processor according to claim 20, wherein the predetermined restriction is one imposed so that, within the bit length output to the decoding section, the opcodes are placed within a predetermined length from the head.

[0036] The processor according to claim 23 is the processor according to claim 14 or 18, wherein information on the degree of parallelism is explicitly attached to the execution unit.

[0037] The processor according to claim 24 is the processor according to claim 23, wherein the information on the degree of parallelism is the boundary of the execution unit.

[0038] The processor according to claim 25 is the processor according to any one of claims 14 to 24, wherein information on the length of a unit instruction is explicitly attached within each unit instruction.

[0039] The processor according to claim 26 is the processor according to any one of claims 17 to 22, further comprising an instruction issue control section that controls the length of the decoding results issued by the decoding section.

[0040] The processor according to claim 27 is the processor according to any one of claims 14 to 26, wherein the instruction sequence executed by the processor is statically scheduled into the execution units.

[0041]

[Embodiments of the Invention] Embodiments of the processor according to the present invention are described below in detail with reference to the drawings.

(Overview of the Instruction Format and Architecture)

First, the structure of the instructions that this processor decodes and executes (corresponding to the "unit instructions" recited in the claims) is described.

[0042] FIGS. 1(a) to 1(e) show the instruction formats of this processor.

[0043] Each instruction of this processor is built from 21-bit instruction components (called "units" here; they correspond to the "word elements" recited in the claims). There are two instruction formats: a 21-bit instruction consisting of one unit and a 42-bit instruction consisting of two units. Which length an instruction has is determined by the 1-bit format information 11. Specifically, when the format information 11 is "0", the unit alone forms a 21-bit instruction; when it is "1", the unit and the unit that follows it are concatenated to form a 42-bit instruction.

[0044] Each instruction also carries 1-bit parallel execution boundary information 10. This information indicates whether a parallel execution boundary exists between the instruction and the instruction that follows it. Specifically, when the parallel execution boundary information 10 is "1", a parallel execution boundary exists between the instruction and the following instruction; when it is "0", no such boundary exists. How this information is used is described later.
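
The two 1-bit fields can be illustrated with a short Python sketch that reads them from the first unit of an instruction and assembles 21-bit and 42-bit instructions from a sequence of units. The bit positions used here are assumptions: the text states only that each field is 1 bit wide, and the actual layout is given in FIG. 1. The function names and sample values are likewise hypothetical.

```python
UNIT_BITS = 21
# Assumed bit positions; the text does not state them (FIG. 1 does).
BOUNDARY_BIT = 20  # parallel execution boundary information 10
FORMAT_BIT = 19    # format information 11


def unit_flags(unit):
    """Return (boundary, is_42_bit) read from the first unit of an instruction."""
    boundary = (unit >> BOUNDARY_BIT) & 1
    is_42_bit = (unit >> FORMAT_BIT) & 1
    return boundary, is_42_bit


def assemble_instructions(units):
    """Group 21-bit units into (value, bit_length, boundary) tuples."""
    instructions = []
    i = 0
    while i < len(units):
        boundary, is_42_bit = unit_flags(units[i])
        if is_42_bit:
            if i + 1 >= len(units):
                raise ValueError("42-bit instruction is missing its second unit")
            # The second unit is taken whole; its bits are never read as flags.
            value = (units[i] << UNIT_BITS) | units[i + 1]
            instructions.append((value, 2 * UNIT_BITS, boundary))
            i += 2
        else:
            instructions.append((units[i], UNIT_BITS, boundary))
            i += 1
    return instructions


units = [
    0x00001,                      # 21-bit instruction, no boundary
    (1 << FORMAT_BIT) | 0x00002,  # first unit of a 42-bit instruction
    0x1FFFFF,                     # second unit: constant bits only
    (1 << BOUNDARY_BIT) | 0x3,    # 21-bit instruction followed by a boundary
]
print([(bits, boundary) for _, bits, boundary in assemble_instructions(units)])
# [(21, 0), (42, 0), (21, 1)]
```

Note that the all-ones second unit is not misread as a flagged instruction, because only the first unit of each instruction is inspected.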

[0045] The operation is specified in the part of each instruction that remains after the format information 11 and the parallel execution boundary information 10 are excluded, so 19 bits are available in a 21-bit instruction and 40 bits in a 42-bit instruction. Specifically, the "Op1", "Op2", and "Op3" fields specify the opcode, which indicates the type of operation; the "Rs" field specifies the register number of the source operand register; and the "Rd" field specifies the register number of the destination operand register. The "imm5" and "imm32" fields specify 5-bit and 32-bit constant operands for operations, respectively, and the "disp13" and "disp31" fields specify 13-bit and 31-bit displacements, respectively.

[0046] Transfer and arithmetic instructions that handle long constants such as 32-bit constants, and branch instructions that specify large displacements, are defined as 42-bit instructions; almost all other instructions are defined as 21-bit instructions. As FIG. 1 shows, of the two units that make up a 42-bit instruction, the latter (second) unit holds only part of a long constant or displacement and never holds an opcode.

[0047] Next, an overview of the architecture of this processor is given.

[0048] This processor assumes static parallel scheduling, and FIG. 2 illustrates the concept of how instructions are supplied and issued.

[0049] As shown in FIG. 2(a), instructions are supplied every cycle in fixed-length 64-bit instruction supply units (called "packets" here; they correspond to the "compound instructions" recited in the claims), three units at a time. Three units occupy 63 bits, and the remaining 1 bit is not used. As shown in FIG. 2(b), the units up to a parallel execution boundary (called an "execution unit" here) are executed simultaneously in one cycle. That is, in each cycle the instructions up to and including the one whose parallel execution boundary information 10 is "1" are executed in parallel. Units that were supplied but not yet executed are accumulated in the instruction buffer and become candidates for execution in subsequent cycles.
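
The packet layout can be sketched in Python as follows. The description of the instruction buffer 22 states that the most significant bit of a packet is the unused one; the ordering of the three units within the remaining 63 bits is not stated in the text, so the sketch assumes that the first unit occupies the highest bits. The function name is hypothetical.

```python
UNIT_BITS = 21
UNITS_PER_PACKET = 3
UNIT_MASK = (1 << UNIT_BITS) - 1


def split_packet(packet):
    """Split a 64-bit packet into its three 21-bit units.

    Bit 63 is ignored. The first unit is assumed to sit in bits 62..42.
    """
    if not 0 <= packet < (1 << 64):
        raise ValueError("packet must fit in 64 bits")
    units = []
    for index in range(UNITS_PER_PACKET):
        shift = UNIT_BITS * (UNITS_PER_PACKET - 1 - index)
        units.append((packet >> shift) & UNIT_MASK)
    return units


packet = (0x00001 << 42) | (0x00002 << 21) | 0x00003
print(split_packet(packet))              # [1, 2, 3]
print(split_packet(packet | (1 << 63)))  # [1, 2, 3]: the top bit is ignored
```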

[0050] In other words, in this architecture instructions are supplied in fixed-length packets and, based on statically determined information, the number of units appropriate to the degree of parallelism is issued in each cycle. With this approach, the no-operation instructions (nop instructions) that arise in conventional fixed-length VLIW schemes are eliminated entirely, and the code size can be reduced.
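
As a rough illustration of this issue scheme, the Python sketch below groups an instruction stream into execution units using the boundary flag and counts the padding slots that a fixed-width, three-slot VLIW word would need for the same schedule. The mnemonics, the function names, and the three-slot comparison baseline are assumptions chosen for the example, not details from the patent.

```python
def execution_units(instructions):
    """Split (name, boundary) pairs into groups that issue in the same cycle.

    boundary == 1 marks the last instruction of an execution unit.
    """
    groups, current = [], []
    for name, boundary in instructions:
        current.append(name)
        if boundary == 1:
            groups.append(current)
            current = []
    if current:  # trailing instructions whose boundary has not arrived yet
        groups.append(current)
    return groups


def nops_for_fixed_width(groups, width=3):
    """Count the padding slots a fixed-width VLIW word would need."""
    return sum(width - len(group) for group in groups)


stream = [("add", 0), ("ld", 0), ("mul", 1), ("sub", 1), ("st", 0), ("br", 1)]
groups = execution_units(stream)
print(groups)                        # [['add', 'ld', 'mul'], ['sub'], ['st', 'br']]
print(nops_for_fixed_width(groups))  # 3
```

In this example the boundary-flag encoding stores six instructions, whereas the fixed-width baseline would store nine slots, three of them nops.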

[0051] Depending on the value of the format information 11 in an instruction, either two units or one unit are executed as one instruction. With this approach, the long instruction format is used only for the minority of instructions that need many bits, and almost all other instructions can be specified in the short format, so the code size can be reduced further. Specific examples are given later.

(Hardware Configuration of the Processor)

Next, the hardware configuration of this processor is described.

[0052] FIG. 3 is a block diagram showing the hardware configuration of the processor according to the present invention.

[0053] This processor executes up to three instructions in parallel per cycle and consists broadly of an instruction supply and issue section 20, a decoding section 30, and an execution section 40.

[0054] The instruction supply and issue section 20 obtains groups of instructions from an external memory (not shown) and outputs them to the decoding section 30. It consists of an instruction fetch section 21, an instruction buffer 22, and an instruction register 23.

[0055] The instruction fetch section 21 fetches blocks of units from an external memory (not shown) over a 32-bit IA (instruction address) bus and a 64-bit ID (instruction data) bus, holds them in an internal instruction cache, and supplies the group of units corresponding to the address output from the PC section 42 to the instruction buffer 22.

[0056] The instruction buffer 22 has two 63-bit buffers and is used to accumulate the units supplied by the instruction fetch section 21. Packets are supplied from the instruction fetch section 21 to the instruction buffer 22 in 64-bit units; the most significant bit of each packet is not used. The units accumulated in the instruction buffer 22 are output to the appropriate registers of the instruction register 23. A more detailed configuration of the instruction buffer 22 is shown in a separate drawing.

[0057] The instruction register 23 consists of four 21-bit registers and holds the units sent from the instruction buffer 22. A more detailed configuration of the area around the instruction register 23 is shown in a separate drawing.
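
The widths given so far can be checked against the relationships recited in the claims with a few lines of Python. The mapping of the embodiment's numbers onto the claim terms (the packet as the first fixed length, the instruction register 23 as the second fixed length, and one unit of input per decoder) is an interpretation made for this sketch, not something the text states explicitly.

```python
UNIT_BITS = 21                 # one unit
MAX_INSTRUCTION_BITS = 42      # M: longest instruction (two units)
MAX_PARALLEL = 3               # N: instructions executed per cycle
FETCH_BITS = 64                # packet width (read here as the first fixed length)
REGISTER_BITS = 4 * UNIT_BITS  # instruction register 23 (read here as the second fixed length)
DECODER_BITS = MAX_PARALLEL * UNIT_BITS  # assumes each decoder takes one unit

worst_case = MAX_INSTRUCTION_BITS * MAX_PARALLEL
print(REGISTER_BITS, worst_case, DECODER_BITS)  # 84 126 63

# Register width lies between the fetch width and the M x N worst case.
assert FETCH_BITS < REGISTER_BITS < worst_case
# The widest execution unit exceeds the combined decoder input width.
assert REGISTER_BITS > DECODER_BITS
# Longest execution unit that still fits: 21 + 21 + 42 bits.
assert 2 * UNIT_BITS + MAX_INSTRUCTION_BITS == REGISTER_BITS
```

Under this reading, four registers suffice because an execution unit made of three 42-bit instructions (126 bits) is never decoded in a single cycle.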

[0058] The decoding section 30 decodes the instructions held in the instruction register 23 and outputs control signals corresponding to the decoding results to the execution section 40. It consists broadly of an instruction issue control section 31 and an instruction decoder 32.

[0059] The instruction issue control section 31 controls issue for the units held in the four registers of the instruction register 23 by referring to the parallel execution boundary information 10 and the format information 11 in each unit: for example, it causes two units to be handled as one instruction, and it invalidates the issue of units that lie beyond the parallel execution boundary. The operation of the instruction issue control section 31 is explained in more detail with reference to a separate drawing.
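
The kind of decision described here can be sketched in Python as a function that labels each of the four registers for the current cycle. This is only an illustration under stated assumptions: the label names, the function name, and the handling of a 42-bit instruction whose second unit is not yet in the registers are all hypothetical, and the sketch does not model how registers are wired to the three decoders or any static scheduling restrictions.

```python
def classify_registers(flags, max_instructions=3):
    """Label each instruction register for the current cycle.

    flags holds one (boundary, is_42_bit) pair per register, read as if
    the register held the first unit of an instruction. Labels:
      "op"      - first unit of an issued instruction
      "const"   - second unit of a 42-bit instruction (not decoded as an opcode)
      "invalid" - not issued in this cycle
    """
    labels = ["invalid"] * len(flags)
    i = 0
    issued = 0
    while i < len(flags) and issued < max_instructions:
        boundary, is_42_bit = flags[i]
        if is_42_bit and i + 1 >= len(flags):
            break  # second unit not available; assumed not to issue
        labels[i] = "op"
        issued += 1
        if is_42_bit:
            labels[i + 1] = "const"
            i += 2
        else:
            i += 1
        if boundary:
            break  # units beyond the parallel execution boundary stay invalid
    return labels


# 21 + 21 + 42 bits: all four registers are used.
print(classify_registers([(0, 0), (0, 0), (1, 1), (1, 1)]))
# ['op', 'op', 'op', 'const']
# A 42-bit and a 21-bit instruction, then a boundary: the last unit waits.
print(classify_registers([(0, 1), (1, 1), (1, 0), (0, 0)]))
# ['op', 'const', 'op', 'invalid']
```

The flags of a register labeled "const" are ignored, since that unit carries constant bits rather than an opcode.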

[0060] The instruction decoder 32 decodes the group of instructions stored in the instruction register 23 and consists of a first instruction decoder 33, a second instruction decoder 34, and a third instruction decoder 35. Each of these decoders basically decodes one instruction per cycle and supplies control signals to the execution section 40. Constant operands placed within instructions are transferred from each instruction decoder to the data bus 48 of the execution section 40.

【００６１】実行部４０は、解読部３０での解読結果に
基づいて、最大３つの命令を並列実行する回路ユニット
であり、実行制御部４１、ＰＣ部４２、レジスタファイ
ル４３、第１演算部４４、第２演算部４５、第３演算部
４６、オペランドアクセス部４７及びデータバス４８、
４９からなる。The execution unit 40 is a circuit unit that executes up to three instructions in parallel based on the decoding results of the decoding unit 30. It consists of an execution control unit 41, a PC unit 42, a register file 43, a first arithmetic unit 44, a second arithmetic unit 45, a third arithmetic unit 46, an operand access unit 47, and data buses 48 and 49.

【００６２】実行制御部４１は、解読部３０での解読結
果に基づいて実行部４０の各構成要素４２〜４９を制御
する制御回路や配線の総称であり、タイミング制御、動
作許可禁止制御、ステータス管理、割り込み制御等の回
路を有する。The execution control unit 41 is a collective term for the control circuits and wiring that control the components 42 to 49 of the execution unit 40 based on the decoding results of the decoding unit 30. It includes circuits for timing control, operation enable/disable control, status management, interrupt control, and the like.

【００６３】ＰＣ（プログラムカウンタ）部４２は、次
に解読実行すべき命令が置かれている図示されていない
外部メモリ上のアドレスを命令供給発行部２０の命令フ
ェッチ部２１に出力する。The PC (program counter) unit 42 outputs to the instruction fetch unit 21 of the instruction supply and issue unit 20, the address in the external memory (not shown) where the instruction to be decoded and executed next is placed.

【００６４】レジスタファイル４３は、Ｒ０〜Ｒ３１の
３２個の３２ビットレジスタから構成される。これらの
レジスタに格納された値は、第１命令デコーダ３３、第
２命令デコーダ３４及び第３命令デコーダ３５での解読
結果に基づいて、データバス４８を経由して第１演算部
４４、第２演算部４５及び第３演算部４６に転送され、
そこで演算が施され、又はそこを単に通過した後に、デ
ータバス４９を経由してレジスタファイル４３またはオ
ペランドアクセス部４７に送られる。The register file 43 consists of 32 32-bit registers, R0 to R31. Based on the decoding results of the first instruction decoder 33, the second instruction decoder 34, and the third instruction decoder 35, the values stored in these registers are transferred via the data bus 48 to the first arithmetic unit 44, the second arithmetic unit 45, and the third arithmetic unit 46. There they are operated on, or simply pass through, and are then sent via the data bus 49 to the register file 43 or the operand access unit 47.

【００６５】第１演算部４４、第２演算部４５及び第３
演算部４６は、それぞれ２個の３２ビットデータに対し
て算術論理演算を行うＡＬＵや乗算器と、シフト演算を
行うバレルシフタを内部に有し、実行制御部４１による
制御の下で演算を実行する。The first arithmetic unit 44, the second arithmetic unit 45, and the third arithmetic unit 46 each contain an ALU and a multiplier, which perform arithmetic and logical operations on two 32-bit data items, and a barrel shifter, which performs shift operations. Each executes operations under the control of the execution control unit 41.

【００６６】オペランドアクセス部４７は、レジスタフ
ァイル４３と図示されていない外部メモリとの間でオペ
ランドの転送を行う回路である。具体的には、例えば、
命令内で、オペコードとして“ｌｄ”（ロード）が置か
れていた場合には、外部メモリに置かれていた１ワード
（３２ビット）のデータがオペランドアクセス部４７を
経てレジスタファイル４３の指定されたレジスタにロー
ドされ、また、オペコードとして“ｓｔ”（ストア）が
置かれていた場合には、レジスタファイル４３の指定さ
れたレジスタの格納値が外部メモリにストアされる。上
記ＰＣ部４２、レジスタファイル４３、第１演算部４
４、第２演算部４５、第３演算部４６及びオペランドア
クセス部４７は、図示されるように、データバス４８
（Ｌ１バス、Ｒ１バス、Ｌ２バス、Ｒ２バス、Ｌ３バ
ス、Ｒ３バス）及びデータバス４９（Ｄ１バス、Ｄ２バ
ス、Ｄ３バス）で接続されている。なお、Ｌ１バス及び
Ｒ１バスはそれぞれ第１演算部４４の２つの入力ポート
に、Ｌ２バス及びＲ２バスはそれぞれ第２演算部４５の
２つの入力ポートに、Ｌ３バス及びＲ３バスはそれぞれ
第３演算部４６の２つの入力ポートに、Ｄ１バス、Ｄ２
バス及びＤ３バスはそれぞれ第１演算部４４、第２演算
部４５及び第３演算部４６の出力ポートに接続されてい
る。（命令バッファの詳細な構成）次に、命令バッファ２２の詳細な構成を説明する。The operand access unit 47 is a circuit that transfers operands between the register file 43 and an external memory (not shown). For example, when “ld” (load) is placed in an instruction as the opcode, one word (32 bits) of data in the external memory is loaded into the designated register of the register file 43 via the operand access unit 47; when “st” (store) is placed as the opcode, the value stored in the designated register of the register file 43 is stored in the external memory. As shown in the figure, the PC unit 42, the register file 43, the first arithmetic unit 44, the second arithmetic unit 45, the third arithmetic unit 46, and the operand access unit 47 are connected by the data bus 48 (L1 bus, R1 bus, L2 bus, R2 bus, L3 bus, R3 bus) and the data bus 49 (D1 bus, D2 bus, D3 bus). The L1 and R1 buses are connected to the two input ports of the first arithmetic unit 44, the L2 and R2 buses to the two input ports of the second arithmetic unit 45, and the L3 and R3 buses to the two input ports of the third arithmetic unit 46; the D1, D2, and D3 buses are connected to the output ports of the first arithmetic unit 44, the second arithmetic unit 45, and the third arithmetic unit 46, respectively.
(Detailed Configuration of the Instruction Buffer) Next, the detailed configuration of the instruction buffer 22 is described.

【００６７】図４は、命令バッファ２２の詳細な構成を
示すブロック図である。FIG. 4 is a block diagram showing a detailed structure of the instruction buffer 22.

【００６８】命令バッファ２２は命令バッファＡ２２１
及び命令バッファＢ２２２の２個の６３ビットのバッフ
ァからなり、それぞれ３個ずつのユニットを保持するこ
とができる。命令バッファＡ２２１はバッファＡ０、Ａ
１及びＡ２からなり、それぞれ１個ずつのユニットを保
持することができる。同様に、命令バッファＢはバッフ
ァＢ０、Ｂ１及びＢ２からなる。The instruction buffer 22 consists of two 63-bit buffers, an instruction buffer A221 and an instruction buffer B222, each of which can hold three units. The instruction buffer A221 consists of buffers A0, A1, and A2, each of which can hold one unit. Similarly, the instruction buffer B222 consists of buffers B0, B1, and B2.

【００６９】命令バッファ２２には、命令フェッチ部２
１から６４ビット単位でパケットが供給される。ただ
し、パケットの最上位の１ビットの情報は使用されな
い。この際、命令バッファＡ２２１と命令バッファＢ２
２２にまたがって供給されることはなく、いずれかのバ
ッファに６３ビット単位で供給されることになる。命令
バッファ２２に蓄えられたユニットは供給された順序を
保っており、その順序やいずれのバッファが有効である
かについては命令バッファ制御部２２３により、状態と
して管理されている。Packets are supplied to the instruction buffer 22 from the instruction fetch unit 21 in 64-bit units; the most significant bit of each packet is not used. A packet is never supplied across the instruction buffer A221 and the instruction buffer B222; it is supplied to one of the two buffers as a 63-bit unit. The units stored in the instruction buffer 22 keep the order in which they were supplied, and the instruction buffer control unit 223 manages that order, and which buffers are valid, as state.
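As an illustration of the packet layout just described, the following Python sketch splits a 64-bit packet into its three 21-bit units while ignoring the unused most significant bit. The function name `split_packet` and the assumption that the first unit occupies the highest-order bits of the 63 usable bits are not taken from the specification; they are chosen only for this example.

```python
UNIT_BITS = 21          # width of one unit, as stated in the description
UNITS_PER_PACKET = 3    # 3 x 21 = 63 usable bits per 64-bit packet


def split_packet(packet):
    """Split a 64-bit packet into three 21-bit units, ignoring the top bit.

    Assumption (not from the specification): the first unit occupies the
    highest-order 21 bits of the 63 usable bits.
    """
    if not 0 <= packet < (1 << 64):
        raise ValueError("packet must fit in 64 bits")
    payload = packet & ((1 << 63) - 1)        # drop the unused most significant bit
    mask = (1 << UNIT_BITS) - 1
    return [
        (payload >> (UNIT_BITS * (UNITS_PER_PACKET - 1 - i))) & mask
        for i in range(UNITS_PER_PACKET)
    ]
```

With this layout, setting or clearing bit 63 of the packet has no effect on the three units returned.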

【００７０】命令バッファ制御部２２３は、毎サイクル
バッファ内の有効なユニットを順に命令レジスタ２３に
転送するため、セレクタ２２４ａ〜２２４ｄの制御を行
う。この制御により、命令バッファ２２内の先頭の４つ
のユニットが命令レジスタ２３に転送される。さらに、
命令レジスタ２３に転送したユニットの中でどれだけの
ユニットが発行されずに残ったか、という解読部３０の
命令発行制御部３１からの情報と、命令フェッチ部２１
から転送されてきたユニットの内いずれのユニットが有
効かという情報とを元に、命令バッファ２２の状態の更
新を行う。The instruction buffer control unit 223 controls the selectors 224a to 224d so that, every cycle, the valid units in the buffer are transferred in order to the instruction register 23. Under this control, the first four units in the instruction buffer 22 are transferred to the instruction register 23. In addition, it updates the state of the instruction buffer 22 based on two pieces of information: information from the instruction issue control unit 31 of the decoding unit 30 indicating how many of the units transferred to the instruction register 23 remained unissued, and information indicating which of the units transferred from the instruction fetch unit 21 are valid.

【００７１】具体的には、まず命令バッファ２２が空の
状態で、あるパケットの２番目のユニットに分岐した場
合には、命令フェッチ部２１からそのパケットが供給さ
れ、供給されたパケットは命令バッファＡ２２１に転送
される。そのパケットの先頭のユニットは無効なので、
命令バッファ制御部２２３の制御により、命令バッファ
２２の状態としてバッファＡ１及びバッファＡ２のみが
有効な状態となる。Specifically, suppose first that the instruction buffer 22 is empty and a branch is taken to the second unit of some packet. The instruction fetch unit 21 supplies that packet, which is transferred to the instruction buffer A221. Because the first unit of the packet is invalid, the instruction buffer control unit 223 sets the state of the instruction buffer 22 so that only buffers A1 and A2 are valid.

【００７２】次のサイクルで命令バッファ２２から命令
レジスタ２３に転送したユニットが全く発行されず、命
令フェッチ部２１から６４ビットの有効なパケットが供
給された場合には、そのパケットは命令バッファＢ２２
２に転送され、命令バッファ２２の状態は、バッファＡ
１、Ａ２、Ｂ０、Ｂ１及びＢ２が有効な状態となる。If, in the next cycle, none of the units transferred from the instruction buffer 22 to the instruction register 23 is issued and the instruction fetch unit 21 supplies a valid 64-bit packet, that packet is transferred to the instruction buffer B222, and the state of the instruction buffer 22 becomes one in which buffers A1, A2, B0, B1, and B2 are valid.

【００７３】さらに、次のサイクルでは、命令バッファ
２２に空きがないので、命令フェッチ部２１からの供給
は受け付けず、命令レジスタ２３へは、順にバッファＡ
１、バッファＡ２、バッファＢ０、バッファＢ１のユニ
ットを転送する。In the cycle after that, the instruction buffer 22 has no free space, so no supply from the instruction fetch unit 21 is accepted, and the units in buffers A1, A2, B0, and B1 are transferred, in that order, to the instruction register 23.
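The three-cycle example above can be traced with a small Python model. This is only a sketch under stated assumptions: the class `InstructionBufferModel`, its method names, and the rule that a new packet goes into whichever half (A or B) is completely free are illustrative and are not the circuit of the instruction buffer control unit 223.

```python
class InstructionBufferModel:
    """Toy model: two 3-unit halves (A and B) plus a FIFO of valid slot names."""

    def __init__(self):
        self.order = []                      # valid slot names, oldest first

    def _half_free(self, half):
        return not any(name.startswith(half) for name in self.order)

    def supply(self, first_valid=0):
        """Accept one packet into a completely free half.

        first_valid > 0 models a branch into the middle of a packet.
        Returns False when neither half is free (the packet is not accepted).
        """
        for half in ("A", "B"):
            if self._half_free(half):
                self.order += [f"{half}{i}" for i in range(first_valid, 3)]
                return True
        return False

    def to_instruction_register(self):
        """Slots whose units are transferred to the instruction register this cycle."""
        return self.order[:4]

    def retire(self, issued):
        """Remove the units that were actually issued."""
        del self.order[:issued]


def trace_example():
    buf = InstructionBufferModel()
    buf.supply(first_valid=1)                # branch to the 2nd unit: A1, A2 valid
    state1 = list(buf.order)
    buf.retire(0)                            # nothing issued in the next cycle
    buf.supply()                             # full packet goes to half B
    state2 = list(buf.order)
    accepted = buf.supply()                  # no free half: supply is refused
    return state1, state2, accepted, buf.to_instruction_register()
```

Calling `trace_example()` reproduces the sequence of valid buffers described in the text: A1 and A2, then A1 through B2, then a refused supply with A1, A2, B0, and B1 forwarded.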

【００７４】このように、命令バッファ２２に６３ビッ
ト単位で空きがある場合にのみ命令フェッチ部２１から
パケットの供給を行い、供給された順を管理しておき、
各サイクルにおいて、供給された順に先頭の４つのユニ
ットを命令レジスタ２３に転送していく。（命令レジスタ２３周辺の構成と命令発行制御部３１の
動作）次に、命令レジスタ２３周辺の構成を示し、命令発行制
御部３１の詳細な動作を説明する。In this way, a packet is supplied from the instruction fetch unit 21 only when the instruction buffer 22 has a free 63-bit slot, the order of supply is tracked, and in each cycle the first four units in supply order are transferred to the instruction register 23.
(Configuration around the Instruction Register 23 and Operation of the Instruction Issue Control Unit 31) Next, the configuration around the instruction register 23 is shown, and the operation of the instruction issue control unit 31 is described in detail.

【００７５】図５は、命令レジスタ２３周辺の構成を示
すブロック図である。図中、破線の矢印は制御信号を表
す。FIG. 5 is a block diagram showing the configuration around the instruction register 23. In the figure, dashed arrows represent control signals.

【００７６】命令レジスタ２３は命令レジスタＡ２３
１、命令レジスタＢ２３２、命令レジスタＣ２３３及び
命令レジスタＤ２３４の４個の２１ビットレジスタから
なる。命令レジスタ２３には、命令バッファ２２からユ
ニットが供給されるわけだが、わかりやすくするために
命令バッファ２２から供給されるユニットの並びである
ユニットキュー５０という概念を考える。そして、ここ
では命令レジスタ２３にはユニットキュー５０からユニ
ットが供給されると考える。The instruction register 23 consists of four 21-bit registers: an instruction register A231, an instruction register B232, an instruction register C233, and an instruction register D234. Units are supplied to the instruction register 23 from the instruction buffer 22, but for clarity we introduce the notion of a unit queue 50, which is the sequence of units supplied from the instruction buffer 22, and regard the units as being supplied to the instruction register 23 from the unit queue 50.

【００７７】図５を見るとわかるように、あるユニット
がいずれの命令レジスタ２３に転送されるかどうかは、
ユニットキュー５０での位置（順序）によって一意に決
まる。つまり、ユニット１は命令レジスタＡ２３１へ、
ユニット２は命令レジスタＢ２３２へ転送されることに
なる。これにより、ユニットキュー５０から命令レジス
タ２３への転送を行う際に、図１３の従来例において存
在したようなユニットの選択を行うセレクタが不要とな
り、ハードウェアが単純化されており、遅延も最小限に
抑えられている。As can be seen from FIG. 5, the instruction register to which a given unit is transferred is uniquely determined by its position (order) in the unit queue 50. That is, unit 1 is transferred to the instruction register A231 and unit 2 to the instruction register B232. Consequently, the transfer from the unit queue 50 to the instruction register 23 needs no selectors for choosing units, such as those present in the conventional example of FIG. 13; the hardware is simpler and the delay is kept to a minimum.

【００７８】図中３３〜３５の各命令デコーダは、２１
ビットのユニットを入力とし、それを解読して、そのユ
ニットが構成する命令の動作に関する制御信号を実行制
御部４１に出力するとともに、ユニット内に配置された
定数オペランドを出力する。図１の命令フォーマットか
らわかるように、４２ビット命令を構成する２つのユニ
ットのうち、２番目のユニットには定数オペランドの一
部しか配置されない。つまり、このユニットにはオペコ
ードが存在しないため、命令デコーダに入力する必要が
ない。そこで、各命令の定数オペランドは、図５に示さ
れるように、命令デコーダが出力したユニット内の定数
と、命令レジスタから無条件に直接転送された定数とを
連結したものということになる。図５の６０〜６２が各
命令の定数オペランドである。Each of the instruction decoders 33 to 35 in the figure takes a 21-bit unit as input, decodes it, outputs to the execution control unit 41 control signals for the operation of the instruction formed by that unit, and outputs the constant operand placed in the unit. As the instruction format of FIG. 1 shows, of the two units that make up a 42-bit instruction, the second contains only part of a constant operand. Because this unit has no opcode, it does not need to be input to an instruction decoder. As shown in FIG. 5, the constant operand of each instruction is therefore the concatenation of the constant output by the instruction decoder and the constant transferred unconditionally and directly from the instruction register. Reference numerals 60 to 62 in FIG. 5 denote the constant operands of the respective instructions.

【００７９】また、各命令デコーダには、制御信号とし
て１ビットの無動作命令フラグが入力される。このフラ
グを“１”にセットすると、そのデコーダは出力として
無動作命令を出力する。つまり、無動作命令フラグをセ
ットすることにより、その命令デコーダの命令としての
デコードを無効化することができる。A 1-bit non-operation instruction flag is input to each instruction decoder as a control signal. When this flag is set to “1”, the decoder outputs a non-operation instruction. In other words, setting the non-operation instruction flag invalidates that decoder's decoding of its input as an instruction.

【００８０】ここで、命令レジスタ２３に格納されたユ
ニットを組み合わせて命令として発行する制御を行う命
令発行制御部３１の動作について説明する。The operation of the instruction issuance control unit 31 for controlling the combination of the units stored in the instruction register 23 to issue an instruction will be described below.

【００８１】命令発行制御部３１は、命令レジスタＡ２
３１及び命令レジスタＢ２３２に格納された各ユニット
の並列実行境界情報１０とフォーマット情報１１を参照
することにより命令デコーダの制御を行う。The instruction issue control unit 31 controls the instruction decoders by referring to the parallel execution boundary information 10 and the format information 11 of the units stored in the instruction register A231 and the instruction register B232.

【００８２】まず、これらの情報から、命令レジスタ２
３に格納されたユニットの内どこまでをこのサイクルで
発行するのかを求める。そして、どれだけのユニットが
発行されずに残ったのかの情報を命令バッファ２２内の
命令バッファ制御部２２３に伝達する。First, from this information, it determines how many of the units stored in the instruction register 23 are to be issued in the current cycle. It then reports how many units remained unissued to the instruction buffer control unit 223 in the instruction buffer 22.

【００８３】次に命令デコーダ３２を制御し、このサイ
クルで発行される命令についてのみ解読を行うように制
御する。図５からわかるように、命令としてデコードさ
れる可能性のあるユニットは、命令レジスタＡ２３１、
命令レジスタＢ２３２及び命令レジスタＣ２３３に格納
されたユニットのみである。そこで、ユニット内の情報
を参照して、これらのユニットの中で、４２ビット命令
の２ユニット目にあたるものや発行されずに残るものに
関しては、そのユニットの命令としてのデコードを無効
化する。４２ビット命令の２ユニット目にあたるユニッ
トは、直前のユニットが構成する命令の定数オペランド
の一部として直接出力される。Next, it controls the instruction decoder 32 so that only the instructions issued in the current cycle are decoded. As FIG. 5 shows, the only units that can be decoded as instructions are those stored in the instruction register A231, the instruction register B232, and the instruction register C233. Referring to the information in the units, it therefore invalidates decoding as an instruction for any of these units that is the second unit of a 42-bit instruction or that remains unissued. A unit that is the second unit of a 42-bit instruction is output directly as part of the constant operand of the instruction formed by the immediately preceding unit.

【００８４】具体的には、命令レジスタＡ２３１のユニ
ット（ユニット１）のフォーマット情報１１が“１”の
ときには、ユニット１と命令レジスタＢ２３２のユニッ
ト（ユニット２）とを連結して４２ビット命令となるの
で、ユニット２の命令としてのデコードを無効化する、
すなわち第２命令デコーダ３４の無動作命令フラグを
“１”にセットする。図５において、命令発行制御部３
１から第２命令デコーダ３４への破線がこの動作に相当
する。ユニット２は、ユニット１が構成する命令の定数
オペランド６０の一部として直接出力される。Specifically, when the format information 11 of the unit in the instruction register A231 (unit 1) is “1”, unit 1 and the unit in the instruction register B232 (unit 2) are concatenated to form a 42-bit instruction, so decoding of unit 2 as an instruction is invalidated; that is, the non-operation instruction flag of the second instruction decoder 34 is set to “1”. In FIG. 5, the broken line from the instruction issue control unit 31 to the second instruction decoder 34 corresponds to this operation. Unit 2 is output directly as part of the constant operand 60 of the instruction formed by unit 1.

【００８５】また、ユニット１のフォーマット情報１１
が“０”、ユニット２のフォーマット情報が“１”の時
は、ユニット２と命令レジスタＣ２３３のユニット（ユ
ニット３）とを連結して４２ビット命令となるので、ユ
ニット３の命令としてのデコードをキャンセルする、す
なわち第３命令デコーダ３５の無動作命令フラグを
“１”にセットする。図５において、命令発行制御部３
１から第３命令デコーダ３５への破線がこの動作に相当
する。ユニット３は、ユニット２が構成する命令の定数
オペランド６１の一部として直接出力される。Similarly, when the format information 11 of unit 1 is “0” and the format information 11 of unit 2 is “1”, unit 2 and the unit in the instruction register C233 (unit 3) are concatenated to form a 42-bit instruction, so decoding of unit 3 as an instruction is cancelled; that is, the non-operation instruction flag of the third instruction decoder 35 is set to “1”. In FIG. 5, the broken line from the instruction issue control unit 31 to the third instruction decoder 35 corresponds to this operation. Unit 3 is output directly as part of the constant operand 61 of the instruction formed by unit 2.

【００８６】このように、フォーマット情報１１を参照
することにより、必要に応じて命令デコーダの無動作フ
ラグを設定し、命令としてのデコードを無効化する。As described above, by referring to the format information 11, the non-operation flag of the instruction decoder is set as necessary, and the decoding as the instruction is invalidated.

【００８７】それから、ユニット１の並列実行境界情報
１０が“１”、フォーマット情報１１が“０”のとき
は、このサイクルではユニット１までしか発行されない
ので、ユニット２とユニット３の命令としてのデコード
を無効化する、すなわち第２命令デコーダ３４と第３命
令デコーダ３５の無動作命令フラグを共に“１”にセッ
トする。図５において、命令発行制御部３１から第２命
令デコーダ３４と第３命令デコーダ３５への破線がこの
動作に相当する。Next, when the parallel execution boundary information 10 of unit 1 is “1” and its format information 11 is “0”, only unit 1 is issued in this cycle, so decoding of units 2 and 3 as instructions is invalidated; that is, the non-operation instruction flags of the second instruction decoder 34 and the third instruction decoder 35 are both set to “1”. In FIG. 5, the broken lines from the instruction issue control unit 31 to the second instruction decoder 34 and the third instruction decoder 35 correspond to this operation.

【００８８】また、ユニット１の並列実行境界情報１０
が“０”、ユニット２の並列実行境界情報１０が
“１”、フォーマット情報１１が共に“０”のときは、
このサイクルではユニット２までしか発行されないの
で、ユニット３の命令としてのデコードを無効化する、
すなわち第３命令デコーダ３５の無動作命令フラグを共
に“１”にセットする。図５において、命令発行制御部
３１から第３命令デコーダ３５への破線がこの動作に相
当する。Likewise, when the parallel execution boundary information 10 of unit 1 is “0”, the parallel execution boundary information 10 of unit 2 is “1”, and the format information 11 of both units is “0”, only the units up to unit 2 are issued in this cycle, so decoding of unit 3 as an instruction is invalidated; that is, the non-operation instruction flag of the third instruction decoder 35 is set to “1”. In FIG. 5, the broken line from the instruction issue control unit 31 to the third instruction decoder 35 corresponds to this operation.

【００８９】このように、並列実行境界情報１０を参照
することにより、必要に応じて命令デコーダの無動作フ
ラグを設定し、命令としてのデコードを無効化する。As described above, by referring to the parallel execution boundary information 10, the non-operation flag of the instruction decoder is set as necessary, and the decoding as the instruction is invalidated.

【００９０】以上のような命令発行制御を実現する命令
発行制御部３１とその周辺回路の構成を図６に示す。FIG. 6 shows the configuration of the instruction issue control unit 31 and its peripheral circuits that realize the above-described instruction issue control.

【００９１】前述のように命令発行制御部３１は命令レ
ジスタＡ２３１及び命令レジスタＢ２３２に格納された
ユニットの並列実行境界情報１０とフォーマット情報１
１を参照し、第２命令デコーダ３４及び第３命令デコー
ダ３５の命令としてのデコードを無効化するかどうかを
決定する無動作命令フラグとなる制御信号を出力する。As described above, the instruction issue control unit 31 refers to the parallel execution boundary information 10 and the format information 11 of the units stored in the instruction register A231 and the instruction register B232, and outputs control signals that serve as the non-operation instruction flags, which determine whether decoding as an instruction is invalidated in the second instruction decoder 34 and the third instruction decoder 35.

【００９２】図６のような回路構成をとることにより、
第２命令デコーダ３４は、命令レジスタＡ２３１に格納
されたユニットの並列実行境界情報１０が“１”である
か、またはそのユニットのフォーマット情報１１が
“１”であるときに無効化される。また、第３命令デコ
ーダ３５は、命令レジスタＡ２３１に格納されたユニッ
トもしくは命令レジスタＢ２３２に格納されたユニット
の並列実行境界情報１０が“１”であるか、または命令
レジスタＢ２３２に格納されたユニットのフォーマット
情報１１が“１”であるときに無効化される。With the circuit configuration of FIG. 6, the second instruction decoder 34 is invalidated when either the parallel execution boundary information 10 or the format information 11 of the unit stored in the instruction register A231 is “1”. The third instruction decoder 35 is invalidated when the parallel execution boundary information 10 of the unit stored in the instruction register A231 or of the unit stored in the instruction register B232 is “1”, or when the format information 11 of the unit stored in the instruction register B232 is “1”.
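The two invalidation conditions stated above reduce to two OR expressions, which the following Python sketch writes out. The function name `nop_flags` and the argument names are hypothetical; `p1` and `f1` stand for the parallel execution boundary information 10 and the format information 11 of the unit in the instruction register A231, and `p2` and `f2` for those of the unit in the instruction register B232. What values the second unit of a 42-bit instruction carries in these bit positions is defined by FIG. 1, which is not reproduced here, so the examples simply pass 0 for them.

```python
def nop_flags(p1, f1, p2, f2):
    """Return (nop2, nop3), the non-operation flags for decoders 34 and 35.

    p1, f1: boundary bit and format bit of the unit in instruction register A.
    p2, f2: boundary bit and format bit of the unit in instruction register B.
    """
    nop2 = bool(p1 or f1)            # decoder 34: unit 1 ends the group or is 42 bits long
    nop3 = bool(p1 or p2 or f2)      # decoder 35: group ended earlier, or unit 2 is 42 bits long
    return nop2, nop3


# The four cases walked through in the description:
case_0084 = nop_flags(p1=0, f1=1, p2=0, f2=0)   # units 1+2 form one 42-bit instruction
case_0085 = nop_flags(p1=0, f1=0, p2=0, f2=1)   # units 2+3 form one 42-bit instruction
case_0087 = nop_flags(p1=1, f1=0, p2=0, f2=0)   # only unit 1 is issued
case_0088 = nop_flags(p1=0, f1=0, p2=1, f2=0)   # units 1 and 2 are issued
```

Each flag depends only on bits read directly from the instruction registers and not on any decoder output, which is consistent with the description's point that the control needs only a simple circuit.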

【００９３】このように、図１に示したような命令フォ
ーマットをとり、図６に示したような単純な回路を用意
するだけで、必要最低限の情報を参照して高速な命令発
行制御を行うことができる。Thus, simply by adopting the instruction format shown in FIG. 1 and providing the simple circuit shown in FIG. 6, high-speed instruction issue control can be performed while referring only to the minimum necessary information.

【００９４】以上で述べたような命令発行制御の方法を
とることにより、１サイクルで同時発行可能な命令の命
令長の組み合わせに多少の制限が生じる。本プロセッサ
で同時発行可能な命令の命令長の組み合わせを図７に示
す。By adopting the instruction issue control method as described above, some restrictions are imposed on the combination of instruction lengths of instructions that can be issued simultaneously in one cycle. FIG. 7 shows a combination of instruction lengths of instructions that can be simultaneously issued by this processor.
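One way to see where the restriction comes from is to enumerate the possible instruction-length sequences under the rule, stated earlier, that only the units in the instruction registers A, B, and C are connected to a decoder. The Python sketch below does this for instructions of one or two units. It is an illustration under that single assumption; the function names are hypothetical, and the correspondence between the enumerated sequences and the lettered patterns of FIG. 7 has not been checked against the figure.

```python
from itertools import product

MAX_UNITS = 4    # units held in the instruction register per cycle
DECODERS = 3     # decoders are attached to registers A, B and C only


def issuable(lengths):
    """True if instructions with these unit lengths (1 or 2) can issue together."""
    start = 0
    for length in lengths:
        if start >= DECODERS:        # this instruction would start in register D
            return False
        start += length
    return start <= MAX_UNITS


def patterns():
    """Classify all sequences of 1 to 3 instructions that fit in four units."""
    ok, blocked = [], []
    for n in range(1, DECODERS + 1):
        for lengths in product((1, 2), repeat=n):
            if sum(lengths) > MAX_UNITS:
                continue
            (ok if issuable(lengths) else blocked).append(lengths)
    return ok, blocked
```

Under this assumption the enumeration yields eight issuable sequences and two four-unit sequences, (1, 2, 1) and (2, 1, 1), that cannot be issued because their third instruction would begin in the fourth unit.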

【００９５】図７を見るとわかるように、本プロセッサ
では、ユニットの並びの先頭から３つ目までのユニット
についてのみ命令としてデコードすることができる。つ
まり、図中（ａ）〜（ｈ）のパターンについて発行する
ことができる。最大で４つのユニットを同時に発行でき
ることになる。ただし、４つのユニットを発行するパタ
ーンの内、図中（ｉ）、（ｊ）のパターンについては同
時発行することができない。（従来の命令発行制御方法との比較）ここで、本実施形態のプロセッサと本発明によらない従
来のプロセッサとの比較を行う。As FIG. 7 shows, this processor can decode as instructions only the first through third units in the unit sequence. The patterns (a) to (h) in the figure can therefore be issued, which means that up to four units can be issued at once. Of the patterns that would issue four units, however, patterns (i) and (j) in the figure cannot be issued simultaneously.
(Comparison with Conventional Instruction Issue Control Methods) Here, the processor of this embodiment is compared with conventional processors that do not use the present invention.

【００９６】まず、図１３に示した従来例において、ユ
ニットキュー５０から命令レジスタへの転送において、
セレクタによる遅延が問題となっていたが、本発明のプ
ロセッサでは、ユニットキュー５０内のユニットと各命
令レジスタが一対一に対応しているため、図１３におい
て存在していたセレクタ５１ａ〜５１ｄが不要となり、
上記遅延の問題が解決されている。First, in the conventional example shown in FIG. 13, the delay caused by the selectors in the transfer from the unit queue 50 to the instruction registers was a problem. In the processor of the present invention, the units in the unit queue 50 correspond one-to-one to the instruction registers, so the selectors 51a to 51d present in FIG. 13 are unnecessary and the delay problem is resolved.

【００９７】また、図１３の構成では、並列度が増して
転送すべき命令レジスタが増加していくに従って、セレ
クタが増加し遅延がさらに大きくなっていくのに対し
て、本発明のプロセッサでは、ユニットキュー５０と命
令レジスタの対応は一対一なので、遅延が大きくなるこ
とはない。Moreover, in the configuration of FIG. 13, as the degree of parallelism rises and the number of instruction registers to be fed increases, the number of selectors increases and the delay grows further. In the processor of the present invention, the unit queue 50 and the instruction registers correspond one-to-one, so the delay does not grow.

【００９８】一方、この可変長命令方式をスーパースカ
ラ方式にて並列実行を行うプロセッサに適用したものも
提案されている。例えば、論文 The Approach to Multi
pleInstruction Execution in the GMICRO/400 Process
or （PROCEEDINGS, The Eighth TRON Project Symposiu
m(International), 1991参照）にて開示されているＧＭ
ＩＣＲＯ／４００がある。この技術は、図１４の概念を
とりながらもハードウェアを削減するために制限を設け
ている。Meanwhile, applying this variable-length instruction scheme to a processor that performs parallel execution in the superscalar manner has also been proposed. One example is the GMICRO/400, disclosed in the paper "The Approach to Multiple Instruction Execution in the GMICRO/400 Processor" (Proceedings, The Eighth TRON Project Symposium (International), 1991). This technique follows the concept of FIG. 14 while imposing restrictions in order to reduce hardware.

【００９９】図１５は、ＧＭＩＣＲＯ／４００で採用さ
れている命令発行制御方法をとった場合の命令レジスタ
周辺の構成を示すブロック図である。図１５において、
破線は制御信号を表し、５４ａ及び５４ｂは命令内に指
定された定数オペランドを表す。命令デコーダは、入力
された命令を解読し、その結果その命令の実行を制御す
る信号を実行制御部に出力すると共に、命令内に指定さ
れた定数オペランドを出力する。FIG. 15 is a block diagram showing the configuration around the instruction register when the instruction issue control method adopted in the GMICRO/400 is used. In FIG. 15, broken lines represent control signals, and 54a and 54b represent the constant operands specified in the instructions. Each instruction decoder decodes the instruction input to it, outputs to the execution control unit signals that control execution of that instruction, and outputs the constant operand specified in the instruction.

【０１００】ＧＭＩＣＲＯ／４００の命令発行制御方法
では、ユニット１とユニット２を連結したもの、ユニッ
ト２及びユニット３をそれぞれ一旦デコードしておき、
第１命令デコーダ５３ｉの解読によって１番目の命令が
１ユニット長の命令なのか２ユニット長の命令なのかが
判明した時点で、セレクタ５１ｇおよびセレクタ５１ｈ
を制御することにより、第２命令デコーダ５３ｊもしく
は第３命令デコーダ５３ｋの解読結果を選択して使用す
る。In the GMICRO/400 instruction issue control method, the concatenation of unit 1 and unit 2, unit 2 alone, and unit 3 alone are each decoded first. Once decoding by the first instruction decoder 53i reveals whether the first instruction is one unit or two units long, the selectors 51g and 51h are controlled to select and use the decoding result of either the second instruction decoder 53j or the third instruction decoder 53k.

【０１０１】図１５を見るとわかるように、ＧＭＩＣＲ
Ｏ／４００では、図１４の構成に対して、同時実行可能
な命令数を３から２に減らすことにより、第４命令デコ
ーダ５３ｇと第５命令デコーダ５３ｈを削除している。
また、第２命令デコーダ５３ｊと第３命令デコーダｋ
は、入力ビット幅を１ユニット長とし、ハードウェア削
減を図っている。しかし、これによって、同時実行され
る２番目の命令は１ユニット長の命令のみという制限が
発生する。As FIG. 15 shows, the GMICRO/400 reduces the number of simultaneously executable instructions from three to two relative to the configuration of FIG. 14, thereby eliminating the fourth instruction decoder 53g and the fifth instruction decoder 53h. In addition, the second instruction decoder 53j and the third instruction decoder 53k have an input width of one unit in order to reduce hardware. As a result, however, the second of the simultaneously executed instructions is restricted to one-unit-long instructions.

【０１０２】以上のようなハードウェア削減を図って
も、２命令同時発行を可能にするために３つの命令デコ
ーダを必要としており、依然としてハードウェア量が多
いという問題点がある。Even if the hardware is reduced as described above, three instruction decoders are required to enable simultaneous issuance of two instructions, and there is a problem that the amount of hardware is still large.

【０１０３】また、図１５の構成では、第１命令デコー
ダ５３ｉの解読が完了するまでセレクタ５１ｇ、５１ｈ
の制御を決定することができない。このセレクタの制御
が決定するまで、２番目の命令として第２命令デコーダ
５３ｊと第３命令デコーダ５３ｋのいずれの解読結果を
用いるかを決定できず、オペランドとなるレジスタの格
納値の読み出しを開始できない。オペランドとなる可能
性のある全てのレジスタの格納値を先行的に読み出して
おき、それを選択して使用する方法も考えられるが、レ
ジスタファイルの読み出しポート数が増加するため実用
的ではない。このように、図１５の構成では読み出すレ
ジスタを確定するまでの遅延が大きくなる。実際、ＧＭ
ＩＣＲＯ／４００では命令解読ステージの直後のステー
ジでは演算の実行は行わず、オペランドを読み出すステ
ージとしている。Furthermore, in the configuration of FIG. 15, the control of the selectors 51g and 51h cannot be determined until decoding by the first instruction decoder 53i is complete. Until it is determined, it is not known whether the decoding result of the second instruction decoder 53j or of the third instruction decoder 53k will be used for the second instruction, so reading the stored values of the operand registers cannot begin. One could read in advance the stored values of every register that might be an operand and then select among them, but this is impractical because it increases the number of read ports on the register file. The configuration of FIG. 15 thus incurs a large delay before the registers to be read are determined. In fact, in the GMICRO/400 the stage immediately after the instruction decoding stage does not execute operations; it is a stage for reading operands.

【０１０４】さらに、図１５の構成にて並列度が増して
同時発行可能なユニット数が増加していくと、セレクタ
の数が増し、制御が複雑化するという問題点がある。Further, in the configuration of FIG. 15, as the parallelism increases and the number of units that can be simultaneously issued increases, the number of selectors increases and the control becomes complicated.

【０１０５】以上に述べたように、スタティックスケジ
ューリングによってさらなる並列化を実現し、性能向上
を図ることができるが、コードサイズが大きくなるとい
う問題点がある。また、コードサイズを削減する手段と
して可変長命令方式があるが、ハードウェアが複雑にな
るという問題点がある。As described above, static scheduling enables further parallelization and improved performance, but at the cost of larger code size. The variable-length instruction scheme is one means of reducing code size, but it makes the hardware more complex.

【０１０６】そして、図１５に示したＧＭＩＣＲＯ／４
００の例においては、命令デコーダを３つ用意しても最
大２命令しが同時実行できないのに対して、本発明のプ
ロセッサの命令発行制御方法を用いると、３つの命令デ
コーダにて最大３命令を同時実行することができる。逆
に、本発明において、最大２命令を同時発行する構成を
想定した場合、２個の命令デコーダにて構成することが
できる。具体的な命令レジスタ周辺の構成は図１６のよ
うになる。これにより、ハードウェアを削減することが
できる。In the GMICRO/400 example shown in FIG. 15, even with three instruction decoders, at most two instructions can be executed simultaneously. With the instruction issue control method of the processor of the present invention, by contrast, three instruction decoders allow up to three instructions to be executed simultaneously. Conversely, if the present invention is configured to issue at most two instructions simultaneously, it can be built with two instruction decoders; a specific configuration around the instruction register is shown in FIG. 16. Hardware can thus be reduced.

【０１０７】また、図１５の構成では、第１命令デコー
ダ５３ｉの解読が完了するまでセレクタ５１ｇ、５１ｈ
の制御を決定することができず、２番目の命令として第
２命令デコーダ５３ｊと第３命令デコーダ５３ｋのいず
れの解読結果を用いるかを決定できない。そのため、オ
ペランドとなるレジスタを確定するまでの遅延が大きく
なる。これに対して、本発明のプロセッサの構成では、
他のデコーダの解読結果を待たずに、オペランドとなる
レジスタを確定することができるため、解読ステージの
前半にオペランドとなるレジスタの読み出しを開始する
ことができる。その結果、解読ステージの完了時点で、
オペランドとなるレジスタの読み出しも完了させておく
ことができる。これによって、解読ステージの直後のス
テージで演算を実行することができ、実行効率を高める
ことができる。Also, in the configuration of FIG. 15, the control of the selectors 51g and 51h cannot be determined until decoding by the first instruction decoder 53i is complete, so it is not known whether the decoding result of the second instruction decoder 53j or of the third instruction decoder 53k will be used for the second instruction. The delay before the operand registers are determined is therefore large. In the configuration of the processor of the present invention, by contrast, the operand registers can be determined without waiting for the decoding result of another decoder, so reading of the operand registers can begin in the first half of the decoding stage. As a result, the reads can be finished by the time the decoding stage completes. Operations can then be executed in the stage immediately after the decoding stage, which improves execution efficiency.

【０１０８】さらに、本発明の構成では、同時発行可能
なユニット数が増加しても単純にデコーダの数を増して
いけばよいのに対して、図１５の構成では、同時発行可
能なユニット数が増加していくと、セレクタの数が増し
て制御が複雑化するという問題点がある。Further, in the configuration of the present invention, an increase in the number of units that can be issued simultaneously is handled simply by adding decoders, whereas in the configuration of FIG. 15 such an increase raises the number of selectors and complicates the control.

【０１０９】それから、命令フォーマットの違いによる
差異として次のものがある。本発明では図１のように、
２ユニット長の命令の２番目のユニットには定数オペラ
ンドの一部のみが配置されるため、図５のように２番目
のユニットは命令デコーダには入力されずに直接オペラ
ンドとして出力される。このため、すべての命令デコー
ダは１ユニット長の命令を解読するだけでよい。これに
対して、ＧＭＩＣＲＯ／４００では、２ユニット長の命
令の２番目のユニットにもオペコードが配置されるた
め、図１５の構成で第１命令デコーダ５３ｉは２ユニッ
ト長の命令を解読する必要があり、本発明の構成に比べ
てハードウェアが増加している。（プロセッサの動作）次に、具体的な命令を解読実行した場合の本実施形態の
プロセッサの動作について説明する。Then, there are the following differences due to the difference in instruction format. In the present invention, as shown in FIG.
Since only a part of the constant operand is arranged in the second unit of the 2-unit length instruction, the second unit is directly output as an operand without being input to the instruction decoder as shown in FIG. Therefore, all instruction decoders need only decode instructions that are one unit long. On the other hand, in the GMICRO / 400, since the operation code is also arranged in the second unit of the instruction of 2 unit length, the first instruction decoder 53i needs to decode the instruction of 2 unit length in the configuration of FIG. Yes, the hardware is increased compared to the configuration of the present invention. (Operation of Processor) Next, the operation of the processor of the present embodiment when a specific instruction is decoded and executed will be described.

【０１１０】 FIG. 8 is a flowchart showing an example of processing that handles a 32-bit constant.

【０１１１】 The processing shown in this figure transfers the 32-bit constant "0x87654321" to register R1 (step S100), transfers the value stored in register R5 to register R0 (step S101), adds the value stored in register R1 to the value stored in register R0 (step S102), adds the value stored in register R2 to the value stored in register R3 (step S103), stores the value stored in register R0 at the memory address indicated by the value stored in register R4 (step S104), transfers the value stored in register R0 to register R6 (step S105), and finally transfers the value stored in register R3 to register R7 (step S106).
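The seven steps S100 to S106 can be sketched as ordinary sequential code. The following is only an illustration of the data flow, not the processor's implementation: the dictionary-based register file and memory, the function name `run_fig8`, the initial register values, and 32-bit wraparound on addition are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers with wraparound addition


def run_fig8(regs, mem):
    """Apply steps S100-S106 in order to a register dict and a memory dict."""
    regs["R1"] = 0x87654321                          # S100: constant -> R1
    regs["R0"] = regs["R5"]                          # S101: R5 -> R0
    regs["R0"] = (regs["R0"] + regs["R1"]) & MASK32  # S102: R0 += R1
    regs["R3"] = (regs["R3"] + regs["R2"]) & MASK32  # S103: R3 += R2
    mem[regs["R4"]] = regs["R0"]                     # S104: R0 -> mem[R4]
    regs["R6"] = regs["R0"]                          # S105: R0 -> R6
    regs["R7"] = regs["R3"]                          # S106: R3 -> R7
    return regs, mem


# Hypothetical initial state, chosen only to make the data flow visible.
regs = {f"R{n}": 0 for n in range(32)}
regs.update(R2=5, R3=7, R4=0x1000, R5=0x10)
regs, mem = run_fig8(regs, {})
```

Steps S102, S104 and S105 all depend on the constant loaded in S100, which is why the way that constant is encoded affects the number of execution cycles in the comparisons that follow.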

【０１１２】 FIG. 9 is a diagram showing an example of the execution code of a program that causes this processor to perform the processing shown in FIG. 8, together with its execution image.

【０１１３】 This program consists of seven instructions and, in terms of instruction supply units, of three packets 70 to 72. The processing content of each instruction is expressed by the mnemonic placed in each field of the execution code. Specifically, the mnemonic "mov" denotes the transfer of a constant or of the value stored in a register to a register, the mnemonic "add" denotes the addition of a constant or of the value stored in a register to the value stored in a register, and the mnemonic "st" denotes the transfer of the value stored in a register to memory.

【０１１４】 Constants are written in hexadecimal. "Rn (n = 0 to 31)" denotes one of the registers in the register file 43. The parallel execution boundary information 10 and the format information 11 of each instruction are also shown as "0" or "1".

【０１１５】 The operation of this processor for each execution unit in the processing shown in FIG. 8 will be described with reference to FIG. 9(b).

(Execution unit 1)

Packet 70 is supplied from memory, and the units in packet 70 are transferred in order to the instruction register 23. Next, the instruction issue control unit 31 controls issuing by referring to the parallel execution boundary information 10 and the format information 11 of each unit. Specifically, since the format information 11 of the first unit is "1", the first unit and the second unit are concatenated and treated as one instruction. That is, the no-operation instruction flag of the second instruction decoder 34 is set to "1" so that its decoding as an instruction is invalidated. Further, since the parallel execution boundary information 10 of the first unit is "0" and that of the third unit is "1", the two instructions up to the third unit are issued. Because all of the supplied units are issued, no unit is accumulated in the instruction buffer 22.

【０１１６】 In the execution section 40, the constant "0x87654321" is transferred to register R1, and the value stored in register R5 is transferred to register R0.

(Execution unit 2)

Packet 71 is supplied from memory, and the units in packet 71 are transferred in order to the instruction register 23. Since the format information 11 of all three units is "0", every unit is a 21-bit instruction. Further, since the parallel execution boundary information 10 of the first unit is "0" and that of the second unit is "1", the two instructions up to the second unit are issued. The third unit remains unissued and is therefore accumulated in the instruction buffer 22.

【０１１７】 In the execution section 40, the value stored in register R1 is added to the value stored in register R0 and the result is stored in register R0, and the value stored in register R2 is added to the value stored in register R3 and the result is stored in register R3.

(Execution unit 3)

Packet 72 is supplied from memory, and the one unit that had been accumulated in the instruction buffer 22 and the two units in packet 72 are transferred in order to the instruction register 23. Since the format information 11 of all three units is "0", every unit is a 21-bit instruction. Further, since the parallel execution boundary information 10 of the first unit and of the second unit is "0" and that of the third unit is "1", the three instructions up to the third unit are issued. With this, all of the supplied units have been issued.

【０１１８】 In the execution section 40, the value stored in register R0 is transferred to the memory address indicated by the value stored in register R4, the value stored in register R0 is transferred to register R6, and the value stored in register R3 is transferred to register R7.
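The issue control walked through for execution units 1 to 3 can be approximated in a few lines. This is a sketch under stated assumptions, not the circuit of FIG. 6: the field names `fmt` and `end`, the operand syntax in the `op` strings, and the modelling of the second unit of a 42-bit instruction as a flag-less payload are all hypothetical choices made for illustration.

```python
UNIT_BITS = 21  # one unit is 21 bits, so a 2-unit instruction is 42 bits


def issue(units):
    """Group units into instructions up to the first parallel execution boundary.

    The first unit of each instruction carries 'fmt' (1 = a second unit
    follows) and 'end' (1 = parallel execution boundary).
    Returns (issued instructions, units left over for the instruction buffer).
    """
    issued, i = [], 0
    while i < len(units):
        head = units[i]
        length = 2 if head["fmt"] == 1 else 1   # concatenate the next unit
        issued.append((head["op"], UNIT_BITS * length))
        i += length
        if head["end"] == 1:                    # boundary: stop issuing here
            break
    return issued, units[i:]


packet70 = [{"op": "mov 0x87654321,R1", "fmt": 1, "end": 0},
            {"op": "(constant part)"},
            {"op": "mov R5,R0", "fmt": 0, "end": 1}]
packet71 = [{"op": "add R1,R0", "fmt": 0, "end": 0},
            {"op": "add R2,R3", "fmt": 0, "end": 1},
            {"op": "st R0,(R4)", "fmt": 0, "end": 0}]
packet72 = [{"op": "mov R0,R6", "fmt": 0, "end": 0},
            {"op": "mov R3,R7", "fmt": 0, "end": 1}]

buffer, groups = [], []
for packet in (packet70, packet71, packet72):
    issued, buffer = issue(buffer + packet)     # leftover units go first
    groups.append(issued)
```

With these inputs the three groups contain two, two and three instructions, and the unissued "st" unit of packet 71 is carried over into the third group, mirroring the role of the instruction buffer 22.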

【０１１９】 As described above, in this processor the program that performs the processing shown in FIG. 8 is executed in three execution units. The execution code consists of one 42-bit instruction and six 21-bit instructions, so the code size is 168 bits.

(Comparison with a conventional fixed-length VLIW processor)

Next, the case in which the processing shown in FIG. 8 is performed by a VLIW processor with a fixed instruction length, cited above as one of the conventional techniques, is assumed and compared with the case of the processor according to the present invention.

【０１２０】 In a simple VLIW scheme that issues a fixed number of fixed-length instructions every cycle, choosing an instruction length that lets an instruction transferring a 32-bit constant be specified as a single instruction would make the code size very large. The instruction length is therefore taken to be 32 bits, and the transfer of a 32-bit constant is performed as two instructions of 16 bits each.

【０１２１】 FIG. 10 is a diagram showing an example of the execution code of a program that causes a VLIW processor with a fixed 32-bit instruction length to perform the processing shown in FIG. 8, together with its execution image.

【０１２２】 This program consists of four packets 73 to 76. As in the code shown in FIG. 9, the processing content of each instruction is expressed by the mnemonic placed in each field. Here, the mnemonic "sethi" denotes storing a 16-bit constant in the upper 16 bits of a register, the mnemonic "setlo" denotes storing a 16-bit constant in the lower 16 bits of a register, and the mnemonic "nop" denotes an instruction that does nothing.

【０１２３】 As can be seen by comparing the execution code of FIG. 10(a) with the execution image of FIG. 10(b), in the VLIW scheme the instructions supplied in each cycle are issued as they are. That is, three 32-bit instructions are issued every cycle. When no instruction that can be executed in parallel exists, a "nop" instruction must be inserted in advance by software. In this example too, four "nop" instructions are inserted, so the code consists of twelve 32-bit instructions, that is, 384 bits, which is substantially larger than the code size for the processor according to the present invention.
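The effect of "nop" padding on code size can be illustrated as follows. The schedule below is a hypothetical one that merely respects the dependencies of FIG. 8 and reproduces the counts stated in the text (eight useful instructions, four "nop" instructions); the actual slot assignment in FIG. 10 may differ, and the operand syntax and the helper name `pad_with_nops` are assumptions.

```python
SLOTS, BITS = 3, 32  # three 32-bit instruction slots per cycle


def pad_with_nops(groups):
    """Pad every execution group to SLOTS instructions with 'nop'."""
    return [g + ["nop"] * (SLOTS - len(g)) for g in groups]


# One hypothetical 4-cycle schedule of the 8 useful instructions.
groups = [
    ["setlo 0x4321,R1", "mov R5,R0", "add R2,R3"],
    ["sethi 0x8765,R1", "mov R3,R7"],
    ["add R1,R0"],
    ["st R0,(R4)", "mov R0,R6"],
]
packets = pad_with_nops(groups)
nops = sum(p.count("nop") for p in packets)
code_bits = sum(len(p) for p in packets) * BITS
```

Every packet ends up with exactly three slots, so the code size is fixed by the number of cycles rather than by the number of useful instructions.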

【０１２４】 In addition, because the transfer of the 32-bit constant to the register is split into two instructions, a new dependency arises and the number of execution units becomes four. No reordering of the instructions can reduce the number of execution units. As a result, the number of execution cycles is one more than for the processor according to the present invention.

(Comparison with a conventional processor that holds parallel execution boundary information in fixed-length instructions)

Next, the case in which the processing shown in FIG. 8 is performed by a processor of the type cited above as one of the conventional techniques, in which the instruction length is fixed and each instruction carries information on whether it is a parallel execution boundary, is assumed and compared with the case of the processor according to the present invention.

【０１２５】 For this scheme, a model with a 32-bit instruction length and a model with a 40-bit instruction length are considered. In the 32-bit model, as in the VLIW scheme of FIG. 10, the transfer of a 32-bit constant to a register is performed as two instructions. In the 40-bit model, by contrast, every kind of operation, including the transfer of a 32-bit constant to a register, can be specified in a single instruction.

【０１２６】 FIG. 11 is a diagram showing an example of the execution code of a program that causes a processor with a fixed 32-bit instruction length and parallel execution boundary information in each instruction to perform the processing shown in FIG. 8, together with its execution image.

【０１２７】 This program consists of eight instructions and, in terms of instruction supply units, of three packets 77 to 79. The processing content of each instruction is expressed by the mnemonic placed in each field of the execution code. As in the VLIW scheme of FIG. 10 with a fixed 32-bit instruction length, the transfer of a 32-bit constant to a register is performed as two instructions of 16 bits each.

【０１２８】 As can be seen from FIG. 11, in this model too, as in the VLIW scheme of FIG. 10, the transfer of the 32-bit constant to the register is executed as two instructions, so a new dependency arises and the number of execution cycles is one more than for the processor according to the present invention.
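The extra cycle in the 32-bit fixed-length models can be checked by computing the longest dependency chain, which is a lower bound on the number of execution units. The sketch below is illustrative only: the dependency tables are one reading of the flow of FIG. 8, the edge from "setlo" to "sethi" is an assumption that two writes to the same register cannot share a cycle, and the function name and operand syntax are hypothetical.

```python
from functools import lru_cache


def min_execution_units(deps):
    """Length of the longest dependency chain (lower bound on execution units).

    deps maps each instruction to the instructions whose results it needs.
    """
    @lru_cache(maxsize=None)
    def depth(ins):
        return 1 + max((depth(d) for d in deps[ins]), default=0)

    return max(depth(ins) for ins in deps)


# Variable-length format: the 32-bit constant is loaded by one instruction.
variable_length = {
    "mov 0x87654321,R1": (),
    "mov R5,R0": (),
    "add R1,R0": ("mov 0x87654321,R1", "mov R5,R0"),
    "add R2,R3": (),
    "st R0,(R4)": ("add R1,R0",),
    "mov R0,R6": ("add R1,R0",),
    "mov R3,R7": ("add R2,R3",),
}

# 32-bit fixed length: the constant load is split into setlo and sethi.
fixed_32bit = dict(variable_length)
del fixed_32bit["mov 0x87654321,R1"]
fixed_32bit["setlo 0x4321,R1"] = ()
fixed_32bit["sethi 0x8765,R1"] = ("setlo 0x4321,R1",)
fixed_32bit["add R1,R0"] = ("sethi 0x8765,R1", "mov R5,R0")

units_variable = min_execution_units(variable_length)
units_fixed = min_execution_units(fixed_32bit)
```

Under these assumptions the chain "constant load, add, store" has length three for the variable-length format and four once the load is split, which agrees with the cycle counts given in the text.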

【０１２９】 As for the code size, since no "nop" instructions are inserted, it is reduced from the code size of the VLIW scheme of FIG. 10 by exactly the amount of the "nop" instructions: eight 32-bit instructions, that is, 256 bits. It is nevertheless still larger than the code size for the processor according to the present invention.

【０１３０】 Next, a comparison is made with the model in which the instruction length is fixed at 40 bits.

【０１３１】 FIG. 12 is a diagram showing an example of the execution code of a program that causes a processor with a fixed 40-bit instruction length and parallel execution boundary information in each instruction to perform the processing shown in FIG. 8, together with its execution image.

【０１３２】 This program consists of seven instructions and, in terms of instruction supply units, of three packets 80 to 82. The processing content of each instruction is expressed by the mnemonic placed in each field of the execution code. The transfer of a 32-bit constant to a register can also be specified in a single instruction.

【０１３３】 As can be seen from FIG. 12, in this model the transfer of a 32-bit constant to a register can be specified in a single instruction, so the number of execution units is three and the number of execution cycles is the same as for the processor according to the present invention.

【０１３４】 The number of instructions is the same as for the processor according to the present invention. However, whereas in the processor according to the present invention instructions that do not need many bits can be specified as 21-bit instructions, in this model every instruction must be specified as a 40-bit instruction. The code size is therefore seven 40-bit instructions, that is, 280 bits, which is larger than for the processor according to the present invention.
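The code sizes quoted for the four cases follow directly from the instruction counts and lengths given in the text. The following sketch simply redoes that arithmetic; the dictionary keys are descriptive labels chosen for this example.

```python
# Instruction lengths in bits for each scheme, as stated in the description.
schemes = {
    "present invention (21/42-bit)": [42] + [21] * 6,
    "32-bit VLIW with nop (FIG. 10)": [32] * 12,
    "32-bit with boundary info (FIG. 11)": [32] * 8,
    "40-bit with boundary info (FIG. 12)": [40] * 7,
}

sizes = {name: sum(bits) for name, bits in schemes.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bits")
```

The variable-length format is the only one of the four that combines the three-cycle schedule with the smallest code size.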

【０１３５】 The processor according to the present invention has been described above on the basis of an embodiment, but the present invention is of course not limited to this embodiment. That is:

(1) The above embodiment presupposes static scheduling, but the present invention is not limited to this. It can also be applied to a processor that performs dynamic scheduling, as in the superscalar scheme. In that case, the parallel execution boundary information is removed from the instruction format, a parallel execution feasibility detection device that dynamically detects whether parallel execution is possible is provided in the decoding unit, and the control that the instruction issue control unit performs in the present embodiment by referring to the parallel execution boundary information is performed by referring to the output of the parallel execution feasibility detection device instead. Even with such a configuration, the advantage of the present invention, namely that the hardware can be simplified in a variable-length instruction scheme, is retained.

(2) The above embodiment is configured to execute three instructions simultaneously, but the present invention is not limited to this number of simultaneously executed instructions. For example, a configuration that issues two instructions simultaneously may be used. In that case, the configuration around the decoding unit and the instruction register is changed as shown in the block diagram of FIG. 16, and the configuration of the arithmetic units of the execution section is changed as appropriate.

(3) In the above embodiment, as can be seen from the instruction format of FIG. 1, one instruction consists of one or two units, but the present invention is not limited to this number of units. An instruction format in which three or more units are concatenated to form one instruction may be defined. For example, when an instruction consists of one to four units, the format information in the instruction may be made 2 bits wide.

(4) In the above embodiment, as can be seen from the instruction format of FIG. 1, one instruction consists of one or two units, but instructions consisting of a single unit need not exist. For example, one instruction may consist of two or three units. In that case, the wiring that connects the instruction register to the instruction decoders and the constant operand is changed.

(5) In the above embodiment, as can be seen from the instruction format of FIG. 1, each instruction carries information on whether it is a parallel execution boundary, but this information is not indispensable. Instructions may carry only the information on the format, with a "nop" instruction placed wherever no instruction that can be executed in parallel exists. In this case too, the advantage of the present invention, namely that each instruction can be specified in an instruction format of the length it requires, is retained.

(6) In the above embodiment, as can be seen from the instruction format of FIG. 1, only a part of a constant operand is placed in the second of the two units that make up a 42-bit instruction, but an opcode may be placed in this unit. To do so, the unit that is output directly as a part of the constant operand in FIG. 5 is instead input to an instruction decoder, and the input bit width of the instruction decoder is increased.

(7) In the above embodiment, the instruction buffer has the configuration shown in FIG. 4, but the present invention is not limited to this configuration or to this buffer size. For example, an instruction buffer with a single simple queue structure may be used.
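Variation (3) above, in which the format information is widened to 2 bits so that an instruction can span one to four units, can be sketched as follows. The encoding is an assumption made for illustration (a field value k meaning "k + 1 units"); the description does not specify one, and the function name is hypothetical.

```python
def split_instructions(format_fields):
    """Split a unit stream into instructions using a 2-bit format field.

    format_fields[i] is consulted only when unit i starts an instruction;
    its value k (0..3) is assumed to mean "this instruction is k + 1 units".
    Returns a list of (start index, unit count) pairs.
    """
    out, i = [], 0
    while i < len(format_fields):
        n = (format_fields[i] & 0b11) + 1   # 1 to 4 units
        out.append((i, n))
        i += n                              # skip the continuation units
    return out


# Seven units: a 2-unit, a 1-unit and a 4-unit instruction.
layout = split_instructions([0b01, 0, 0b00, 0b11, 0, 0, 0])
```

As in the 1-bit case of the embodiment, only the first unit of each instruction needs to be examined to find where the next instruction starts.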

【０１３６】[0136]

[Effects of the Invention] As is apparent from the above description, the processor of the present invention makes it possible, in instruction-level parallel execution, to achieve both improved performance and improved code efficiency while overcoming the problem of hardware complexity.

[Brief description of drawings]

FIG. 1 is a diagram showing the structure of instructions executed by a processor according to an embodiment of the present invention.

FIG. 2 is a diagram showing the concept of instruction supply and issue in the processor.

FIG. 3 is a block diagram showing the hardware configuration of the processor.

FIG. 4 is a block diagram showing the detailed configuration of the instruction buffer 22 of the processor.

FIG. 5 is a block diagram showing the configuration around the instruction register 23 of the processor.

FIG. 6 is a diagram showing the circuit configuration of the instruction issue control unit 31 of the processor and its periphery.

FIG. 7 is a diagram showing the combinations of instruction lengths of instruction groups that the processor can issue simultaneously.

FIG. 8 is a flowchart showing an example of processing that handles a 32-bit constant.

FIG. 9 is a diagram showing an example of the execution code of a program that causes the processor of FIG. 3 to perform the processing shown in FIG. 8, together with its execution image.

FIG. 10 is a diagram showing an example of the execution code of a program that causes a conventional VLIW processor with a fixed 32-bit instruction length to perform the processing shown in FIG. 8, together with its execution image.

FIG. 11 is a diagram showing an example of the execution code of a program that causes a conventional processor with a fixed 32-bit instruction length and parallel execution boundary information in each instruction to perform the processing shown in FIG. 8, together with its execution image.

FIG. 12 is a diagram showing an example of the execution code of a program that causes a conventional processor with a fixed 40-bit instruction length and parallel execution boundary information in each instruction to perform the processing shown in FIG. 8, together with its execution image.

FIG. 13 is a block diagram showing the configuration around an instruction register in a conventional processor.

FIG. 14 is a block diagram showing the configuration around an instruction register in a conventional processor.

FIG. 15 is a block diagram showing the configuration around the instruction register in the GMICRO/400, which is an example of a conventional processor.

FIG. 16 is a block diagram showing the configuration around the instruction register 23 in a processor according to another embodiment of the present invention.

[Explanation of symbols]

10 Parallel execution boundary information
11 Format information
20 Instruction supply and issue unit
21 Instruction fetch unit
22 Instruction buffer
23 Instruction register
30 Decoding unit
31 Instruction issue control unit
32 Instruction decoder
33 First instruction decoder
34 Second instruction decoder
35 Third instruction decoder
40 Execution section
41 Execution control unit
42 PC unit
43 Register file
44 First operation unit
45 Second operation unit
46 Third operation unit
47 Operand access unit
48, 49 Data buses
50 Unit queue
221 Instruction buffer A
222 Instruction buffer B
223 Instruction buffer control unit
224a to 224d Selectors
231 Instruction register A
232 Instruction register B
233 Instruction register C
234 Instruction register D

Continuation of front page

(72) Inventor: Shuichi Takayama (高山秀一), c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
(72) Inventor: Kensuke Otani (小谷謙介), c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
(56) References: JP-A-9-26878 (JP, A); JP-A-3-147021 (JP, A); JP-A-3-53325 (JP, A); JP-A-5-289870 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G06F 9/30 - 9/42

Claims

(57) [Claims]

1. A processor capable of executing in parallel unit instructions of variable bit length that are included in an execution unit of variable bit length, the processor comprising: an instruction supply and issue unit that sequentially fetches and outputs instructions; a decoding unit that includes a plurality of decoding means and decodes the instructions output from the instruction supply and issue unit; and an execution section that executes the instructions decoded by the decoding unit, wherein each unit instruction other than a unit instruction of the minimum bit length has a part that is decoded by the decoding means and a part that is not decoded by the decoding means, and the maximum bit length of the execution unit is larger than the total bit length decoded by the plurality of decoding means.

2. The processor according to claim 1, wherein, when the execution unit has the maximum bit length, the last unit instruction of the execution unit is a unit instruction of the maximum bit length.

3. The processor according to claim 1 or 2, wherein the unit instruction of the maximum bit length consists of a word element that has the same bit length as the unit instruction of the minimum bit length and includes an operation code, and a word element that includes only a constant, and, for every unit instruction regardless of its bit length, the portion of the minimum bit length is decoded by one decoding means.

4. The processor according to any one of claims 2 to 3, further comprising instruction registers that store the instructions to be decoded by the decoding unit, wherein the instruction registers and the decoding means are in one-to-one correspondence.

5. The processor according to claim 4, wherein, when a unit instruction of the maximum bit length is decoded, the decoding means corresponding to the instruction register concerned is invalidated.

6. The processor according to any one of claims 1 to 5, wherein the unit instruction is composed of one or more word elements.

7. The processor according to claim 6, wherein the instruction supply and issue unit outputs instructions to the decoding unit in units of a predetermined number of word elements, and which of the plurality of decoding means each unit instruction is input to is uniquely determined by its position within the predetermined number of word elements.

8. The processor according to any one of claims 1 to 7, wherein information on the degree of parallelism is explicitly given in the execution unit.

9. The processor according to claim 8, wherein the information on the degree of parallelism is information on a boundary of the execution unit.

10. The processor according to any one of claims 1 to 9, wherein information on the length of the unit instruction is explicitly given in each unit instruction.

11. The processor according to claim 1, further comprising an instruction issue control unit that controls the length of the decoding results issued by the decoding unit.

12. The processor according to any one of claims 1 to 11, further comprising an instruction issue control unit that determines whether the decoding result of each of the decoding means is to be validated or invalidated.

13. The processor according to any one of claims 1 to 12, wherein the instruction sequence executed by the processor is statically scheduled into execution units.

14. A processor capable of executing in parallel up to N (N: an integer of 2 or more) unit instructions of variable bit length that are included in an execution unit of variable bit length, wherein the bit length of the execution unit is variable without being limited by the instruction length for instruction fetch, and only execution units of not more than a predetermined bit length are decoded, the predetermined bit length being shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

15. The processor according to claim 14, wherein an execution unit longer than the instruction length for instruction fetch can be decoded.

16. A processor capable of executing in parallel up to N (N: an integer of 2 or more) unit instructions of variable bit length that are included in an execution unit of variable bit length, wherein information on the degree of parallelism is explicitly given in the execution unit, and only execution units of not more than a predetermined bit length are decoded, the predetermined bit length being shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

17. The processor according to any one of claims 14 to 16, wherein the processor has a decoding unit that decodes instructions, and the decoding unit is supplied with instructions of a bit length shorter than the bit length of an execution unit in which N unit instructions of the maximum bit length are executed in parallel.

18. A processor capable of executing in parallel up to N unit
instructions of variable bit length contained in an execution unit of
variable bit length, wherein
the maximum length of the unit instruction is M bits (M: an integer of 2
or more),
the processor comprises:
an instruction supply and issue unit that fetches instructions in units
of a first fixed bit length and outputs them in units of a second fixed
bit length; and
a decoding unit that decodes the second fixed length output from the
instruction supply and issue unit and issues a decoding result of
variable bit length, and
the second fixed length is limited to a length shorter than M × N bits.
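The relationship in claim 18 between the fetch width, the width presented to the decoding unit, and the M × N bound can be sketched in software as follows. This is a rough behavioural model under stated assumptions: the class name, the buffering strategy, the use of a string of "0"/"1" characters as memory, and all widths in the usage note are hypothetical and are not taken from the claims.

```python
class InstructionSupplyUnit:
    """Fetches fixed-size blocks and presents fixed-size windows to a decoder."""

    def __init__(self, memory_bits, fetch_bits, window_bits, m_bits, n_parallel):
        # The second fixed length must be shorter than M x N bits.
        if window_bits >= m_bits * n_parallel:
            raise ValueError("window must be shorter than M x N bits")
        self.memory = memory_bits      # instruction memory as a "0"/"1" string
        self.fetch_bits = fetch_bits   # first fixed length (fetch unit)
        self.window_bits = window_bits # second fixed length (decoder input)
        self.fetch_pos = 0             # next bit position to fetch
        self.buffer = ""               # fetched bits not yet issued

    def window(self):
        """Return the next window, fetching whole blocks as needed."""
        while len(self.buffer) < self.window_bits and self.fetch_pos < len(self.memory):
            self.buffer += self.memory[self.fetch_pos:self.fetch_pos + self.fetch_bits]
            self.fetch_pos += self.fetch_bits
        return self.buffer[:self.window_bits]

    def consume(self, issued_bits):
        """Drop the bits the decoder actually issued (a variable length)."""
        if issued_bits > self.window_bits:
            raise ValueError("cannot issue more bits than one window holds")
        self.buffer = self.buffer[issued_bits:]
```

For example, with an 8-bit fetch width, a 12-bit window, M = 8 and N = 2, the window (12 bits) stays below M × N = 16 bits, and the decoder may consume any number of bits up to 12 per step while fetching continues in 8-bit blocks.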

19. The processor according to claim 18, wherein the second fixed length
is longer than the first fixed length.

20. The processor according to claim 19, wherein the execution unit is
statically scheduled so that the combination of bit lengths of the unit
instructions that the processor executes in parallel satisfies a
predetermined restriction.

21. The processor according to claim 20, wherein the predetermined
restriction is that, when the entire second fixed bit length is issued,
a unit instruction of M bits is placed at the end of the second fixed
bit length.
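One possible reading of the restriction in claim 21 is a check that a static scheduler could apply to each candidate execution unit. The sketch below is an interpretation, not the claimed method: the function name and the widths used in the usage note (a 64-bit second fixed length and M = 32) are assumptions.

```python
def satisfies_tail_restriction(unit_lengths, window_bits, m_bits):
    """Check one candidate execution unit against the assumed tail restriction.

    unit_lengths -- bit lengths of the unit instructions, in issue order
    window_bits  -- second fixed length presented to the decoding unit
    m_bits       -- maximum unit instruction length M
    """
    if not unit_lengths:
        return False
    total = sum(unit_lengths)
    if total > window_bits:
        # The execution unit does not fit in one window at all.
        return False
    if total == window_bits:
        # The whole window is issued: the last unit instruction must be M bits.
        return unit_lengths[-1] == m_bits
    # The window is only partly issued, so the restriction does not apply.
    return True
```

Under these assumptions, the sequences 32 + 32 and 16 + 16 + 32 are accepted, whereas 32 + 16 + 16 fills the window without ending in an M-bit unit instruction and would have to be split by the scheduler.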

22. The processor according to claim 20, wherein the predetermined
restriction is a restriction imposed so that those unit instructions are
arranged within the bit length that is output to the decoding unit.

23. The processor according to claim 15 or 18, wherein information on
the degree of parallelism is explicitly given to the execution unit.

24. The processor according to claim 23, wherein the information on the
degree of parallelism is the boundary of the execution unit.

25. The processor according to any one of claims 14 to 24, wherein
information on the length of the unit instruction is explicitly given in
each unit instruction.
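Claims 24 and 25 describe boundary information for the execution unit and length information carried in each unit instruction. A minimal sketch of how a stream could be split using such information is given below. The bit layout is purely hypothetical: it assumes that bit 0 of every unit instruction is a length flag, bit 1 is a boundary flag, and that unit instructions are either 8 or 16 bits long. None of these values come from the claims.

```python
SHORT_BITS = 8   # hypothetical short unit instruction length
LONG_BITS = 16   # hypothetical long unit instruction length (M)


def split_execution_units(stream):
    """Group unit instructions of a "0"/"1" string into execution units.

    Assumed layout of each unit instruction:
      bit 0 -- length flag   (0 = short, 1 = long)
      bit 1 -- boundary flag (1 = last unit instruction of its execution unit)
    """
    execution_units = []
    current = []
    pos = 0
    while pos < len(stream):
        # The length information is read from the unit instruction itself.
        length = LONG_BITS if stream[pos] == "1" else SHORT_BITS
        if pos + length > len(stream):
            raise ValueError("truncated unit instruction")
        current.append(stream[pos:pos + length])
        # The boundary flag closes the current execution unit.
        is_boundary = stream[pos + 1] == "1"
        pos += length
        if is_boundary:
            execution_units.append(current)
            current = []
    if current:
        raise ValueError("stream ends inside an execution unit")
    return execution_units
```

In this layout the degree of parallelism of an execution unit is simply the number of unit instructions found before a set boundary flag, so no separate count field is needed.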

26. The processor according to any one of claims 17 to 22, further
comprising an instruction issue control unit that controls the length of
the decoding result issued by the decoding unit.

27. The processor according to claim 14, wherein the instruction
sequence executed by the processor is statically scheduled into
execution units.