JP2011198100A

JP2011198100A - Processor and control method thereof

Info

Publication number: JP2011198100A
Application number: JP2010064725A
Authority: JP
Inventors: Satoru Chiba; 哲千葉
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2010-03-19
Filing date: 2010-03-19
Publication date: 2011-10-06

Abstract

PROBLEM TO BE SOLVED: To provide a processor, and a control method thereof, that reduce circuit area and limit degradation of the operating frequency. SOLUTION: The processor executes in parallel a maximum of eight instructions drawn from first to fourth instructions and fifth to eighth instructions. It includes first to eighth operation units 141-148, each of which receives one of the first to eighth instructions, and selectors provided in correspondence with the operation units, each of which selects one of the first to eighth instructions and inputs it to the corresponding operation unit. The first to fourth instructions are input to each of the first to eighth selectors, and each of the fifth to eighth instructions is input only to the selector of the same number and the selectors after it.

Description

The present invention relates to a processor capable of executing instructions in parallel and a control method thereof, and more particularly to a processor capable of executing, for example, VLIW (Very Long Instruction Word) instructions and a control method thereof.

Conventionally, the processing capability of processors such as CPUs (Central Processing Units) and DSPs (Digital Signal Processors) has been improved by executing a plurality of instructions in parallel. There are two approaches. In the superscalar approach, hardware automatically detects which instructions can be executed in parallel and executes them in parallel. In the VLIW (Very Long Instruction Word) approach, the parallelism of instructions is detected in advance when the program is compiled or assembled, and instruction code that can be executed in parallel is generated. Processors for embedded devices often use the VLIW approach, whose simple circuit configuration suits their power consumption and area constraints.

In recent processors, instruction parallelism is sometimes increased to improve performance. Even though a VLIW processor has a simple circuit configuration, the circuit that issues each instruction of a parallel-executed instruction group to its corresponding arithmetic unit becomes large, and the resulting drop in operating frequency is a problem.

Two techniques address this problem. In the technique described in Patent Document 1, the instructions in an instruction group read from the instruction memory are reordered before being issued to the arithmetic units, which reduces the selection circuits in the issue circuit and thereby reduces area and frequency degradation. In the technique described in Patent Document 2, the instructions in each instruction group are arranged in a fixed order at compile or assembly time and stored that way in the instruction memory, which likewise reduces the selection circuits in the issue circuit. In both, the instructions are arranged according to a certain order (rule) at the stage where they are issued to their corresponding arithmetic units.

The following describes how the instruction issue circuit differs between the case where the instruction order at issue time is free and the case where it follows a certain order (rule), for a processor with eight arithmetic units MX/MY/AX/AY/DX/DY/SX/SY. Here, MX/MY, AX/AY, DX/DY, and SX/SY each denote a pair of identical arithmetic units, the first being *X and the second being *Y. Because MY, AY, DY, and SY are the second units, an instruction is never issued to MY, AY, DY, or SY alone; an instruction must also be issued to MX, AX, DX, or SX, respectively. For convenience of explanation, an instruction group read from the instruction memory for parallel execution contains at most eight instructions.

When the instruction order is free, an instruction belonging to the MX arithmetic unit may be in any of instruction slots 1 to 8 of the instruction register. The same holds for instructions belonging to the AX, DX, and SX arithmetic units. Instructions belonging to the MY, AY, DY, and SY arithmetic units presuppose execution of an instruction belonging to the MX, AX, DX, and SX arithmetic units, respectively, so they may be in any of instruction slots 2 to 8. The instruction issue circuit therefore has, for each of the MX/AX/DX/SX arithmetic units, an 8-to-1 selection circuit that can select from any of instruction slots 1 to 8, and, for each of the MY/AY/DY/SY arithmetic units, a 7-to-1 selection circuit that can select from any of instruction slots 2 to 8. In other words, four 8-to-1 selection circuits and four 7-to-1 selection circuits are required.
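The selector sizes for the free-order case can be tabulated with a short sketch. The names `UNITS`, `NUM_SLOTS`, and `free_order_inputs` are illustrative and do not appear in the patent; the only rule encoded is the one stated above, that a *Y instruction never occupies instruction slot 1.

```python
UNITS = ["MX", "MY", "AX", "AY", "DX", "DY", "SX", "SY"]
NUM_SLOTS = 8  # maximum number of instructions in one group


def free_order_inputs(unit):
    """Slots the selector for `unit` must cover when instruction order is free."""
    # A *Y instruction is only issued alongside its *X partner, so the
    # description treats slot 1 as never holding a *Y instruction.
    first_slot = 2 if unit.endswith("Y") else 1
    return list(range(first_slot, NUM_SLOTS + 1))


selector_sizes = {unit: len(free_order_inputs(unit)) for unit in UNITS}
for unit, size in selector_sizes.items():
    print(f"{unit}: {size}-to-1 selection circuit")
```

Under these assumptions the four *X units each get an 8-to-1 selection circuit and the four *Y units each get a 7-to-1 selection circuit.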

Next, the instruction issue circuit for the case where the instruction order at issue time is fixed to MX → MY → AX → AY → DX → DY → SX → SY is described.
When the instruction order is fixed in this way, the following holds.

(1) An instruction belonging to the MX arithmetic unit is either in instruction slot 1 or, when no such instruction is executed, absent from the instruction register.

(2) An instruction belonging to the MY arithmetic unit presupposes execution of an instruction belonging to the MX arithmetic unit, so it is never in instruction slot 1. It is therefore either in instruction slot 2 or, when no such instruction is executed, absent from the instruction register.

(3) The slot holding an instruction belonging to the AX arithmetic unit depends on whether instructions belonging to the MX and MY arithmetic units are present. It is in instruction slot 1 when neither MX nor MY is present, in instruction slot 2 when only MX is present, and in instruction slot 3 when both MX and MY are present; when no such instruction is executed, it is absent from the instruction register.

(4) The slot holding an instruction belonging to the AY arithmetic unit depends on whether instructions belonging to the MX and MY arithmetic units are present, and presupposes execution of an instruction belonging to the AX arithmetic unit. When neither MX nor MY is present, AX is in instruction slot 1 and AY in instruction slot 2; when only MX is present, AX is in instruction slot 2 and AY in instruction slot 3; when both MX and MY are present, AX is in instruction slot 3 and AY in instruction slot 4. When no such instruction is executed, it is absent from the instruction register.

Similarly, the instruction slots that can hold instructions belonging to the DX, DY, SX, and SY arithmetic units are as follows.

(5) An instruction belonging to the DX arithmetic unit is either in one of instruction slots 1 to 5 or, when no such instruction is executed, absent from the instruction register.

(6) An instruction belonging to the DY arithmetic unit is either in one of instruction slots 2 to 6 or, when no such instruction is executed, absent from the instruction register.

(7) An instruction belonging to the SX arithmetic unit is either in one of instruction slots 1 to 7 or, when no such instruction is executed, absent from the instruction register.

(8) An instruction belonging to the SY arithmetic unit is either in one of instruction slots 2 to 8 or, when no such instruction is executed, absent from the instruction register.

The instruction issue circuit therefore has the following configuration.
(A) MX and MY are directly connected to instruction slot 1 and instruction slot 2, respectively.
(B) AX and AY each select from three instruction slots: slots 1 to 3 and slots 2 to 4, respectively.
(C) DX and DY each select from five instruction slots: slots 1 to 5 and slots 2 to 6, respectively.
(D) SX and SY each select from seven instruction slots: slots 1 to 7 and slots 2 to 8, respectively.
In other words, two 3-to-1 selection circuits, two 5-to-1 selection circuits, and two 7-to-1 selection circuits are required.
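The slot ranges in items (1) to (8) can be checked by brute force. The sketch below enumerates every instruction group that respects the fixed order and the rule that a *Y instruction needs its *X partner, then records which slot each unit's instruction lands in. The helper names are illustrative, not taken from the patent.

```python
from itertools import product

ORDER = ["MX", "MY", "AX", "AY", "DX", "DY", "SX", "SY"]


def valid_groups():
    """Yield every instruction group allowed under the fixed order."""
    for present in product([False, True], repeat=len(ORDER)):
        group = [unit for unit, p in zip(ORDER, present) if p]
        # A *Y instruction is only issued together with its *X partner.
        if all(unit[0] + "X" in group for unit in group if unit.endswith("Y")):
            yield group


def possible_slots():
    """Map each unit to the sorted list of slots its instruction can occupy."""
    slots = {unit: set() for unit in ORDER}
    for group in valid_groups():
        # Instructions are packed into slots 1, 2, ... in the fixed order.
        for position, unit in enumerate(group, start=1):
            slots[unit].add(position)
    return {unit: sorted(s) for unit, s in slots.items()}


slots = possible_slots()
for unit in ORDER:
    print(unit, slots[unit])
```

The enumeration gives slot 1 for MX, slot 2 for MY, slots 1 to 3 for AX, slots 2 to 4 for AY, slots 1 to 5 for DX, slots 2 to 6 for DY, slots 1 to 7 for SX, and slots 2 to 8 for SY.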

For area comparison, suppose each selection circuit is built from 2-to-1 selection circuits. An 8-to-1 selection circuit then takes seven 2-to-1 selection circuits, a 7-to-1 selection circuit takes six, a 5-to-1 selection circuit takes four, and a 3-to-1 selection circuit takes two. With free instruction order, the instruction issue circuit is equivalent to 7 × 4 + 6 × 4 = 52 2-to-1 selection circuits; with fixed instruction order, it is equivalent to 6 × 2 + 4 × 2 + 2 × 2 = 24. The area is thus greatly reduced.
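The area arithmetic can be reproduced with a small sketch. It assumes, as the comparison does, that an N-to-1 selection circuit is a tree of N − 1 two-input selection circuits; `mux2_count` and the two size lists are illustrative names.

```python
def mux2_count(inputs):
    """Number of 2-to-1 selection circuits needed to pick one of `inputs` sources."""
    return max(inputs - 1, 0)  # a direct connection (1 input) needs none


# Selector sizes listed as MX, AX, DX, SX, MY, AY, DY, SY (free order)
free_order_sizes = [8, 8, 8, 8, 7, 7, 7, 7]
# Selector sizes listed as MX, MY, AX, AY, DX, DY, SX, SY (fixed order)
fixed_order_sizes = [1, 1, 3, 3, 5, 5, 7, 7]

free_total = sum(mux2_count(n) for n in free_order_sizes)
fixed_total = sum(mux2_count(n) for n in fixed_order_sizes)
print(free_total, fixed_total)
```

The two totals come out to 52 and 24, matching the figures in the comparison.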

As for frequency, an 8-to-1 selection circuit and a 7-to-1 selection circuit both have three logic stages, so the number of logic stages on the maximum-delay path does not change. The operating frequency nevertheless improves: the distinction between long-delay and short-delay paths becomes clear, which helps optimization during logic synthesis, and the smaller area shortens wiring and reduces detour wiring.
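The stage count can be illustrated under the assumption that each selection circuit is a balanced tree of 2-to-1 selection circuits, so its depth is the ceiling of log2 of its input count. The function name `logic_stages` is hypothetical.

```python
def logic_stages(inputs):
    """Depth of a balanced tree of 2-to-1 selection circuits: ceil(log2(inputs))."""
    # (inputs - 1).bit_length() equals ceil(log2(inputs)) for inputs >= 1
    # and avoids floating-point rounding.
    return (inputs - 1).bit_length()


stages = {n: logic_stages(n) for n in (8, 7, 5, 3)}
print(stages)
```

Both the 8-to-1 and 7-to-1 cases need three stages, which is why the worst-case path depth is unchanged by fixing the instruction order.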

This instruction processing method is now described in more detail. FIG. 3 is a diagram showing the VLIW processor described in Patent Document 2. As shown in FIG. 3, the VLIW processor includes an instruction reading unit 221 that reads instructions from a memory 220, an instruction register 222 having four instruction slots 0 to 3, an instruction issuing unit 223 that distributes the instructions from the instruction register 222, and an instruction execution unit 224 that executes the instructions. Each arithmetic unit of the instruction execution unit 224 executes an instruction from the instruction register 222 while referring to the registers PC, GR, and FR. The instruction execution unit 224 includes integer units IU0 and IU1, floating-point units FU0 and FU1, and branch units BU0 and BU1. The processor also has a general-purpose register GR, a floating-point register FR, and a program counter PC.

Assume that the 22 VLIW instructions shown in FIG. 4 are executable as arrangements of basic instructions within a VLIW instruction. In FIG. 4, the symbols have the following meanings. I0 means that a basic instruction executed by IU0 is placed there. I1 means that a basic instruction executed by IU1 is placed there. F0 means that a basic instruction executed by FU0 is placed there. F1 means that a basic instruction executed by FU1 is placed there. B0 means that a basic instruction executed by BU0 is placed there. B1 means that a basic instruction executed by BU1 is placed there. A blank means that no basic instruction is placed there.

The instruction issuing unit supplies each instruction read from the instruction register to the corresponding functional unit among the IUs, FUs, and BUs. Up to four instructions can be executed simultaneously, so instructions are supplied to at most four of the six functional units.

A basic instruction held in instruction slot 0 can be supplied to IU0, FU0, and BU0. A basic instruction held in instruction slot 1 can be supplied to FU0, IU1, FU1, BU0, and BU1. A basic instruction held in instruction slot 2 can be supplied to IU1, FU1, BU0, and BU1. A basic instruction held in instruction slot 3 can be supplied to FU1, BU0, and BU1. The arrangements of basic instructions within a VLIW instruction that this processor permits are those shown in FIG. 4.
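The slot-to-unit wiring just listed can be written as a lookup table. The sketch below checks only whether each instruction can reach its unit through that wiring; it does not encode the 22 permitted arrangements of FIG. 4, which are not reproduced in the text. `SLOT_TO_UNITS` and `can_route` are illustrative names.

```python
# Functional units reachable from each instruction slot, as listed in the text.
SLOT_TO_UNITS = {
    0: {"IU0", "FU0", "BU0"},
    1: {"FU0", "IU1", "FU1", "BU0", "BU1"},
    2: {"IU1", "FU1", "BU0", "BU1"},
    3: {"FU1", "BU0", "BU1"},
}


def can_route(vliw):
    """Return True if every basic instruction can reach its unit.

    `vliw` is a list of four entries, one per slot: the target unit name,
    or None for an empty slot.
    """
    used = set()
    for slot, unit in enumerate(vliw):
        if unit is None:
            continue
        if unit not in SLOT_TO_UNITS[slot] or unit in used:
            return False  # no wire from this slot, or unit already taken
        used.add(unit)
    return True


print(can_route(["IU0", "IU1", "FU1", None]))
print(can_route([None, "IU0", None, None]))
```

The second call fails because, in this wiring, IU0 is reachable only from instruction slot 0.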

When the instruction order is fixed in this way, the instruction issuing unit 223 need not be able to supply every instruction to every functional unit; it can be configured to supply each instruction only to predetermined functional units.

Patent Document 1: JP 2001-100997 A
Patent Document 2: JP 2002-323982 A

However, while the technique described in Patent Document 2 poses no problem for a new processor, it is problematic for a processor that extends a past processor, that is, one that requires compatibility, because past software assets cannot be reused as they are. A specific example follows.

The method described in Patent Document 2 solves a problem that arises when instruction parallelism is raised to meet recent performance demands. In other words, earlier processors did not have high instruction parallelism, given their required performance and chip area, and did not need this technique. Suppose, for example, that the previous-generation processor had an instruction parallelism of 4. At about 4-way parallelism, the size and frequency of the instruction issue circuit may well allow a free instruction order. In that case, instructions may be arranged in the instruction register in the order AX → MX → DX. MX can then be read only from instruction slot 1, so the MX instruction in instruction slot 2 cannot be read and cannot be executed.
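The failure can be illustrated with the fixed-order slot ranges derived earlier in the description. In this sketch, `FIXED_ORDER_INPUTS` and `unissuable` are hypothetical names; the table simply restates which slots each unit's selection circuit covers under the fixed order.

```python
# Slots each unit's selection circuit covers under the fixed order
# MX -> MY -> AX -> AY -> DX -> DY -> SX -> SY.
FIXED_ORDER_INPUTS = {
    "MX": {1},
    "MY": {2},
    "AX": {1, 2, 3},
    "AY": {2, 3, 4},
    "DX": {1, 2, 3, 4, 5},
    "DY": {2, 3, 4, 5, 6},
    "SX": {1, 2, 3, 4, 5, 6, 7},
    "SY": {2, 3, 4, 5, 6, 7, 8},
}


def unissuable(group):
    """List (slot, unit) pairs that the fixed-order issue circuit cannot route."""
    return [
        (slot, unit)
        for slot, unit in enumerate(group, start=1)
        if slot not in FIXED_ORDER_INPUTS[unit]
    ]


print(unissuable(["AX", "MX", "DX"]))  # legacy, free-order group
print(unissuable(["MX", "AX", "DX"]))  # same instructions in the fixed order
```

For the legacy ordering, only the MX instruction in slot 2 is stranded; the same three instructions in the fixed order all issue.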

For software developed in-house, the instruction order can, at worst, be rearranged by recompiling or reassembling. For software purchased from an IP vendor, however, the source code is normally intellectual property and is not available, so rearranging the instruction order is very difficult. Software from IP vendors specializing in multimedia processing such as video and images is currently widely distributed, so the prior art could leave a large amount of software that cannot be reused.

A processor according to the present invention executes in parallel a maximum of m instructions out of first to n-th instructions (n is a natural number) and (n+1)-th to m-th instructions (m is a natural number of 2 or more with m > n). The processor includes first to m-th arithmetic units, each of which receives and executes one of the first to m-th instructions, and first to m-th selectors provided in correspondence with the first to m-th arithmetic units, each of which selects one of the first to m-th instructions and inputs it to the corresponding arithmetic unit. The first to n-th instructions are input to each of the first to m-th selectors, and each of the (n+1)-th to m-th instructions is input to the selector of the same number and the selectors after it.

A processor control method according to the present invention is a method of controlling a processor that executes in parallel, with first to m-th arithmetic units, a maximum of m instructions out of first to n-th instructions (n is a natural number) and (n+1)-th to m-th instructions (m is a natural number of 2 or more with m > n). Each of the first to n-th instructions is input to any one of the first to m-th arithmetic units, and each of the (n+1)-th to m-th instructions is input to the arithmetic unit of the same number or one after it.

In the present invention, the first to n-th instructions are input to each of the first to m-th selectors, so they can be executed by any of the first to m-th arithmetic units. When there are more than n instructions, the instructions are input in order, so inputting each of the (n+1)-th to m-th instructions to the selector of the same number and the selectors after it allows the (n+1)-th to m-th arithmetic units to execute them.
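The general connection rule can be sketched as a function of n and m. This models only the rule as stated in this summary; the embodiment's further refinement (not connecting instruction slot 1 to the selectors of second units) is deliberately left out. `selector_inputs` is an illustrative name.

```python
def selector_inputs(n, m):
    """Map each selector (1..m) to the set of instructions wired to it.

    Instructions 1..n go to every selector; instruction k (n < k <= m)
    goes only to selectors k..m.
    """
    inputs = {sel: set(range(1, n + 1)) for sel in range(1, m + 1)}
    for k in range(n + 1, m + 1):
        for sel in range(k, m + 1):
            inputs[sel].add(k)
    return inputs


wiring = selector_inputs(n=4, m=8)
for sel, instrs in wiring.items():
    print(sel, sorted(instrs))
```

With n = 4 and m = 8, selectors 1 to 4 each see instructions 1 to 4, and selector 8 sees all eight instructions.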

According to the present invention, it is possible to provide a processor, and a control method thereof, that reduce area and limit degradation of the operating frequency.

FIG. 1 is a diagram showing a processor according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a processor according to Embodiment 2 of the present invention.
FIG. 3 is a diagram showing the VLIW processor described in Patent Document 2.
FIG. 4 is a diagram showing the arrangements of basic instructions within a VLIW instruction that the VLIW processor shown in FIG. 3 can execute.

Hereinafter, specific embodiments to which the present invention is applied are described in detail with reference to the drawings. In these embodiments, the present invention is applied to a processor that executes instructions in parallel.
Embodiment 1 of the present invention.

FIG. 1 is a diagram showing a processor 100 according to Embodiment 1 of the present invention. The processor 100 includes an instruction reading unit 11, an instruction register 12, an instruction issuing unit 13, an operation execution unit 14 consisting of eight arithmetic units, and a memory 15. The memory 15 may be external to the processor.

The instruction reading unit 11 reads a plurality of instructions from the memory 15 and supplies them to the instruction register 12. In this example, up to eight of the instructions read from the memory 15 are supplied to the instruction register 12 at the same time.

The instruction register 12 is configured to hold eight instructions, instruction 1 to instruction 8, so that they can be executed in parallel.

The instruction issuing unit 13 includes selection circuits 131 to 138 and distributes the instructions from the instruction register 12 to the arithmetic units.

The arithmetic units are the following eight: an MX arithmetic unit 141, an MY arithmetic unit 142, an AX arithmetic unit 143, an AY arithmetic unit 144, a DX arithmetic unit 145, a DY arithmetic unit 146, an SX arithmetic unit 147, and an SY arithmetic unit 148.

Taking the processor to be one that executes in parallel a maximum of m instructions out of first to n-th instructions (n is a natural number) and (n+1)-th to m-th instructions (m is a natural number of 2 or more with m > n), each arithmetic unit receives and executes one of the first to m-th instructions. This example has m = 8 arithmetic units. The MX arithmetic unit 141 and the MY arithmetic unit 142 are identical arithmetic units, for example multipliers. The AX arithmetic unit 143 and the AY arithmetic unit 144 are identical arithmetic units, for example arithmetic logic circuits. The DX arithmetic unit 145 and the DY arithmetic unit 146 are identical arithmetic units, for example data load/store units. The SX arithmetic unit 147 and the SY arithmetic unit 148 are identical arithmetic units, for example units that handle system instructions.

The following description assumes n = 4. That is, the first to fourth instructions have no prescribed order, and which instruction is input to which arithmetic unit is not known in advance. The fifth to eighth instructions are input to the arithmetic units in this order.

In this case, eight selectors are provided in correspondence with the first to eighth arithmetic units, and each selects one of the first to eighth instructions and inputs it to its arithmetic unit. The first to fourth instructions are input to each of the first to eighth selectors, and each of the fifth to eighth instructions is input to the selector of the same number and the selectors after it.

Instructions 2 to 4 are connected to all the selection circuits, whereas instruction 1 is connected only to selection circuits 1, 3, 5, and 7. This is because instruction 1 can be input only to the MX arithmetic unit 141, the AX arithmetic unit 143, the DX arithmetic unit 145, and the SX arithmetic unit 147: the MY arithmetic unit 142, the AY arithmetic unit 144, the DY arithmetic unit 146, and the SY arithmetic unit 148 can receive an instruction only when another instruction already occupies instruction slot 1.

In the present embodiment, instructions 1 to n (n being the maximum instruction parallelism of the previous-generation processor; n = 4 here) are connected to the selection circuits of all the arithmetic units, so that instructions can be issued even when their order is free, up to the instruction slots corresponding to the instruction parallelism of the previous-generation processor. For the instructions after instruction n, as in the prior art, each selection circuit is connected only to the instruction slots in which an instruction for its arithmetic unit can exist under the instruction order rule. However, the selection circuits for the MY/AY/DY/SY arithmetic units are not connected to instruction slot 1, because those units presuppose execution of an instruction in the MX/AX/DX/SX arithmetic units. FIG. 1 shows the configuration for an instruction parallelism of 8 and a previous-generation instruction parallelism of 4.

Because the maximum instruction parallelism of the previous-generation processor is n = 4, when instructions are input in free order, only instructions 1 to 4 are used and the number of parallel instructions is 4 or less. When the number of instructions is 5 or more, instructions 1 to 8 are in the order MX → MY → AX → AY → DX → DY → SX → SY.

For example, in the configuration shown in FIG. 1, even when instructions are arranged in the order DX → MX → AX → SX, the DX instruction in instruction slot 1 can be issued to the DX arithmetic unit via selection circuit 5. The MX instruction in instruction slot 2 can be issued to the MX arithmetic unit via selection circuit 1, the AX instruction in instruction slot 3 can be issued to the AX arithmetic unit via selection circuit 3, and the SX instruction in instruction slot 4 can be issued to the SX arithmetic unit via selection circuit 7.
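This routing example can be traced with a sketch of the Embodiment 1 wiring. It assumes that selection circuit i feeds the i-th unit in the list MX, MY, AX, AY, DX, DY, SX, SY, that slots 1 to 4 reach every selection circuit except that slot 1 skips the *Y selection circuits, and that slot k (k from 5 to 8) reaches selection circuits k to 8. All function and variable names are illustrative.

```python
UNITS = ["MX", "MY", "AX", "AY", "DX", "DY", "SX", "SY"]  # selector i feeds UNITS[i-1]
N = 4  # maximum parallelism of the previous-generation processor


def selector_inputs():
    """Slots wired to each selection circuit in the Embodiment 1 sketch."""
    inputs = {}
    for sel, unit in enumerate(UNITS, start=1):
        # Slots 1..N reach every selector; slot k > N reaches selectors k and up.
        slots = set(range(1, N + 1)) | set(range(N + 1, sel + 1))
        if unit.endswith("Y"):
            slots.discard(1)  # a *Y instruction is never in slot 1
        inputs[sel] = slots
    return inputs


def issue(group):
    """Return {slot: selection circuit} for a group, or raise if unroutable."""
    inputs = selector_inputs()
    routes = {}
    for slot, unit in enumerate(group, start=1):
        sel = UNITS.index(unit) + 1
        if slot not in inputs[sel]:
            raise ValueError(f"slot {slot} is not wired to the {unit} selector")
        routes[slot] = sel
    return routes


print(issue(["DX", "MX", "AX", "SX"]))
print(issue(UNITS))  # a full eight-instruction group in the fixed order
```

The first call routes slots 1 to 4 through selection circuits 5, 1, 3, and 7, as in the example; the second shows that a full group in the fixed order also issues.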

In the present embodiment, instruction slots 2 to 4 are connected to all the arithmetic units, and instruction slot 1 is connected to the MX/AX/DX/SX arithmetic units, via selection circuits 1 to 8. Instructions can therefore be issued correctly regardless of the order in which they occupy instruction slots 1 to 4, which correspond to the maximum parallelism of the previous-generation processor. Instruction slots 5 to 8, on the other hand, are never used by the previous-generation processor and are used only by the new processor. When the number of instructions is 5 or more, fixing the instruction order therefore reduces area and improves the operating frequency.

That is, the instructions in instruction slots 5 to 8 are input in order to the DX arithmetic unit 145, the DY arithmetic unit 146, the SX arithmetic unit 147, and the SY arithmetic unit 148. Instruction slot 5 therefore needs to be connected only to selection circuits 5 to 8, instruction slot 6 only to selection circuits 6 to 8, instruction slot 7 only to selection circuits 7 and 8, and instruction slot 8 only to selection circuit 8.

In this embodiment, when the area is calculated in units of 2-to-1 selection circuits as in the conventional example, 6 × 2 + 4 × 2 + 3 × 2 + 2 × 2 = 30, which is equivalent to thirty 2-to-1 selection circuits. Although this is larger than the conventional example's equivalent of twenty-four 2-to-1 selection circuits, even when the instruction parallelism is expanded to improve performance, the area can be reduced and the operating frequency improved while binary-level compatibility with the previous-generation processor is maintained.
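The figure of 30 can be reproduced from the wiring itself. The following sketch assumes that a selection circuit with k inputs costs k - 1 two-input (2-to-1) selection circuits, a common tree-style decomposition that is consistent with the terms 6 × 2 + 4 × 2 + 3 × 2 + 2 × 2 above. The text does not state this cost model explicitly, and the function names are hypothetical.

```python
from collections import Counter


def slot_wiring():
    """Embodiment 1 wiring: {instruction slot: selection circuits it reaches}."""
    wiring = {1: {1, 3, 5, 7}}  # slot 1: MX, AX, DX, SX only
    for slot in (2, 3, 4):
        wiring[slot] = set(range(1, 9))  # slots 2-4: all selection circuits
    for slot in range(5, 9):
        wiring[slot] = set(range(slot, 9))  # slot k (k >= 5): circuits k..8
    return wiring


def selector_fan_in(wiring):
    """Count how many instruction slots feed each selection circuit."""
    fan_in = Counter()
    for selectors in wiring.values():
        fan_in.update(selectors)  # each reachable circuit gains one input
    return dict(fan_in)


def two_to_one_equivalent(fan_in):
    """Assumed cost model: a k-input selector needs k - 1 two-input selectors."""
    return sum(k - 1 for k in fan_in.values())


if __name__ == "__main__":
    fan_in = selector_fan_in(slot_wiring())
    print(sorted(fan_in.items()))
    print(two_to_one_equivalent(fan_in))
```

With this model, selection circuits 7 and 8 each have seven inputs (six 2-to-1 stages), circuits 5 and 6 have five inputs, circuits 1 and 3 have four, and circuits 2 and 4 have three, which gives the four terms of the sum.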

In addition, making the distinction between large-delay paths and small-delay paths explicit makes optimization during logic synthesis easier, and the smaller area reduces wiring length and detour wiring. These effects improve the operating frequency.

Embodiment 2 of the present invention.

Next, Embodiment 2 of the present invention will be described. In this embodiment, the configurations of the instruction issuing unit 16 and the operation execution unit 17 differ from those of Embodiment 1. Specifically, there are three types of arithmetic units: MX, MY, and MZ are identical arithmetic units; AX is a single arithmetic unit; and DX, DY, DZ1, and DZ2 are identical arithmetic units. The connections of the instruction issuing unit 16 in this case are shown in FIG. 2, which illustrates the processor according to this embodiment.

As in Embodiment 1, the number of arithmetic units is m = 8 and the maximum instruction parallelism of the previous-generation processor is n = 4. In this case, instruction slots 2 to 4 are connected to all of selection circuits 1 to 8. Instruction slot 1, on the other hand, is connected to the MX arithmetic unit 171, the AX arithmetic unit 174, and the DX arithmetic unit 175. As described above, an instruction for the MY arithmetic unit 172 or the DY arithmetic unit 176 exists only when an instruction for the MX arithmetic unit 171 or the DX arithmetic unit 175 exists, so instruction slot 1 does not need to be input to selection circuits 2, 3, and 6 to 8.

In this embodiment as well, even if instructions are issued in an order such as DX → MX → MY → AX, the DX instruction in instruction slot 1 is issued to the DX arithmetic unit 175 via selection circuit 5, and the MX instruction in instruction slot 2 is issued to the MX arithmetic unit 171 via selection circuit 1. Furthermore, the MY instruction in instruction slot 3 is issued to the MY arithmetic unit 172 via selection circuit 2, and the AX instruction in instruction slot 4 is issued to the AX arithmetic unit 174 via selection circuit 4.

When the number of instructions is five or more, the instructions are input in the order MX → MY → MZ → AX → DX → DY → DZ1 → DZ2. It is therefore sufficient for instruction slot 5 to be connected to selection circuits 5 to 8 (selection circuit 5 and onward), instruction slot 6 to selection circuits 6 to 8, instruction slot 7 to selection circuits 7 and 8, and instruction slot 8 to selection circuit 8 only.
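The Embodiment 2 wiring can be modeled with a short Python sketch. It is illustrative only: the names `UNITS_E2`, `wiring_e2`, and `route_e2` are hypothetical, each instruction is assumed to be identified by the name of its target arithmetic unit, and selection circuits 1 to 8 are assumed to feed MX, MY, MZ, AX, DX, DY, DZ1, DZ2 in that order (consistent with reference numerals 171 to 178).

```python
# Illustrative model of the Embodiment 2 wiring; all names here are hypothetical.
# Assumption: selection circuits 1..8 feed these arithmetic units in this order.
UNITS_E2 = ["MX", "MY", "MZ", "AX", "DX", "DY", "DZ1", "DZ2"]


def wiring_e2():
    """Return {instruction slot: set of selection circuits it is wired to}."""
    wiring = {1: {1, 4, 5}}  # slot 1 reaches only MX, AX, DX
    for slot in (2, 3, 4):
        wiring[slot] = set(range(1, 9))  # slots 2-4 reach every selection circuit
    for slot in range(5, 9):
        wiring[slot] = set(range(slot, 9))  # slot k (k >= 5) reaches circuits k..8
    return wiring


def route_e2(bundle):
    """Map each slot of a bundle (target-unit names in slot order) to a selector.

    Raises ValueError when a slot has no wire to the required selection circuit.
    """
    wiring = wiring_e2()
    routes = {}
    for slot, unit in enumerate(bundle, start=1):
        selector = UNITS_E2.index(unit) + 1
        if selector not in wiring[slot]:
            raise ValueError(
                f"slot {slot} is not wired to selection circuit {selector}"
            )
        routes[slot] = selector
    return routes


if __name__ == "__main__":
    print(route_e2(["DX", "MX", "MY", "AX"]))  # four instructions, free order
    print(route_e2(UNITS_E2))                  # eight instructions, fixed order
```

In this model the selection circuits that instruction slot 1 does not reach are 2, 3, 6, 7, and 8, matching the description above.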

This embodiment also provides the same effects as Embodiment 1. When the area is calculated in units of 2-to-1 selection circuits as in the conventional example, 5 × 2 + 4 × 2 + 3 × 2 + 2 × 2 = 28, which is equivalent to twenty-eight 2-to-1 selection circuits. Even when the instruction parallelism is expanded to improve performance, the area can be reduced and the operating frequency improved while binary-level compatibility with the previous-generation processor is maintained.

It should be noted that the present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
11 Instruction reading unit
12 Instruction register
13, 16 Instruction issuing unit
14, 17 Operation execution unit
15 Memory
141 MX arithmetic unit
142 MY arithmetic unit
143 AX arithmetic unit
144 AY arithmetic unit
145 DX arithmetic unit
146 DY arithmetic unit
147 SX arithmetic unit
148 SY arithmetic unit
171 MX arithmetic unit
172 MY arithmetic unit
173 MZ arithmetic unit
174 AX arithmetic unit
175 DX arithmetic unit
176 DY arithmetic unit
177 DZ1 arithmetic unit
178 DZ2 arithmetic unit
220 Memory
221 Instruction reading unit
222 Instruction register
223 Instruction issuing unit
224 Instruction execution unit
PC register
GR register
FR register

Claims

1. A processor that executes in parallel at most m instructions consisting of first to n-th instructions (n is a natural number) and (n+1)-th to m-th instructions (m > n, where m is a natural number of 2 or more), the processor comprising:
first to m-th arithmetic units, each of which receives and executes any one of the first to m-th instructions; and
first to m-th selectors provided corresponding to the first to m-th arithmetic units, respectively, each of which selects any one of the first to m-th instructions and inputs the selected instruction to the corresponding one of the first to m-th arithmetic units,
wherein the first to n-th instructions are input to each of the first to m-th selectors, and
the (n+1)-th to m-th instructions are respectively input to the (n+1)-th to m-th selectors and the selectors subsequent thereto.

2. The processor according to claim 1, wherein, when the first to m-th arithmetic units are composed of s groups (s is a natural number), each group consisting of k identical arithmetic units (k is an arbitrary variable), the first instruction is input only to the selectors corresponding to the first arithmetic unit of each group.

3. The processor according to claim 1, wherein the first to n-th instructions are instructions whose order is not defined, and the (n+1)-th to m-th instructions are instructions whose order is defined.

4. The processor according to claim 1, wherein n represents the maximum instruction parallelism of a previous-generation processor.

5. The processor according to claim 1, wherein the processor is a VLIW (Very Long Instruction Word) type processor.

6. A method for controlling a processor in which first to m-th arithmetic units execute in parallel at most m instructions consisting of first to n-th instructions (n is a natural number) and (n+1)-th to m-th instructions (m > n, where m is a natural number of 2 or more), the method comprising:
inputting the first to n-th instructions to any of the first to m-th arithmetic units; and
inputting the (n+1)-th to m-th instructions respectively to the (n+1)-th to m-th arithmetic units and the arithmetic units subsequent thereto.

7. The processor control method according to claim 6, wherein, when the first to m-th arithmetic units are composed of s groups (s is a natural number), each group consisting of k identical arithmetic units (k is an arbitrary variable), the first instruction is input only to the selectors corresponding to the first arithmetic unit of each group.

8. The processor control method according to claim 6 or 7, wherein the first to n-th instructions are instructions whose order is not defined, and the (n+1)-th to m-th instructions are instructions whose order is defined.

9. The processor control method according to any one of claims 6 to 8, wherein n represents the maximum instruction parallelism of a previous-generation processor.