JP2013054625A

JP2013054625A - Information processor and information processing method

Info

Publication number: JP2013054625A
Application number: JP2011193582A
Authority: JP
Inventors: Masuzo Takemoto; 益三嵩本; Hisashi Furukawa; 久史古川
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2011-09-06
Filing date: 2011-09-06
Publication date: 2013-03-21

Abstract

PROBLEM TO BE SOLVED: To provide an information processor in which the processors can share operation results efficiently. SOLUTION: An information processor comprises: program storage means 11 that stores a program; operation result storage means 45 that stores the operation result of each core; command readout means 21 that reads instruction sets, each containing as many commands as there are cores, from the program storage means in execution order and stores them in a primary storage unit; command distribution means 32 that stores the commands included in an instruction set in the command queue of each core; command dependency determination means 33 that determines whether a second instruction set, executed later than a first instruction set stored in the primary storage unit, includes a command whose operand is the result of an operation performed by a first core on a command of the first instruction set, and whether that command is executed by a second core different from the first core; and copying means 34 that copies the value in the operation result storage means of the first core to the operation result storage means of the second core when the command dependency determination means determines that the second core executes that command of the second instruction set.

Description

The present invention relates to an information processing apparatus in which a plurality of cores execute instructions in parallel.

Known approaches to improving the processing speed of a processor include not only raising the operating frequency but also multiprocessing, in which a plurality of processors are mounted, and multicore designs, in which a processor contains a plurality of CPU cores. In multiprocessor and multicore systems, processing efficiency improves as the degree of parallelism increases, so it is important to distribute processing in a way that increases parallelism. Techniques for improving the degree of parallelism have been proposed (see, for example, Patent Document 1). Patent Document 1 discloses a multiprocessor that assigns tasks to a plurality of processors in units of tasks, assigning each task in an executable state to a processor that is not executing any task.

However, because the conventional technique executes processing in units of tasks, parallel operation is difficult to achieve in practice owing to bus coupling between tasks, shared memory access, and the like, and it is difficult to draw out the full performance of a multicore system. In addition, assigning tasks to processors with spare capacity increases the load on the OS, which must monitor the processing load of each processor, and complicates the assignment control.

A technique has also been proposed in which one processor controls a plurality of other processors to execute processing in parallel (see, for example, Patent Document 2). Patent Document 2 discloses a microprocessor system in which one RISC processor controls a plurality of VLIW processors. The RISC processor decodes instructions and specifies control signals, the operating clock frequency, and the power supply voltage to the VLIW processors. Each VLIW processor executes an application program.

A technique for distributing processing in units of instructions rather than tasks has also been proposed (see, for example, Patent Document 3). Patent Document 3 discloses a superscalar processor that checks the dependencies among instructions stored in an instruction buffer, issues instructions that have no dependencies to a plurality of processors, and, when a dependency exists, delays the execution cycle of the execution unit for one of the instructions.

JP 2008-146503 A
JP 2002-032218 A
JP 2011-128672 A

However, the technique described in Patent Document 2 does not consider the case where data must be shared between processors. In Patent Document 2, when the result of an operation performed by one of the VLIW processors is used by another processor in a different operation, the result is not stored in the registers of that other processor, so additional processing is required. That is, the other processor must acquire the result from the processor that holds it. This processing does not arise in a method where a single processor executes instructions sequentially.

In Patent Document 3, on the other hand, the operation results of the two execution units are stored in a common register file, so data can be exchanged between processors. However, with a common register file, writes to and reads from the register file may become a bottleneck, and in a multiprocessor system it is generally easier to improve processing efficiency when each processor has its own independent register file. Patent Document 3 does not consider how operation results are shared between processors when the register files are independent.

In view of the above problems, an object of the present invention is to provide an information processing apparatus in which operation results can be shared efficiently between processors.

In view of the above problems, the present invention provides an information processing apparatus in which a plurality of cores execute instructions in parallel, comprising: operation result storage means for storing the operation result of each core; program storage means for storing a program; instruction reading means for reading instruction sets, each containing as many instructions as there are cores, from the program storage means in execution order and storing them in a primary storage unit; instruction distribution means for storing the instructions included in an instruction set in the instruction queue of each core; instruction dependency determination means for determining whether a dependent instruction, whose operand is the result of an operation performed by a first core on an instruction included in a first instruction set stored in the primary storage unit, is included in a second instruction set executed after the first instruction set and is to be executed by a second core different from the first core; and copying means for copying the value in the operation result storage means of the first core to the operation result storage means of the second core when the instruction dependency determination means determines that the second core executes the dependent instruction included in the second instruction set.

It is possible to provide an information processing apparatus in which operation results can be shared efficiently between processors.

An example of a diagram explaining the schematic features of the present embodiment.
An example of a hardware configuration diagram of the multi-core microcomputer mounted in an ECU.
An example of a diagram schematically explaining the execution of instructions by CPUs 1 to 3.
An example of a diagram schematically explaining the execution of instructions by CPUs 1 to 3 when a failure is detected.
A diagram showing an example of the operation procedure of the operation sequence control circuit.
An example of a diagram explaining the Wait of a CPU caused by an instruction dependency.
A diagram showing an example of the operation procedure of the operation sequence control circuit when an instruction dependency is detected.
An example of a diagram explaining processing latency and adjustment by Wait.
A diagram showing an example of the operation procedure of the operation sequence control circuit when the CPUs execute instructions with different processing latencies.
An example of a diagram explaining register refresh.
A diagram showing an example of the operation procedure of the operation sequence control circuit when there is an instruction dependency between instruction sets.

Embodiments for carrying out the present invention are described below by way of examples, with reference to the drawings.

FIG. 1 is an example of a diagram illustrating the schematic features of the present embodiment. The multi-core microcomputer is assumed to have three CPUs (CPU1 to CPU3). The CPUs execute the consecutive instruction codes (hereinafter simply "instructions") of one program in a distributed manner. In FIG. 1, they execute the following instructions.
CPU1: A + B → C
CPU2: Z + D → E
CPU3: F + G → H
The letter after each arrow is the operation result, which is stored in a register of the respective CPU.

Assume that the next instruction of the program to be executed by CPU1 is the following.
A + E → I
Operand A is stored in a register of CPU1, but operand E is stored in a register of CPU2. As things stand, CPU1 therefore cannot execute the operation. The operation sequence control circuit resolves this by copying the operation result E stored in the register of CPU2 into a register of CPU1. In the following, this operation may be referred to as a register refresh.
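The register refresh described above can be sketched in Python as follows. This is only an illustrative model, not the circuit described in this document: the dictionary-based register model, the function name `refresh_register`, and the operand values are all assumptions made for the example.

```python
# Hypothetical model: each CPU's registers are a dict from operand name to value.
def refresh_register(registers, src_cpu, dst_cpu, name):
    """Copy one operation result from src_cpu's registers into dst_cpu's registers."""
    registers[dst_cpu][name] = registers[src_cpu][name]


# Operand values are invented for illustration.
registers = {
    "CPU1": {"A": 1, "B": 2},
    "CPU2": {"Z": 3, "D": 4},
    "CPU3": {"F": 5, "G": 6},
}

# The three instructions of FIG. 1, each executed by its own CPU.
registers["CPU1"]["C"] = registers["CPU1"]["A"] + registers["CPU1"]["B"]
registers["CPU2"]["E"] = registers["CPU2"]["Z"] + registers["CPU2"]["D"]
registers["CPU3"]["H"] = registers["CPU3"]["F"] + registers["CPU3"]["G"]

# Next instruction on CPU1: A + E -> I. E lives on CPU2, so refresh first.
refresh_register(registers, "CPU2", "CPU1", "E")
registers["CPU1"]["I"] = registers["CPU1"]["A"] + registers["CPU1"]["E"]
```

Without the call to `refresh_register`, the last line would fail with a `KeyError`, which mirrors the situation in which CPU1 cannot execute the operation as it stands.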

Distributing processing in units of instructions in this way raises the degree of parallelism above what is possible when processing is distributed in units of tasks. Moreover, even when the higher degree of parallelism causes operation results to be scattered across the CPUs, refreshing the registers resolves the instruction dependencies.

First, it is generally said that, although a multi-core microcomputer has a plurality of cores, when one core fails the effect of the failed core propagates to the normal cores via the bus and the like, so all cores including the normal ones (that is, the entire microcomputer) must be stopped. In engine control and similar applications, stopping the multi-core microcomputer has little effect on the user, but if the multi-core microcomputer of an EPS (Electric Power Steering) or ECB (Electric Control Braking system) is stopped suddenly, the driver experiences a strong sense of discomfort. For this reason, vehicles are commonly equipped with a backup device for use when the EPS or ECB fails, and this redundancy increases cost. It is therefore desirable to design multi-core microcomputers according to a fault-tolerant design policy (one under which the required functions can still be provided after a failure), and exploiting the characteristics of the multi-core microcomputer is considered an effective way to do so.

Multi-core microcomputers equipped with a lockstep function, in which a plurality of cores execute the same processing, are already known. However, the lockstep function is a technique for improving reliability and the failure detection rate. The plurality of cores produce only one operation result, so no performance gain from multiple cores can be expected.

This example describes a multi-core microcomputer that exploits the characteristics of a multi-core design to improve robustness: when a failure occurs, only the failed core is stopped and the normal cores continue the computation.

FIG. 2 shows an example of a hardware configuration diagram of the multi-core microcomputer 100 mounted in an ECU. The multi-core microcomputer 100 can be mounted not only in the engine ECU, EPS-ECU, and ECB-ECU mentioned above but also in various other ECUs such as an HV (hybrid) ECU and a gateway ECU. It may also be mounted in an integrated ECU that combines the functions of a plurality of ECUs.

The multi-core microcomputer 100 includes a flash ROM 11 and an operation sequence control circuit 13, both connected to an instruction bus 12, and a plurality of CPUs 14 (referred to as CPUs 1 to n when distinguished), a failure diagnosis device 16, an I/O 17, a RAM 18, a DMAC 19, and an INTC 20, all connected to a data bus 15.

The multi-core microcomputer 100 has a plurality of CPUs 1 to n. The processor of the multi-core microcomputer 100 may therefore be called a multi-CPU, but since the microcomputer itself is often implemented on a single chip, there is little point in distinguishing it from a multicore. Any processor with a plurality of cores, whether called multi-CPU or multicore, can be used in the multi-core microcomputer 100 of this example. Multi-core microcomputers include homogeneous multicores, which contain a plurality of identical processor cores, and heterogeneous multicores, which contain a plurality of cores of different types. In this example the multicore is homogeneous in the sense that the cores execute one program distributed in units of instructions. Some of the cores may nevertheless have a different configuration.

The CPU 14 includes a register set 44 and a CPU register 45, both connected to the data bus 15, an arithmetic unit 46 connected to the register set 44 and the CPU register 45, a program counter 41, an instruction queue 42, and an instruction decoder 43. Data stored in the RAM 18, sensor detection values input to the I/O 17, and the like are read into the register set 44 via the data bus 15. The operation results of the arithmetic unit 46 are written back to the CPU register 45. The register set 44 and the CPU register 45 are each a collection of registers.

The program counter 41 stores the address, in the flash ROM 11, of the instruction to be executed by the CPU. In this example the operation sequence control circuit 13 sets the instructions for each of the CPUs 1 to n, so the program counter 41 may be omitted. The CPU 14 can also execute a program on its own, independently of the operation sequence control circuit 13. In that case, the program counter 41 outputs the address to the instruction bus 12, and the instruction at that address is read into the instruction queue 42. Each time an instruction is read, the program counter 41 increments the stored address in preparation for reading the next instruction.

The instruction queue 42 is a FIFO (First In, First Out) storage means that can hold a plurality of instructions (for example, about 5 to 20). Instructions are read into the instruction queue 42 in execution order. The instruction decoder 43 decodes the instructions in the instruction queue 42 and outputs signals to the arithmetic unit 46, peripheral circuits, and the like via control lines. Depending on the decoding result, it instructs the arithmetic unit 46 to perform, for example, addition, multiplication, subtraction, shift, or division. Decoding also identifies the operands of the instruction, so data is input from the register set 44 and the CPU register 45 to the arithmetic unit 46 according to the decoding result.

The arithmetic unit 46 includes an ALU (Arithmetic and Logic Unit), an LSU (Load Store Unit), a MUL (multiplier), and a DIV (divider). The arithmetic unit 46 performs various operations on the data input from the register set 44 and the CPU register 45. The operation result is written back to the CPU register 45, where it is either used again as an operand or written to the RAM 18.

The arithmetic unit 46 has pipeline control, which divides the execution of one instruction into stages and executes the stages concurrently. The number of stages varies with the design. Here the pipeline has the stages IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MA (Memory Access), and WB (Write Back). With pipeline control, one instruction can be completed per clock.

The operation sequence control circuit 13 includes a common instruction queue 21, a CPU bus control unit 22, and an instruction scheduler 23. The operation sequence control circuit 13 operates the CPUs in parallel in units of instructions. It also determines whether a register refresh is necessary and, if so, exchanges data among the CPUs 1 to n.

The operation sequence control circuit 13 and the CPUs 1 to n are connected so that each connection can be made or broken, and the operation sequence control circuit 13 physically disconnects any CPU 14 in which a failure has been detected. Program execution can then continue on the normal CPUs 14 alone. If the failed part cannot be identified, the entire microcomputer is stopped.

The common instruction queue 21 reads the instructions to be executed by the CPUs 1 to n (instruction codes 1 to n in the figure) from the flash ROM 11 and stores them. The common instruction queue 21 monitors the pipeline stage of each CPU and, each time the pipeline stage advances, reads as many instructions as there are CPUs from the flash ROM 11 (three instructions when there are three CPUs; hereinafter, such a group of instructions, equal in number to the CPUs, is called an "instruction set"). The instruction queue control unit 32 sets each instruction of the instruction set in the instruction queue 42 of the corresponding CPU1 to CPUn.

The CPU bus control unit 22 includes a pipeline synchronization unit 31, an instruction queue control unit 32, an instruction dependency determination unit 33, a CPU register synchronization unit 34, and an external access control unit 35. The pipeline synchronization unit 31 synchronizes the pipeline stages of the CPUs 1 to n. Under this control, when CPU1 executes the IF stage the other CPUs 2 to n also execute IF, when CPU1 executes the ID stage the other CPUs 2 to n also execute ID, and so on, so that the timing at which instructions are executed and the timing at which results become available are synchronized. The number of clocks consumed in the EX stage (the processing latency) differs from instruction to instruction. For this reason, as described in Example 3, the pipeline synchronization unit 31 makes the other CPUs Wait (stall) until the CPU executing the instruction with the largest processing latency has finished, so that all CPUs consume the same number of clocks. As a result, the CPUs can execute the stages, starting from IF, in synchronization.
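The Wait adjustment for the EX stage can be illustrated with a small Python sketch. It assumes, purely for illustration, that each CPU's EX latency is known as a clock count; the function name `wait_cycles` and the latency values are hypothetical and do not come from this document.

```python
def wait_cycles(ex_latencies):
    """Return, per CPU, the number of Wait (stall) clocks needed so that
    every CPU leaves the EX stage together with the slowest one."""
    longest = max(ex_latencies)
    return [longest - latency for latency in ex_latencies]


# Hypothetical EX latencies in clocks for CPU1, CPU2, CPU3.
latencies = [1, 3, 2]
waits = wait_cycles(latencies)

# After stalling, every CPU has spent the same number of clocks in EX.
total_clocks = [latency + wait for latency, wait in zip(latencies, waits)]
```

In this model the CPU with the largest latency never waits, and the sum of latency and Wait clocks is identical for all CPUs, which is the condition the pipeline synchronization unit 31 is described as enforcing.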

The instruction queue control unit 32 breaks the instruction set acquired from the common instruction queue 21 into individual instructions and sets them in the instruction queues 42 of the CPUs 1 to n. If there are three CPUs 14, instruction code 1 is set in CPU1, instruction code 2 in CPU2, and instruction code 3 in CPU3. In the next cycle, the instruction queue control unit 32 sets instruction code 4 in CPU1, instruction code 5 in CPU2, and instruction code 6 in CPU3. In this example the mapping from instruction order to CPU is thus fixed, although it does not have to be.
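The fixed assignment order described above amounts to dealing instructions out one instruction set at a time. A minimal Python sketch follows. The function name `distribute` and the list-based queues are assumptions for illustration, not the structure of the instruction queue control unit 32 itself.

```python
def distribute(instructions, num_cpus):
    """Split a program into instruction sets of num_cpus instructions and
    append the k-th instruction of each set to the queue of the k-th CPU."""
    queues = [[] for _ in range(num_cpus)]
    for start in range(0, len(instructions), num_cpus):
        instruction_set = instructions[start:start + num_cpus]
        for cpu_index, instruction in enumerate(instruction_set):
            queues[cpu_index].append(instruction)
    return queues


# Instruction codes 1 to 6 distributed over three CPUs.
queues = distribute([1, 2, 3, 4, 5, 6], 3)
# queues[0] is the queue of CPU1, queues[1] of CPU2, queues[2] of CPU3.
```

With six instruction codes and three CPUs, CPU1 receives codes 1 and 4, CPU2 receives 2 and 5, and CPU3 receives 3 and 6, matching the two cycles described in the text.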

The instruction dependency determination unit 33 determines whether each instruction has an instruction dependency, that is, whether it requires the execution result of an earlier instruction. The instruction dependency determination unit 33 notifies the pipeline synchronization unit 31 of each dependent instruction and of the instruction whose execution result it requires.
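One way to picture this determination is a check between two consecutive instruction sets, as in the Python sketch below. The three-operand `Instr` format, the function name `find_dependencies`, the zero-based CPU indices, and the operands of the second instruction set other than A + E → I are all assumptions for illustration. The sketch reports only dependencies that cross CPUs, which is the case that calls for a register copy.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction format: src1 (op) src2 -> dest."""
    src1: str
    src2: str
    dest: str


def find_dependencies(first_set, second_set):
    """Return (consumer_cpu, producer_cpu, operand) triples for operands of
    second_set that are results produced by a different CPU in first_set.
    The position of an instruction in a set is the index of its CPU."""
    producers = {instr.dest: cpu for cpu, instr in enumerate(first_set)}
    dependencies = []
    for cpu, instr in enumerate(second_set):
        for operand in (instr.src1, instr.src2):
            producer = producers.get(operand)
            if producer is not None and producer != cpu:
                dependencies.append((cpu, producer, operand))
    return dependencies


# First set from FIG. 1; only the first instruction of the second set is
# taken from the text, the other two are invented.
first_set = [Instr("A", "B", "C"), Instr("Z", "D", "E"), Instr("F", "G", "H")]
second_set = [Instr("A", "E", "I"), Instr("J", "K", "L"), Instr("M", "N", "O")]
dependencies = find_dependencies(first_set, second_set)
```

Here the single reported triple says that CPU index 0 (CPU1) needs operand E, which was produced by CPU index 1 (CPU2).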

The CPU register synchronization unit 34 refreshes the register set 44 or the CPU register 45 of each of the CPUs 1 to n so that they hold the same data. That is, when data in the register set 44 or CPU register 45 of CPU1 is needed for an operation on CPU2, the CPU register synchronization unit 34 copies that data from the register set 44 or CPU register 45 of CPU1 to the register set 44 or CPU register 45 of CPU2.

The external access control unit 35 arbitrates external accesses (bus accesses) from the CPUs 1 to n and grants the access right to one CPU. The arbitration method, such as priority order or round robin, is determined in advance.
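As an illustration of one of the arbitration methods named above, the following Python sketch implements a simple round-robin arbiter. The function name `round_robin_grant`, the boolean request list, and the zero-based CPU indices are assumptions; the document does not specify how the external access control unit 35 implements arbitration.

```python
def round_robin_grant(requests, last_granted):
    """Grant bus access to the first requesting CPU after last_granted,
    wrapping around. requests[i] is True if CPU index i wants the bus.
    Returns the granted CPU index, or None if nobody is requesting."""
    num_cpus = len(requests)
    for offset in range(1, num_cpus + 1):
        candidate = (last_granted + offset) % num_cpus
        if requests[candidate]:
            return candidate
    return None


# CPU indices 0 and 2 request the bus; index 0 was granted last time.
first_grant = round_robin_grant([True, False, True], last_granted=0)
# Same requests, continuing from the CPU that was just granted.
second_grant = round_robin_grant([True, False, True], last_granted=first_grant)
```

Because the search starts one position after the previous winner, a CPU that was just granted access is considered last, so no requesting CPU is starved.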

The instruction scheduler 23 updates the common instruction queue 21 when it receives notification of an external interrupt from the INTC 20. That is, when an external interrupt occurs, the CPUs 14 must execute a task corresponding to the interrupt, so the instruction scheduler 23 reads instruction sets from the address at which the program of that task is stored and sets them in the common instruction queue 21.

The instruction scheduler 23 also updates the common instruction queue 21 in response to branch instructions and the like. That is, the instruction queue control unit 32 notifies the instruction scheduler 23 of the branch destination address when the instruction executed by the CPU 14 is a branch instruction, or of the address determined by the CPU with reference to the status flags when the instruction is a comparison instruction. The instruction scheduler 23 reads instruction sets from the branch destination address and sets them in the common instruction queue 21.

One example of the failure diagnosis device 16 is a WDT (watchdog timer). The WDT is a timer that monitors the program execution state of each of the CPUs 1 to n, and it detects an abnormality in a CPU when that CPU does not reset the timer within a predetermined time. Alternatively, the distributed execution of instructions may be suspended temporarily while the CPUs 1 to n all execute the same instruction, and a failure may be detected based on whether their results agree. As a further alternative, the distributed execution of instructions may be suspended temporarily while each of the CPUs 1 to n executes a self-diagnosis program separate from the application, and a failure of a CPU may be detected based on whether the resulting value matches the expected value.
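The watchdog behaviour described above can be modelled in a few lines of Python. This is a software analogy under stated assumptions, not the hardware WDT: the function name `timed_out_cpus`, the dictionary of last reset times, and all time values are hypothetical.

```python
def timed_out_cpus(last_reset_time, now, timeout):
    """Return the CPUs that have not reset their watchdog within `timeout`
    time units before `now`. last_reset_time maps CPU name to the time of
    its most recent reset."""
    return [cpu for cpu, reset_time in last_reset_time.items()
            if now - reset_time > timeout]


# Hypothetical reset times: CPU1 stopped resetting its timer long ago.
last_reset_time = {"CPU1": 0, "CPU2": 90, "CPU3": 95}
failed = timed_out_cpus(last_reset_time, now=100, timeout=50)
```

A CPU reported here would be the one that the failure diagnosis device 16 notifies to the operation sequence control circuit 13.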

Various sensors, an ADC, a DAC, a CANC (CAN Controller), actuators, actuator driver circuits, and the like are connected to the I/O 17. The DMAC 19 writes data input from the I/O 17 to the RAM 18 without going through the CPUs 1 to n and, on request from the CPUs 1 to n, transfers data from a specified address in the RAM 18 to a sensor or actuator.

The INTC 20 accepts interrupts from the failure diagnosis device 16, the I/O 17, and the like, and interrupts the operation sequence control circuit 13. In response to the interrupt, the operation sequence control circuit 13 can switch the processing executed by the CPUs 1 to n.

[Execution mode]
FIG. 3 is an example of a diagram schematically illustrating the execution of instructions by CPUs 1 to 3. Instructions 1 to 15, listed vertically, are in the execution order of one program. The numbers 1 to 10 along the horizontal axis represent elapsed time, in units of, for example, clocks or cycles.

Time 1: CPU1 fetches instruction 1 from its instruction queue 42, CPU2 fetches instruction 2 from its instruction queue 42, and CPU3 fetches instruction 3 from its instruction queue 42.

Time 2: CPU1 decodes instruction 1, CPU2 decodes instruction 2, and CPU3 decodes instruction 3.

At the same time, under pipeline control, CPU1 fetches instruction 4 from its instruction queue 42, CPU2 fetches instruction 5 from its instruction queue 42, and CPU3 fetches instruction 6 from its instruction queue 42.

During times 3 to 5, CPUs 1 to 3 advance through the stages, and at time 5 the five stages IF, ID, EX, MA, and WB are all executed simultaneously. From time 5 onward, therefore, the execution results of three instructions, one from each of CPUs 1 to 3, are obtained every clock. That is, a total of 15 execution results are obtained during times 5 to 9. With CPUs 1 to 3 executing three instructions at a time in parallel in this way, the full performance of the multicore can be drawn out.
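The count of 15 results can be checked with a short Python calculation. The sketch assumes an idealized pipeline with no stalls in which one instruction set is fetched per time unit starting at time 1, so that the first results appear when the first set reaches WB. The function name `results_completed` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def results_completed(num_cpus, first_time, last_time):
    """Count execution results obtained in the inclusive time window
    [first_time, last_time], assuming one instruction set enters IF at each
    time unit from time 1 and no CPU stalls."""
    # The set fetched at time 1 finishes WB at time len(STAGES).
    first_completion = len(STAGES)
    start = max(first_time, first_completion)
    completed_sets = max(0, last_time - start + 1)
    return completed_sets * num_cpus


# Three CPUs, times 5 to 9: five instruction sets complete.
total_results = results_completed(3, 5, 9)
```

Five instruction sets complete between times 5 and 9, and each set contributes one result per CPU, which gives the 15 results stated in the text. Before time 5 the pipeline is still filling and the function returns 0.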

FIG. 4 is an example of a diagram schematically illustrating the execution of instructions by CPUs 1 to 3 when a failure is detected. Instructions 1 to 6 are executed normally, as in FIG. 3. The failure diagnosis device 16 notifies the operation sequence control circuit 13 of the failed CPU. The operation sequence control circuit 13 then physically disconnects the CPU 14 in which the failure was detected, thereby excluding it from the CPUs that execute instructions.

When CPU 1 fails, the operation sequence control circuit 13 distributes instructions to CPUs 2 and 3 only. Consequently, after CPU 1 has failed, the execution results of two instructions are obtained every clock.

Note that when a failure of CPU 1 is detected, the instructions that the instruction queue control unit 32 has already assigned to CPU 1 will not be executed by any CPU. For this reason, the instruction queue control unit 32 reassigns instructions to CPUs 2 and 3, starting from the instruction that follows the last instruction CPUs 1 to 3 executed before CPU 1 failed. For example, if CPUs 1 to 3 had executed up to instruction 6 before CPU 1 failed, the instruction queue control unit 32 assigns instructions to CPUs 2 and 3 starting from instructions 7 and 8. As a result, as shown in the figure, instructions 7, 9, 11, and 13 are executed by CPU 2 and instructions 8, 10, 12, and 14 by CPU 3, in parallel.
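The reassignment in this example amounts to distributing the remaining instructions in order over the CPUs that are still connected. The sketch below shows one way to express that; `assign_round_robin` is a hypothetical helper, and the specification does not state how the instruction queue control unit 32 is actually implemented.

```python
def assign_round_robin(instructions, active_cpus):
    """Distribute instructions, in program order, over the active CPUs.

    Hypothetical helper: models the assignment result only, not the hardware.
    """
    assignment = {cpu: [] for cpu in active_cpus}
    for offset, instruction in enumerate(instructions):
        cpu = active_cpus[offset % len(active_cpus)]
        assignment[cpu].append(instruction)
    return assignment


# CPU 1 has failed after instruction 6; instructions 7 to 14 go to CPUs 2 and 3.
after_failure = assign_round_robin(range(7, 15), active_cpus=[2, 3])
print(after_failure)
```

With CPUs 2 and 3 active, CPU 2 receives instructions 7, 9, 11, and 13 and CPU 3 receives instructions 8, 10, 12, and 14, as in FIG. 4.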

Therefore, unless failures are detected in all of the CPUs 14, the multi-core microcomputer 100 can continue to operate, albeit at a reduced processing speed.

FIG. 5 is a diagram illustrating an example of the operation procedure of the operation sequence control circuit 13. The operation sequence control circuit 13 determines, for example every clock, whether a failure has been reported by the failure diagnosis device 16 (S10).

If no failure is detected, the instruction scheduler 23 reads an instruction set containing as many instructions as there are CPUs 14 into the common instruction queue 21 (S20).

If, on the other hand, a failure is detected (Yes in S10), the operation sequence control circuit 13 disconnects the failed CPU 14 (S50). The operation sequence control circuit 13 also notifies the instruction scheduler 23 and the common instruction queue 21 of the number of CPUs that have not failed, or of the number that have failed, thereby adjusting the number of instructions read at one time as an instruction set. In addition, the operation sequence control circuit 13 notifies the instruction queue control unit 32 of which CPUs 14 have not been disconnected.

The subsequent processing is the same regardless of whether a CPU 14 has failed. That is, the common instruction queue 21 reads as many instructions as there are CPUs 14 as an instruction set (S20).

Next, the instruction queue control unit 32 assigns each instruction of the instruction set read from the common instruction queue 21 to CPUs 1 to n (S30).

The pipeline synchronization unit 31 advances the execution stage of each of CPUs 1 to n by one stage while keeping the stages synchronized (S40). The procedure then repeats from step S10.
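One pass through steps S10 to S30 can be sketched as follows. This is a simplified software model under stated assumptions, not the circuit itself: `control_step`, its arguments, and the use of a program counter `pc` into a list are all hypothetical, and step S40 (advancing the pipeline stages) is omitted.

```python
def control_step(program, pc, cpus, failed):
    """Model one pass of S10 to S30 and return (new_pc, active_cpus, assignment).

    program: list of instructions in execution order (assumed representation)
    pc:      index of the next instruction to read
    cpus:    all CPU identifiers
    failed:  set of CPUs reported as failed (S10); they are skipped (S50)
    """
    active = [cpu for cpu in cpus if cpu not in failed]   # S10 / S50
    instruction_set = program[pc:pc + len(active)]        # S20: one per active CPU
    assignment = dict(zip(active, instruction_set))       # S30
    return pc + len(instruction_set), active, assignment


program = list(range(1, 16))  # instructions 1 to 15
print(control_step(program, 0, [1, 2, 3], failed=set()))
print(control_step(program, 6, [1, 2, 3], failed={1}))
```

The second call models the state after CPU 1 has failed: the instruction set shrinks to two instructions, and instructions 7 and 8 are assigned to CPUs 2 and 3.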

According to the multi-core microcomputer 100 of this embodiment, a failed CPU can be physically disconnected and processing can continue using only the normal CPUs. Even if the number of operating CPUs 14 changes, computation can continue simply by adjusting the number of instructions in the instruction set and the CPUs to which the instructions are assigned. Therefore, unlike the case where processing is distributed in units of tasks, complicated control such as task swapping (context switching) is unnecessary.

Furthermore, whereas parallel operation is difficult to achieve when processing is distributed in units of tasks, the multi-core microcomputer 100 of this embodiment distributes processing in units of instructions and can therefore draw out the full performance of the multiple cores. Because processing is distributed in units of instructions, existing single-core programs are also easy to reuse.

When processing is distributed in units of instructions, the CPUs may be unable to execute instructions in parallel because of dependencies between instructions. For example, if instructions 1 to 3 are as follows, instruction 2 has an instruction dependency on instruction 1 and the two cannot be executed in parallel.
Instruction 1: A + B → C
Instruction 2: C + D → E
Instruction 3: F + G → H
When CPUs 1 to 3 execute instructions 1 to 3 respectively, CPU 2 needs the result of instruction 1 from CPU 1 in order to execute instruction 2. For this reason, CPU 1 and CPU 2 cannot execute their processing in parallel.

This embodiment describes a multi-core microcomputer 100 in which the instruction dependency determination unit 33 detects such an instruction dependency and the pipeline synchronization unit 31 makes the CPU 14 that executes the dependent instruction wait.

The instruction dependency determination unit 33 determines whether there is an instruction dependency among instructions 1 to 3, which belong to an instruction set that is executed simultaneously. The determination is made, for example, as follows. Within one instruction set, the instruction dependency determination unit 33 takes each instruction in program order and identifies the register name or variable name (memory address) in which its operation result is stored. It then determines whether that register name or variable name appears as a source operand of a later instruction. If it does, the later instruction has an instruction dependency. In the example above, the register in which the result of instruction 1 is stored is "C", and the register name "C" appears as a source operand of the later instruction 2 (the "C" in C + D). Instruction 2 therefore has an instruction dependency on instruction 1. Hereinafter, instruction 2 is referred to as the "dependent instruction" and instruction 1 as the "preceding instruction".
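The determination method described above is a destination-to-source comparison within one instruction set. A minimal Python sketch follows; the `Instr` record and `find_dependencies` function are hypothetical names introduced for illustration, and the hardware implementation of the instruction dependency determination unit 33 is not described at this level in the specification.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    number: int               # position in program order
    sources: Tuple[str, ...]  # register or variable names read by the instruction
    dest: str                 # register or variable name that receives the result


def find_dependencies(instruction_set):
    """Return (dependent, preceding) instruction-number pairs within one set."""
    pairs = []
    for index, earlier in enumerate(instruction_set):
        for later in instruction_set[index + 1:]:
            # A later instruction depends on an earlier one if it reads
            # the location the earlier one writes.
            if earlier.dest in later.sources:
                pairs.append((later.number, earlier.number))
    return pairs


example_set = [
    Instr(1, ("A", "B"), "C"),  # Instruction 1: A + B -> C
    Instr(2, ("C", "D"), "E"),  # Instruction 2: C + D -> E
    Instr(3, ("F", "G"), "H"),  # Instruction 3: F + G -> H
]
print(find_dependencies(example_set))
```

For the three instructions of the example, the only pair reported is instruction 2 depending on instruction 1.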

FIG. 6 is an example of a diagram illustrating how the CPUs 14 wait because of an instruction dependency, and FIG. 7 shows an example of the operation procedure of the operation sequence control circuit 13 when an instruction dependency is detected. Here, assume that in step S25 the instruction dependency determination unit 33 has determined that instruction 2 has an instruction dependency on instruction 1.
(S25-1) The instruction queue control unit 32 assigns instruction 1, the preceding instruction whose result instruction 2 requires, to CPU 1. This allows CPU 1 to complete instruction 1 before instruction 2.
(S25-2) The instruction queue control unit 32 makes CPUs 2 and 3, that is, the CPUs other than CPU 1, wait.
(S25-3) The pipeline synchronization unit 31 has CPUs 1 to 3 execute one stage. CPUs 2 and 3 are waiting and therefore do nothing. As a result, only CPU 1 advances, by one stage of the preceding instruction 1.
(S25-4) The instruction queue control unit 32 treats instructions 2 to 4 as an instruction set and assigns the dependent instruction 2 to CPU 1, the CPU that executes the preceding instruction. The same CPU 1 thus executes both the dependent instruction 2 and the preceding instruction 1.

The instruction queue control unit 32 also assigns the remaining instructions of the instruction set, instructions 3 and 4, to CPUs 2 and 3. That is, CPUs 2 and 3 execute instructions 3 and 4 rather than instructions 2 and 3.
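Steps S25-1 to S25-4 can be summarized as two assignment rounds. The sketch below models only the resulting assignments; `schedule_with_dependency` and the `WAIT` marker are hypothetical, and it assumes, as in the example, that the preceding instruction is the first instruction of the original set and that the first CPU in the list runs it.

```python
WAIT = "Wait"  # marker for a CPU that is held in the Wait state


def schedule_with_dependency(instructions, cpus, dependent, preceding):
    """Return the two assignment rounds of S25-1 to S25-4 for one dependent pair.

    Assumption: `preceding` is the first pending instruction and cpus[0] runs it.
    """
    lead_cpu, others = cpus[0], cpus[1:]

    # S25-1, S25-2: the preceding instruction runs alone; the other CPUs wait.
    first_round = {lead_cpu: preceding, **{cpu: WAIT for cpu in others}}

    # S25-4: form a new instruction set that starts at the dependent instruction,
    # and give the dependent instruction to the CPU that ran the preceding one.
    remaining = [i for i in instructions if i != preceding]
    new_set = remaining[:len(cpus)]
    second_round = {lead_cpu: dependent}
    second_round.update(zip(others, (i for i in new_set if i != dependent)))
    return [first_round, second_round]


rounds = schedule_with_dependency([1, 2, 3, 4, 5, 6], [1, 2, 3], dependent=2, preceding=1)
print(rounds)
```

In the second round CPU 1 receives instruction 2 while CPUs 2 and 3 receive instructions 3 and 4, matching the assignment described above.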

The subsequent processing is the same as in the first embodiment: the pipeline synchronization unit 31 has CPUs 1 to 3 execute one stage while keeping their execution stages aligned (S40). The procedure then repeats from step S10.

While CPU 1 is executing the preceding instruction, the multi-core microcomputer 100 of this embodiment makes the other CPUs 2 and 3 wait, and it assigns the dependent instruction 2 to CPU 1, the same CPU that executes the preceding instruction 1. As a result, processing can be distributed in units of instructions even when an instruction dependency is detected within an instruction set.

This embodiment can be applied to the multi-core microcomputer 100 together with the first embodiment.

This embodiment describes a multi-core microcomputer 100 that compensates for differences in instruction processing latency between CPUs. As already described, the number of clocks the ALU needs to execute a shift instruction differs greatly from the number of clocks the MUL needs to execute a multiplication instruction or the DIV needs to execute a division instruction. This number of clocks is called the processing latency. If processing latencies differ within one instruction set, CPUs 1 to n complete their instructions at different times.

The problems that arise when instruction completion times differ between CPUs are as follows. Consider, for example, the case where CPUs 1 to 3 execute instructions 1 to 3 and then instructions 4 to 6. If CPU 1 incurs a large delay in executing instruction 1, and instruction 4, which CPU 1 also executes, has an instruction dependency on instruction 1, then instruction 1 may not yet have completed when CPU 1 executes the EX stage of instruction 4.

In addition, when the instructions executed by CPUs 1 to n complete at different times, each of CPUs 1 to n outputs bus acquisition requests asynchronously according to its execution results. Suppose, for example, that because of a difference in processing latency CPU 2 completes instruction 2 before CPU 1 completes instruction 1 and outputs a bus acquisition request. If CPU 1 needs the bus in order to execute instruction 1, completion of instruction 1 by CPU 1 is delayed even further.

In this embodiment, therefore, Waits are inserted into the stages according to the processing latency of the instructions executed by CPUs 1 to n. That is, when the completion timings of the instructions executed by CPUs 1 to n diverge, the pipeline synchronization unit 31 inserts Waits into the CPUs with shorter processing latency so that CPUs 1 to n complete the instruction set in synchronization.

FIG. 8 is an example of a diagram illustrating processing latency and its adjustment by Waits. Assume that instructions 1 to 3 have the following processing latencies. The processing latency of each instruction is known to the operation sequence control circuit 13.
Instruction 1: processing latency of 2
Instruction 2: processing latency of 3
Instruction 3: processing latency of 1
The pipeline synchronization unit 31 identifies the EX stage (times 3 to 5) of CPU 2, which executes instruction 2, the instruction with the largest processing latency among instructions 1 to 3. From time 6 onward, each of CPUs 1 to 3 can execute the EX stage of the next instruction set. The pipeline synchronization unit 31 therefore aligns the EX stages of instructions 4 to 6 on CPUs 1 to 3 at time 6.

Similarly, it aligns the ID stages of instructions 7 to 9 on CPUs 1 to 3 at time 6, and the IF stages of instructions 10 to 12 on CPUs 1 to 3 at time 6. The stage that is aligned at time 6 is an earlier one for each subsequent instruction set because of pipeline control.
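The alignment time in this example follows from simple arithmetic on the largest processing latency. The sketch below assumes, as in FIG. 8, that the EX stage of the first instruction set begins at time 3; the function names are hypothetical.

```python
def next_ex_start(latencies, ex_start=3):
    """Earliest time at which the next instruction set can enter EX.

    latencies: dict mapping CPU number to the processing latency of its
               current instruction (assumed to be known in advance).
    ex_start:  time at which the current set enters EX (time 3 in FIG. 8).
    """
    return ex_start + max(latencies.values())


def wait_cycles(latencies):
    """Number of Wait slots the following instruction sets must absorb."""
    return max(latencies.values()) - 1


latencies = {1: 2, 2: 3, 3: 1}   # instructions 1 to 3 on CPUs 1 to 3
print(next_ex_start(latencies))  # EX of instructions 4 to 6 is aligned at time 6
print(wait_cycles(latencies))    # two Wait slots (times 4 and 5)
```

With latencies of 2, 3, and 1, the longest EX stage occupies times 3 to 5, so the next instruction set enters EX at time 6 after two Wait slots.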

In this way, the stages of CPUs 1 to 3 can be synchronized by delaying the stages of the subsequent instruction sets until CPU 2, which executes the instruction with the large processing latency, has completed that instruction. In this embodiment, the instruction set consisting of instructions 1 to 3 is called the "detected instruction set", the instruction set consisting of instructions 4 to 6 the "next instruction set", the instruction set consisting of instructions 7 to 9 the "second-next instruction set", and the instruction set consisting of instructions 10 to 12 the "third-next instruction set".

FIG. 9 shows an example of the operation procedure of the operation sequence control circuit when CPUs 1 to 3 execute instructions with different processing latencies. The procedure of FIG. 9 is executed, for example, following step S30 of FIGS. 5 and 7.

The pipeline synchronization unit 31 monitors the execution results of the EX stages of CPUs 1 to 3 and determines whether the processing latencies differ (S110). The processing latencies need not match exactly; a certain amount of deviation can be tolerated.

If the processing latencies do not differ, the pipeline synchronization unit 31 inserts no Wait, the procedure of FIG. 9 ends, and the stages of CPUs 1 to 3 are advanced (S40).

If there are instructions with different processing latencies (Yes in S110), the pipeline synchronization unit 31 inserts a Wait before the EX stage of the next instruction set (S120). In FIG. 8, this corresponds to the Waits of instructions 4 to 6 at time 4.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the ID stage of the second-next instruction set (S130). In FIG. 8, this corresponds to the Waits of instructions 7 to 9 at time 4.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the IF stage of the third-next instruction set (S140). In FIG. 8, this corresponds to the Waits of instructions 10 to 12 at time 4.

The pipeline synchronization unit 31 then advances the stages of CPUs 1 to 3 by one (S150). Instructions 4 to 12 are waiting, so their stages do not advance, but instructions 1 to 3 advance by one stage or make progress within their current stage.

Next, the pipeline synchronization unit 31 determines whether the instruction with the largest processing latency in the detected instruction set, the set in which instructions with different processing latencies were detected, has completed (S160). In FIG. 8, this means determining whether CPU 2 has completed instruction 2. Here it is sufficient that the operation result can be used within the same CPU, so "completed" means that the EX stage has finished.

If the instruction with the largest processing latency has not completed (No in S160), the pipeline synchronization unit 31 inserts a Wait before the EX stage of the next instruction set (S120). In FIG. 8, this corresponds to the Waits of instructions 4 to 6 at time 5.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the ID stage of the second-next instruction set (S130). In FIG. 8, this corresponds to the Waits of instructions 7 to 9 at time 5.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the IF stage of the third-next instruction set (S140). In FIG. 8, this corresponds to the Waits of instructions 10 to 12 at time 5.

The pipeline synchronization unit 31 then advances the stage of each CPU by one (S150). This allows CPU 2 to complete instruction 2.

When the instruction with the largest processing latency has completed (Yes in S160), the procedure of FIG. 9 ends. That is, the pipeline synchronization unit 31 inserts no further Wait, and the stages of instructions 1 to 12 on CPUs 1 to 3 are each advanced by one (S40).
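The loop of steps S110 to S160 can be modeled as follows. This is a simplified sketch rather than the procedure of FIG. 9 itself: `wait_times` is a hypothetical function, it treats any difference in latency as "different" (ignoring the tolerated deviation mentioned for S110), and it assumes the detected instruction set enters EX at time 3.

```python
def wait_times(latencies, ex_start=3):
    """Return the time slots in which Waits are inserted for the following sets.

    latencies: dict mapping CPU number to processing latency (assumed known).
    """
    if len(set(latencies.values())) == 1:
        return []  # S110: latencies do not differ, so no Wait is inserted

    # Last time slot occupied by the EX stage of the slowest instruction.
    longest_done_at = ex_start + max(latencies.values()) - 1

    waits = []
    time = ex_start + 1
    while True:
        waits.append(time)           # S120 to S140: Waits for the following sets
        # S150: all CPUs advance one slot; only the detected set makes progress.
        if time >= longest_done_at:  # S160: has the slowest instruction completed?
            break
        time += 1
    return waits


print(wait_times({1: 2, 2: 3, 3: 1}))  # Waits at times 4 and 5, as in FIG. 8
```

For the latencies of FIG. 8 the model inserts Waits at times 4 and 5, after which all stages advance together from time 6.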

Even when the instructions that CPUs 1 to n process within an instruction set have different processing latencies, the multi-core microcomputer 100 of this embodiment can synchronize the execution stages of the CPUs by inserting Waits into the stages. Therefore, even if an instruction set contains instructions with different processing latencies, the stages of CPUs 1 to 3 remain synchronized and the occurrence of instruction dependencies can be suppressed.

This embodiment can be applied to the multi-core microcomputer 100 together with the first and second embodiments.

This embodiment describes a multi-core microcomputer 100 in which different CPUs 1 to n synchronize their operation results. As described with reference to FIG. 1, an operand of an instruction executed by CPU 1 may be stored in the CPU register 45 of another CPU, such as CPU 2. CPU 1 cannot perform the operation in that state, so the CPU register synchronization unit 34 copies the operation result stored in the register of CPU 2 to CPU 1.

FIG. 10 is an example of a diagram illustrating register refresh. For the purposes of explanation, the instruction sets are called instruction sets A to D in order.

(i) The instruction dependency determination unit 33 determines whether an instruction in instruction set B has an instruction dependency on an instruction in the immediately preceding instruction set A. Specifically, it determines the following:
whether instruction 4, executed by CPU 1, has an instruction dependency on instruction 2 or 3;
whether instruction 5, executed by CPU 2, has an instruction dependency on instruction 1 or 3;
whether instruction 6, executed by CPU 3, has an instruction dependency on instruction 1 or 2.
The same relationship arises for each pair of consecutive instruction sets, such as instruction sets C and B or instruction sets D and C. For example, for instruction sets C and B, it determines:
whether instruction 7, executed by CPU 1, has an instruction dependency on instruction 5 or 6;
whether instruction 8, executed by CPU 2, has an instruction dependency on instruction 4 or 6;
whether instruction 9, executed by CPU 3, has an instruction dependency on instruction 4 or 5.

The determination is made, for example, as follows. The unit determines whether a register name or variable name (memory address) in which the result of instruction 2 or 3 is stored appears as a source operand of instruction 4. If it does, instruction 4 has an instruction dependency on instruction 2 or instruction 3, and instruction 2 or 3 is the preceding instruction. Here, assume that instruction 4 is the dependent instruction and instruction 2 is the preceding instruction.
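The check between consecutive instruction sets differs from the check within a set in that only results produced on a different CPU matter. A minimal sketch follows; the function name, the dictionary layout, and the register names `r1` to `r6` and `a` to `k` are hypothetical and chosen only so that instruction 4 depends on instruction 2, as assumed in the text.

```python
def cross_cpu_dependencies(prev_set, next_set):
    """Find instructions in next_set that read a result produced on another CPU.

    Both arguments map a CPU number to (instruction number, sources, destination).
    Returns tuples (dependent, dependent_cpu, preceding, preceding_cpu).
    """
    found = []
    for cpu, (number, sources, _dest) in next_set.items():
        for other_cpu, (prev_number, _sources, prev_dest) in prev_set.items():
            # Results produced on the same CPU are already in its own registers.
            if other_cpu != cpu and prev_dest in sources:
                found.append((number, cpu, prev_number, other_cpu))
    return found


set_a = {1: (1, ("a", "b"), "r1"), 2: (2, ("c", "d"), "r2"), 3: (3, ("e", "f"), "r3")}
set_b = {1: (4, ("r2", "g"), "r4"), 2: (5, ("h", "i"), "r5"), 3: (6, ("j", "k"), "r6")}
print(cross_cpu_dependencies(set_a, set_b))
```

Here the single reported dependency is instruction 4 on CPU 1 reading the result of instruction 2 on CPU 2, which is the case that requires a register copy between CPUs.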

(ii) When execution of the preceding instruction 2 has completed, the pipeline synchronization unit 31 copies the data in the CPU register 45 of CPU 2 to the register of CPU 1. CPU 1 cannot execute instruction 4 until the copy has been made, so the pipeline synchronization unit 31 inserts Waits into the stages as in the third embodiment. The data in the CPU register 45 of CPU 2 can be copied to the register of CPU 1 during the Waits.

Specifically, for example, the CPU register 45 of CPU 2 and the register set 44 of CPU 1 are directly connected in advance. When CPU 2 has completed instruction 2, the CPU register synchronization unit 34 copies the data in the CPU register 45 of CPU 2 to CPU 1.

FIG. 11 shows an example of the operation procedure of the operation sequence control circuit 13 when there is an instruction dependency between instruction sets. The procedure of FIG. 11 is executed, for example, following step S30 of FIGS. 5 and 7.

The instruction dependency determination unit 33 determines in advance whether there is an instruction dependency between instruction sets. For example, two or more instruction sets A and B are read into the instruction queues 42, and the instruction dependency between instruction set B and the immediately preceding instruction set A is determined. If there is an instruction dependency, the CPU that executes the dependent instruction and the CPU that executes the preceding instruction are recorded.

In this embodiment, the instruction set that follows the set of instructions 4 to 6, which contains the dependent instruction 4, is called the "next instruction set" (instructions 7 to 9), and the instruction set consisting of instructions 10 to 12 is called the "second-next instruction set".

The pipeline synchronization unit 31 determines whether a preceding instruction is being executed (S210). This determination only needs to be made before the EX stage of the preceding instruction. If no preceding instruction is being executed (No in S210), the pipeline synchronization unit 31 advances the stages of CPUs 1 to 3 without inserting a Wait (S40).

If a preceding instruction is being executed (Yes in S210), the pipeline synchronization unit 31 inserts a Wait before the EX stage of the instruction set that contains the dependent instruction (S220). In FIG. 10, this corresponds to the Waits of instructions 4 to 6 at time 4.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the ID stage of the next instruction set, that is, the instruction set following the one that contains the dependent instruction (S230). In FIG. 10, this corresponds to the Waits of instructions 7 to 9 at time 4.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the IF stage of the second-next instruction set (S240). In FIG. 10, this corresponds to the Waits of instructions 10 to 12 at time 4.

The pipeline synchronization unit 31 then advances the stages of CPUs 1 to 3 by one (S250). Instructions 4 to 12 are waiting, so their stages do not advance, but instructions 1 to 3 advance by one stage or make progress within their current stage.

Next, the pipeline synchronization unit 31 determines whether execution of the preceding instruction has completed (S260). In FIG. 10, this means determining whether CPU 2 has completed instruction 2. Because CPU 1, which executes the dependent instruction 4, and CPU 2, which executes the preceding instruction 2, are different CPUs, completion is judged by whether the result has been written back.

If execution of the preceding instruction has not completed (No in S260), the pipeline synchronization unit 31 inserts a Wait before the EX stage of the instruction set that contains the dependent instruction (S220). In FIG. 10, this corresponds to the Waits of instructions 4 to 6 at time 5.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the ID stage of the next instruction set (S230). In FIG. 10, this corresponds to the Waits of instructions 7 to 9 at time 5.

Similarly, the pipeline synchronization unit 31 inserts a Wait before the IF stage of the second-next instruction set (S240). In FIG. 10, this corresponds to the Waits of instructions 10 to 12 at time 5.

When execution of the preceding instruction has completed (Yes in S260), the CPU register synchronization unit 34 refreshes the register (S270). That is, the CPU register synchronization unit 34 copies the data in the CPU register 45 of CPU 2 to the register set 44 of CPU 1. CPU 1 can then execute instruction 4 using the operation result of CPU 2. The procedure then proceeds to step S40 of FIGS. 5 and 7, and CPUs 1 to 3 advance the stages of instructions 4 to 12 without any Wait being inserted.
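The register refresh of step S270 can be pictured as a copy between per-CPU register files, performed only once the preceding instruction has been written back. The sketch below is a software analogy under stated assumptions: the dictionaries, the `written_back` flag, and the register name `r2` are hypothetical, whereas the specification describes a direct hardware connection between the registers.

```python
def refresh_register(registers, src_cpu, dst_cpu, name, written_back):
    """Copy one register value from src_cpu to dst_cpu (S270).

    Returns True if the copy was made, or False if the preceding instruction
    has not been written back yet, in which case the caller keeps waiting.
    """
    if not written_back:  # S260: No, so keep inserting Waits
        return False
    registers[dst_cpu][name] = registers[src_cpu][name]
    return True


# Register files of CPU 1 and CPU 2; instruction 2 wrote 42 to r2 on CPU 2.
registers = {1: {}, 2: {"r2": 42}}
copied_early = refresh_register(registers, src_cpu=2, dst_cpu=1, name="r2", written_back=False)
copied_late = refresh_register(registers, src_cpu=2, dst_cpu=1, name="r2", written_back=True)
print(copied_early, copied_late, registers[1])
```

After the second call, CPU 1 holds the value that instruction 4 needs as an operand, so its EX stage can proceed.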

Even when an instruction dependency exists between different CPUs across instruction sets, the multi-core microcomputer 100 of this embodiment can distribute and execute instructions on a per-instruction basis by refreshing the registers.
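As an illustration of the kind of dependency in question, the following sketch finds instructions in a later instruction set that read a result produced on a different CPU by an earlier instruction set. It rests on assumptions not stated in the text: the `Instr` record, the register names, and the rule that the i-th instruction of a set runs on CPU i are all hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record: number, destination and source registers."""
    number: int
    dest: str
    srcs: tuple


def cross_core_dependencies(first_set, second_set):
    """List (producer_cpu, consumer_cpu, producer_no, consumer_no) tuples.

    A tuple is reported when an instruction of `second_set` reads the
    destination register of an instruction of `first_set` that ran on a
    different CPU. CPUs are numbered from 1 by position within the set
    (an assumed distribution rule).
    """
    found = []
    for consumer_cpu, consumer in enumerate(second_set, start=1):
        for producer_cpu, producer in enumerate(first_set, start=1):
            if producer.dest in consumer.srcs and producer_cpu != consumer_cpu:
                found.append(
                    (producer_cpu, consumer_cpu, producer.number, consumer.number)
                )
    return found


first_set = [Instr(1, "r1", ()), Instr(2, "r2", ()), Instr(3, "r3", ())]
second_set = [
    Instr(4, "r4", ("r2",)),  # on CPU 1, reads the result of instruction 2 on CPU 2
    Instr(5, "r5", ("r2",)),  # on CPU 2, same CPU as instruction 2: no copy needed
    Instr(6, "r6", ()),
]

dependencies = cross_core_dependencies(first_set, second_set)
print(dependencies)
```

In this made-up example only instruction 4 is reported, since instruction 5 runs on the same CPU that produced the value it reads.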

Note that this embodiment can be applied to the multi-core microcomputer 100 together with Embodiments 1 to 3.

DESCRIPTION OF SYMBOLS
13 Operation sequence control circuit
14 CPU
16 Failure diagnosis device
21 Common instruction queue
22 CPU bus control unit
23 Instruction scheduler
31 Pipeline synchronization unit
32 Instruction queue control unit
33 Instruction dependency determination unit
34 CPU register synchronization unit
35 External access control unit
100 Multi-core microcomputer

Claims

An information processing apparatus in which a plurality of cores execute instructions in parallel, comprising:
program storage means for storing a program;
operation result storage means for storing an operation result for each core;
instruction reading means for reading, from the program storage means in execution order, an instruction set containing as many instructions as there are cores, and storing the instruction set in a primary storage unit;
instruction distribution means for storing the instructions included in the instruction set in the instruction queue of each core;
instruction dependency determination means for determining whether a dependent instruction, whose operation target is the operation result obtained when a first core executes an instruction included in a first instruction set stored in the primary storage unit, is included in a second instruction set executed after the first instruction set, and whether a second core different from the first core executes the dependent instruction; and
copying means for copying the value of the operation result storage means of the first core to the operation result storage means of the second core when the instruction dependency determination means determines that the second core executes the dependent instruction included in the second instruction set.

The information processing apparatus according to claim 1, further comprising:
stage synchronization means for controlling the progress of the stages so that the plurality of cores execute their instructions while synchronizing each of the stages of instruction reading from the instruction queue, instruction decoding, instruction execution, and storage in the operation result storage means,
wherein the stage synchronization means stops the stages of the plurality of cores so that the stage of each instruction in the instruction sets following the first instruction set does not reach the instruction execution stage until the first core stores, in the operation result storage means of the first core, the operation result on which the second core is to operate.

The information processing apparatus according to claim 2, wherein,
when the instruction dependency determination means determines that a second instruction included in an instruction set takes as its operation target the operation result of a first instruction in the same instruction set,
the instruction distribution means stores the first instruction included in the instruction set in the instruction queue of any one of the plurality of cores,
the stage synchronization means stops the stages of the cores other than the core in whose instruction queue the first instruction is stored, and causes that core to read and execute the first instruction,
the instruction distribution means stores the second instruction in the instruction queue of the core that executed the first instruction, and stores the remaining instructions of an instruction set beginning with the second instruction in the instruction queues of the other cores, and
the stage synchronization means causes each core to execute the second and subsequent instructions while synchronizing the stages of the plurality of cores.

The information processing apparatus according to claim 2, wherein,
when an instruction set includes instructions whose processing latencies in the instruction execution stage differ,
the stage synchronization means stops the stages of the plurality of cores so that the stage of each instruction in the instruction set following that instruction set does not reach the instruction execution stage until the core executing the instruction with the different processing latency finishes the instruction execution stage.

The information processing apparatus according to claim 2, further comprising:
failure detection means for detecting a failure of each core; and
disconnecting means for, when the failure detection means detects a failure of a core, disconnecting the core in which the failure was detected from the program storage means, the instruction reading means, the instruction distribution means, and the stage synchronization means,
wherein the instruction reading means reads, from the program storage means, an instruction set containing as many instructions as there are cores excluding the disconnected core, and stores it in the primary storage unit, and
the instruction distribution means stores the instructions included in the instruction set, as many as there are cores excluding the disconnected core, in the instruction queues of the respective cores.

An information processing method for executing instructions in an information processing apparatus in which a plurality of cores execute instructions in parallel, the apparatus having program storage means for storing a program and operation result storage means for storing an operation result for each core, the method comprising:
a step in which instruction reading means reads, from the program storage means in execution order, an instruction set containing as many instructions as there are cores, and stores it in a primary storage unit;
a step in which instruction distribution means stores the instructions included in the instruction set in the instruction queue of each core;
a step in which instruction dependency determination means determines whether a dependent instruction, whose operation target is the operation result obtained when a first core executes an instruction included in a first instruction set stored in the primary storage unit, is included in a second instruction set executed after the first instruction set, and whether a second core different from the first core executes the dependent instruction; and
a step in which, when the instruction dependency determination means determines that the second core executes the dependent instruction included in the second instruction set, copying means copies the value of the operation result storage means of the first core to the operation result storage means of the second core.