JP2011008617A

JP2011008617A - Multithread execution device, method of generating object program, and program

Info

Publication number: JP2011008617A
Application number: JP2009152749A
Authority: JP
Inventors: Masuzo Takemoto; 益三嵩本; Hiroyasu Saikai; 弘恭西海; Kazuya Okamoto; 和也岡本
Original assignee: Denso Corp; Toyota Motor Corp; Renesas Electronics Corp
Current assignee: Denso Corp; Toyota Motor Corp; Renesas Electronics Corp
Priority date: 2009-06-26
Filing date: 2009-06-26
Publication date: 2011-01-13

Abstract

PROBLEM TO BE SOLVED: To provide a multithread execution device capable of adjusting the number of instruction execution completions per unit time by a CPU, a method for generating an object program, and a program. SOLUTION: The multithread execution device 100 includes a program storage means 26 that stores an object program generated from a source program; an instruction issuing means 11 that issues instructions of the object program; and an instruction execution means 15 that executes the instructions. An adjustment code is inserted into the object program; the adjustment code adjusts the execution completion speed information of the instructions so that it substantially matches the target execution speed information for each program read from a target execution speed information storage means 80.

Description

The present invention relates to a multithread execution device and a multithread execution method capable of executing one or more threads in a time-division manner, and more particularly to a multithread execution device, an object program generation method, and a program that can adjust the instruction issue ratio for each thread.

Various methods for executing programs efficiently have been considered. Most conventional approaches to improving execution efficiency try to reduce the time during which the arithmetic circuit of the CPU is not executing a program. Proposed techniques include reducing the wait for processing content to be determined by a branch or a user operation (see, for example, Patent Documents 1 and 3), reducing software overhead due to input/output interrupts (see, for example, Patent Document 2), and reducing hardware overhead such as context switching (see, for example, Patent Document 4).

Patent Document 1 discloses a heterogeneous multiprocessor in which the portions of a program whose processing order can be determined statically are placed at compile time according to the characteristics of each PU (Processing Unit). Patent Document 2 discloses an input/output subsystem call instruction control method that controls intervention by a hypervisor by masking subsystem calls for accessing shared resources; by controlling hypervisor intervention, the method aims to suppress overhead due to interrupts. Patent Document 3 discloses a multithread processing device that speculatively issues instructions to a plurality of processors and has a thread manager that controls thread execution start and end, execution state switching, and data transfer between threads for each processor; by executing a plurality of program paths simultaneously, the device aims to improve the prediction accuracy of speculative execution. Patent Document 4 discloses a multithread execution method for a multithread computer having a plurality of hardware threads, in which a program counter and a register set are fixedly assigned to each thread and threads are switched between a standby state and an execution state; because the time required to prepare a thread is reduced, the method aims to improve processing speed.

Patent Document 1: JP 2007-328416 A
Patent Document 2: JP-A-6-35731
Patent Document 3: JP 2000-47887 A
Patent Document 4: JP 2004-234123 A

However, simply improving execution efficiency, as in the conventional techniques represented by Patent Documents 1 to 4, may adversely affect a system in which two programs have an execution timing dependency.

FIG. 1(a) is an example of a diagram illustrating the relationship between the execution timing of program #0 and that of program #1. Program #0 is executed by CPU_A1, and program #1 is executed by CPU_B1. CPU_A1 and CPU_B1 are generally mounted in separate microcomputers.

Program #0 executes process a, and program #1 executes process b1 and process b2. Process b2 is executed using the result of process a. In FIG. 1(a), the execution timings of program #0 and program #1 are designed to be synchronized, so CPU_B1 can execute process b2 using the result of process a.

FIG. 1(b) is a diagram illustrating an example of the relationship between the execution timing of program #0 and that of program #1 when CPU_B2 executes program #1. CPU_B2 has a larger IPC (Instructions Per Clock cycle) than CPU_B1, or needs fewer execution cycles per unit process; in other words, it is a faster CPU. When CPU_B2 executes program #1, the execution timing of process b1 is advanced. However, when process b1 completes, CPU_A1 has not yet completed process a. CPU_B2 therefore cannot use the result of process a when it executes process b2, which causes a malfunction: for example, CPU_B2 executes process b2 using the result of the previous run of process a.
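The failure mode of FIG. 1(b) can be illustrated with a small timing model. This is only a sketch: the durations and the function name `result_used_by_b2` are hypothetical and chosen for illustration; they do not come from the publication.

```python
def result_used_by_b2(a_duration, b1_duration):
    """Return which result of process a is available when process b2 starts.

    Both programs are assumed to start at time 0. Process a finishes at
    a_duration; process b2 starts as soon as b1 finishes, at b1_duration.
    """
    if b1_duration >= a_duration:
        return "current"   # a has finished, so b2 sees the fresh result
    return "previous"      # a is still running, so b2 reads a stale result


# Original system: timings designed so that b1 ends when a has finished.
print(result_used_by_b2(a_duration=10, b1_duration=10))  # current
# A faster CPU shortens b1, so b2 starts before a has finished.
print(result_used_by_b2(a_duration=10, b1_duration=6))   # previous
```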

Meanwhile, CPUs equipped with hardware multithreading technology and multicore CPUs, in which a single CPU carries a plurality of cores, have become available. It is therefore being considered to port program #0 and program #1, which until now were executed on separate CPUs (CPU_A1 and CPU_B1, respectively), to a single CPU. Such a technique can be used, for example, when integrating a plurality of ECUs (Electronic Control Units) connected to an in-vehicle LAN into a smaller number of ECUs.

However, the CPU mounted in the integrated ECU differs in architecture from the CPUs mounted in the ECUs before integration. If program #0 and program #1, whose execution timings depend on each other, are simply ported to the integrated ECU, the same problem as in FIG. 1(b) arises.

To address this problem, when the different programs #0 and #1 are ported to an integration CPU, one conceivable approach is to adjust the instruction issue ratio.
FIG. 2(a) is an example of a diagram illustrating the relationship between CPU_A1 and CPU_B1 before integration and the integration CPU. The operating clock of CPU_A1 is 60 MHz, and that of CPU_B1 is 180 MHz. For simplicity, both are assumed to have the same IPC (IPC = 1), although their IPCs may differ. The execution speed of CPU_B1 is therefore three times that of CPU_A1. The operating clock of the integration CPU is 180 MHz. vCPU_A1 and vCPU_B1 in the integration CPU denote virtual CPUs.

To match the execution timings of program #0 (previously executed by CPU_A1) and program #1 (previously executed by CPU_B1), the instruction issue ratio in the integration CPU can be changed according to the original execution speeds.
FIG. 2(b) is a diagram illustrating an example of the relationship between the instruction issue ratio and the number of execution completions. If the execution speeds differ by a factor of three, the integration CPU issues one instruction to vCPU_A1 for every three instructions it issues to vCPU_B1. In this way, the execution timings of program #0 and program #1 may be aligned to some extent on the integration CPU as well.
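The 3:1 issue ratio described here can be sketched as a repeating issue pattern. The function names below are hypothetical, and the model assumes IPC = 1 with no stalls, which is the idealized case; it is not the scheduler of the publication.

```python
from itertools import cycle, islice


def issue_schedule(ratios):
    """Build one repeating issue pattern from per-vCPU issue counts."""
    pattern = []
    for name, count in ratios.items():
        pattern.extend([name] * count)
    return pattern


def completions(pattern, slots):
    """Count instructions per vCPU over `slots` issue slots.

    Assumes every issued instruction completes (IPC = 1, no stalls).
    """
    counts = {name: 0 for name in pattern}
    for name in islice(cycle(pattern), slots):
        counts[name] += 1
    return counts


pattern = issue_schedule({"vCPU_B1": 3, "vCPU_A1": 1})
print(pattern)                    # three issues to vCPU_B1, then one to vCPU_A1
print(completions(pattern, 180))  # {'vCPU_B1': 135, 'vCPU_A1': 45}
```

Under these idealized assumptions the completion counts keep the 3:1 ratio of the original 180 MHz and 60 MHz CPUs.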

However, the integration CPU differs from CPU_A1 and CPU_B1 not only in operating frequency but also in factors such as instruction set and IPC that determine the number of instruction execution completions per unit time. In addition, if the integration CPU has a deep pipeline or a plurality of pipelines, the number of instruction execution completions per unit time varies due to hazards and stalls, so the desired instruction execution ratio cannot be obtained even if the instruction issue ratio is determined with IPC taken into account.

Consequently, controlling the instruction issue ratio alone cannot make the number of instruction execution completions per unit time of vCPU_A1 match that of CPU_A1, or that of vCPU_B1 match that of CPU_B1. For example, even if the instruction issue ratio is controlled, the numbers of instruction execution completions of programs #0 and #1 per unit time each keep fluctuating, and the execution timings of program #0 and program #1 eventually diverge widely.
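A short calculation shows why a fixed issue ratio does not fix the completion ratio. The stall counts below are hypothetical numbers chosen for illustration, and the helper `completion_ratio` is not part of the publication.

```python
def completion_ratio(issue_b, stalls_b, issue_a, stalls_a):
    """Ratio of completed instructions when some issue slots are lost to stalls."""
    completed_b = issue_b - stalls_b
    completed_a = issue_a - stalls_a
    return completed_b / completed_a


# Issue ratio fixed at 3:1 (300 slots versus 100 slots).
ideal = completion_ratio(300, 0, 100, 0)
# Hypothetical stalls: the two programs lose different shares of their slots.
actual = completion_ratio(300, 60, 100, 5)
print(ideal)             # 3.0
print(round(actual, 2))  # 2.53
```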

In other words, when a plurality of programs #0 and #1 are integrated using conventional multithreading or multicore technology, the execution timing that the programs had before integration cannot be guaranteed.

In view of the above problem, an object of the present invention is to provide a multithread execution device, an object program generation method, and a program that can adjust the number of instruction execution completions per unit time by a CPU.

The present invention is a multithread execution device comprising: program storage means that stores an object program generated from a source program; instruction issuing means that issues instructions of the object program generated from the source program; and instruction execution means that executes the instructions, wherein adjustment code is inserted into the object program, the adjustment code adjusting execution completion speed information of the instructions so that it substantially matches target execution speed information for each program read from target execution speed information storage means.

The present invention can provide a multithread execution device, an object program generation method, and a program that can adjust the number of instruction execution completions per unit time by a CPU.

FIG. 1 is an example of a diagram illustrating the relationship between the execution timing of program #0 and that of program #1.
FIG. 2 is an example of a diagram illustrating the relationship between CPU_A1 and CPU_B1 before integration and the integration CPU.
FIG. 3 is an example of a schematic configuration diagram of the multithread execution device 100.
FIG. 4 is a diagram schematically showing an example of the relationship between threads #0 to #2 and hardware resources.
FIG. 5 is an example of a schematic configuration diagram of the execution status monitoring device.
FIG. 6 is an example of a diagram schematically illustrating insertion of adjustment code by the compiler.
FIG. 7 is an example of a diagram schematically showing an application execution profile.
FIG. 8 is a diagram showing an example of a procedure for calculating a target IPC.
FIG. 9 is an example of a diagram schematically showing the relationship between the target IPC (execution timing before integration) and the application execution profile.
FIG. 10 shows an example of a scheduling ratio table that registers whether the execution ratio of application execution programs #0 to #2 should be decreased, increased, or kept as it is.
FIG. 11 is an example of a diagram illustrating adjustment of the execution ratio.
FIG. 12 is an example of a diagram illustrating the effect of the multithread execution device.
FIG. 13 is an example of a flowchart showing the procedure by which the multithread execution device adjusts the execution ratio of threads.
FIG. 14 is an example of a schematic configuration diagram of a multithread execution device.
FIG. 15 is an example of a schematic configuration diagram of a multithread execution device.
FIG. 16 is an example of a schematic configuration diagram of a multithread execution device.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.
FIG. 3 shows an example of a schematic configuration diagram of the multithread execution device 100. The multithread execution device 100 includes a microcomputer 200 that integrates, for example, three pre-integration ECUs (Electronic Control Units).

First, an outline is given of the procedure by which the multithread execution device 100 adjusts the number of instructions of each program executed per unit time (hereinafter sometimes simply called the execution speed). Adjusting the number of instructions executed for each program also adjusts the execution ratio of the programs.
(1) Application execution programs #0 to #2 are stored in the memory 26. Application execution programs #0 to #2 are, for example, three programs (program #0, program #1, and program #2) in compiled form. Programs #0 to #2 before compilation are stored in the application source storage unit 80.
The application source storage unit 80 also stores pre-integration execution speed information for each of programs #0 to #2 on the pre-integration CPUs (the number of instruction execution completions of each program per unit time, or the number of execution cycles per unit process).
(2) The execution status monitoring device 14 monitors the execution status of each application execution program #0 to #2 (each hardware thread). Specifically, it counts the number of instruction execution completions of each application execution program #0 to #2 per unit time, or the number of execution cycles per unit process.
(3) The profiler 12 generates an application execution profile from the number of instruction execution completions per unit time, or the number of execution cycles per unit process, observed by the execution status monitoring device 14. The application execution profile is a file that records the actual execution state of each application execution program #0 to #2 over time.
(4) The compiler 24 compares the application execution profile with the pre-integration execution speed information and inserts adjustment code for adjusting the instruction issue frequency into one or more of application execution programs #0 to #2.
(5) When the above process is repeated several times, the actual execution speed of application execution programs #0 to #2 becomes approximately equal to the execution speed before integration.
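Steps (1) to (5) above form a feedback loop, which can be sketched as follows. This is a toy model under stated assumptions: the publication does not specify here what the adjustment code consists of, so the sketch models it as a hypothetical number of padding slots per 100 useful instructions, and `measure` and `tune` are illustrative names standing in for the profiler and the compiler.

```python
def measure(base_rate, pad_per_100):
    """Hypothetical profiler: useful instructions completed per unit time
    when `pad_per_100` adjustment slots are added per 100 useful instructions."""
    return base_rate * 100 / (100 + pad_per_100)


def tune(base_rate, target_rate, tolerance=0.02, max_rounds=1000):
    """Repeat profile -> compare -> recompile until the rate is near the target."""
    pad = 0
    for _ in range(max_rounds):
        rate = measure(base_rate, pad)             # steps (2) and (3)
        if abs(rate - target_rate) <= tolerance * target_rate:
            break                                  # step (5): close enough
        if rate > target_rate:
            pad += 1                               # step (4): slow the program down
        elif pad > 0:
            pad -= 1                               # step (4): remove some padding
        else:
            break                                  # too slow and no padding left
    return pad, measure(base_rate, pad)


# A program that would run at 180 units on the integration CPU but ran at
# 60 units before integration (hypothetical numbers).
pad, rate = tune(base_rate=180, target_rate=60)
print(pad, round(rate, 2))
```

In this model a program can only be slowed down; a program that is already slower than its target would need the other programs to be slowed instead, which the sketch does not cover.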

Therefore, even when programs #0 to #2 of a plurality of ECUs are integrated into the multithread execution device 100, the execution speeds of application execution programs #0 to #2 can be made almost the same as when they were executed on the pre-integration CPUs. As a result, the execution ratio of application execution programs #0 to #2 can also be made almost the same as on the pre-integration CPUs.

Furthermore, programs #0 to #2 do not need to be recoded for the integration, which reduces cost. Because no study of the timing dependencies of programs #0 to #2 is needed, software becomes far easier to reuse and distribute. In addition, using the number of execution cycles per unit process for feedback control makes integration into a different type of microcomputer possible.

[Hardware configuration of the multithread execution device 100]
As shown in FIG. 3, the multithread execution device 100 includes a microcomputer 200 and a development environment 300. The microcomputer 200 includes a processor 50, a memory 26 connected via the instruction-side memory bus 22, and a data storage unit 27 connected via the data-side memory bus 23. The development environment 300 includes a profiler 12, an application execution profile storage unit 70, an application execution program storage unit 60, a compiler 24, and an application source storage unit 80. Naturally, the development environment 300 is no longer needed once the device is installed in a vehicle. The application execution profile storage unit 70, the application execution program storage unit 60, and the application source storage unit 80 are implemented on, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive). These three units need not be provided separately.

The processor 50 has a hardware multithreaded execution environment. Hardware multithreading refers to a structure that has a plurality of instruction buffers 17 and a plurality of register files 16 (system registers and the like) and executes instructions by switching among them as appropriate. A program, or a part of one such as a function, is sometimes called a thread in the sense of a unit of execution, but a thread in this embodiment means the hardware means for supplying and executing instructions. In FIG. 3 there are three instruction buffers 17 and three register files 16, so the multithread execution device 100 has three threads #0 to #2.

FIG. 4 is a diagram schematically showing an example of the relationship between threads #0 to #2 and hardware resources. Because the multithread execution device 100 executes different programs by switching between threads in one clock cycle, the single physical processor 50 behaves as if it had a plurality of virtual CPUs. The path from instruction buffer #0 to the arithmetic circuit 15 is therefore called thread #0, the path from instruction buffer #1 to the arithmetic circuit 15 is called thread #1, and the path from instruction buffer #2 to the arithmetic circuit 15 is called thread #2. Threads #0 to #2 together with the other peripheral circuits are called vCPUs (virtual CPUs) #0 to #2. Having a plurality of virtual vCPUs #0 to #2 in this way allows the different application execution programs #0 to #2, which were each executed independently on the pre-integration ECUs, to be executed independently.

Although this embodiment is described for hardware multithreading, the execution speed of each program can also be adjusted on a multicore microcomputer 200. In that case, it is sufficient to provide a thread schedule device common to the cores.

Instruction buffers #0 to #2 are each connected to the instruction-side memory bus 22, which in turn is connected to a memory 26, such as an EEPROM, that stores application execution programs #0 to #2. The thread schedule device 11 outputs to the instruction-side memory bus 22 the address held in a program counter (not shown) provided for each of instruction buffers #0 to #2, and reads the instructions of application execution programs #0 to #2 into instruction buffers #0 to #2 one at a time. Instruction buffers #0 to #2 are connected to the changeover switch 19, which connects only one instruction buffer 17 at a time to the instruction decoder 18.

The thread schedule device 11 selects which of instruction buffers #0 to #2 issues an instruction. As described later, the thread schedule device 11 adjusts which thread issues instructions according to a signal from the arithmetic circuit 15. This allows the IPC (Instructions Per Clock cycle) to be controlled independently for each program, and thus the instruction issue ratio to be controlled.

The thread schedule device 11 is connected to two changeover switches 19 and 21. By switching the connection destination of changeover switch 19 to one of instruction buffers #0 to #2, it switches which instruction buffer is connected to the instruction decoder 18. The instructions of the thread selected by the thread schedule device 11 are read from the corresponding instruction buffer and sent to the instruction decoder 18. The thread schedule device 11 can also disconnect all of instruction buffers #0 to #2 from the instruction decoder 18. Similarly, by switching the connection destination of changeover switch 21 to one of register files #0 to #2, it switches which register file the instruction decoder 18 and the arithmetic circuit 15 use.
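The role of the two changeover switches can be sketched in software. This is a toy model, not the hardware of the publication: the class `ThreadScheduler` and its methods are hypothetical names, instruction buffers are modeled as lists, and register files as dictionaries.

```python
class ThreadScheduler:
    """Toy model of the thread schedule device 11 driving switches 19 and 21."""

    def __init__(self, instruction_buffers, register_files):
        self.instruction_buffers = instruction_buffers  # one list per thread
        self.register_files = register_files            # one dict per thread
        self.selected = None  # None models "all buffers disconnected"

    def select(self, thread_id):
        """Point both switches at the same thread (or at nothing with None)."""
        self.selected = thread_id

    def issue(self):
        """Return (instruction, register file) for the selected thread, or None."""
        if self.selected is None:
            return None
        buffer = self.instruction_buffers[self.selected]
        if not buffer:
            return None
        return buffer.pop(0), self.register_files[self.selected]


scheduler = ThreadScheduler(
    [["add"], ["load"], ["store"]],
    [{"thread": 0}, {"thread": 1}, {"thread": 2}],
)
scheduler.select(1)
print(scheduler.issue())  # ('load', {'thread': 1})
```

Keeping a single `selected` index reflects the point that the instruction buffer and the register file in use always belong to the same thread.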

The instruction decoder 18 decodes an instruction, transfers the decoded result to the register file #0 to #2 of that thread, and also sends it to the pipeline control circuit 13. Register files #0 to #2 are groups of registers that temporarily store the results computed by the arithmetic circuit 15 and the data read from the data storage unit 27 connected to the data-side memory bus 23. The decoded result includes, for example, the type of operation, one or more source operands, and the storage location of the result. Because the source operands are supplied to the register file 16, they specify the registers that the arithmetic circuit 15 uses for the operation.

When the instruction decoder 18 sends the type of operation to the pipeline control circuit 13, the pipeline control circuit 13 specifies the operation to be executed by the arithmetic circuit 15. The arithmetic circuit 15 performs an operation on the data stored in the register file 16 according to the type of operation. Various operations are available depending on the arithmetic circuit 15, such as store, load, addition, multiplication, division, and branch. For a store or load instruction, the arithmetic circuit 15 specifies the computed address and fetches data from the data-side memory bus 23. The arithmetic circuit 15 then writes back the result of an operation such as addition, or the data that was read, to the register in the register file 16 designated by the result storage location.

The pipeline control circuit 13 controls each stage of the pipeline described above (instruction fetch, instruction decode, instruction execution, operand fetch, write-back, and so on) based on the operating clock. Because hazards (factors that prevent a stage from finishing within its allotted time) are unavoidable in pipeline processing, the pipeline control circuit 13 refers to the type of operation, the source operands, and so on, and performs processing such as stalling the pipeline, inserting NOP instructions, or flushing the contents of stages made unnecessary by a branch. For example, when a hazard such as an I/O wait occurs in a thread (for example, thread #0), the pipeline control circuit 13 requests the thread schedule device 11 to stop thread #0 and execute another thread (for example, thread #1).

The pipeline control circuit 13 is connected to the execution status monitoring device 14. For each thread, the execution status monitoring device 14 counts the number of instruction execution completions per unit time or the number of execution cycles per unit process (hereinafter simply called the "count value" when the two need not be distinguished).

[Execution status monitoring device 14]
FIG. 5 shows an example of a schematic configuration diagram of the execution status monitoring device 14. The execution status monitoring device 14 includes an execution status monitoring unit 141 and a counter unit 142. The number of instruction execution completions per unit time and the number of execution cycles per unit process both represent instruction execution speed in a broad sense. The number of instruction execution completions per unit time is used when the compiler 24 inserts adjustment code into one or more of application execution programs #0 to #2 in the case where the pre-integration CPUs and vCPUs #0 to #2 have the same instruction set architecture (ISA). The number of execution cycles per unit process is used for the same purpose when the ISAs differ.

ISA here means the set of instruction formats (binary opcodes) implemented in a CPU. Accordingly, if the ISA is the same, the number of execution cycles needed to complete a given instruction is the same on the pre-integration CPU and on the vCPUs #0 to #2. In that case, any difference in the number of instruction execution completions per unit time can be attributed solely to the operating frequency. If the ISAs differ, on the other hand, the number of execution cycles needed to complete a given instruction differs between the pre-integration CPU and the vCPU, so the number of instruction execution completions per unit time no longer tracks the progress of execution at the source-code level. For this reason, when the ISAs of the pre-integration CPU and the vCPUs #0 to #2 differ, the compiler 24 uses the number of execution cycles of a unit process to insert adjustment code so that the progress of execution at the source-code level matches.

- Number of instruction execution completions per unit time
First, counting the number of instruction execution completions per unit time is described. The pipeline control circuit 13 sends data to circuits and registers while moving each instruction through the stages of the pipeline. The pipeline control circuit 13 therefore knows when execution of an instruction has completed (for example, at write-back to the register file 16). Which of the threads #0 to #2 the completed instruction belongs to is clear from the state of the thread schedule device 11 and the changeover switch 19. When execution of an instruction completes, the pipeline control circuit 13 outputs instruction execution completion information to the execution status monitoring device 14. Specifically, the pipeline control circuit 13 and the execution status monitoring device 14 are connected by as many signal lines as there are threads, and the pipeline control circuit 13 outputs a High signal on the signal line corresponding to the thread whose instruction completed.

On acquiring the instruction execution completion information, the execution status monitoring unit 141 requests the counter unit 142 to count up. Specifically, it outputs a High signal to the count circuits #0 to #2 provided for the respective threads. The counter unit 142 can thus count the number of completed instructions for each thread. A larger number of instruction execution completions per unit time means a higher execution speed.

The counter unit 142 outputs the count value to the profiler 12 every unit time. Immediately after the counter unit 142 outputs the count value to the profiler 12, the execution status monitoring unit 141 resets the counter unit 142 to zero. The counter unit 142 can therefore output the number of instruction execution completions for each unit time to the profiler 12. The unit time is, for example, about 10 to 1000 cycles of the operating clock. If the unit time is too short the count value fluctuates widely, and if it is too long the execution time ratio becomes hard to control, so a number of execution cycles that is not extremely short is set in the execution status monitoring device 14 as the unit time.
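The counting scheme above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `CompletionCounter`, the `reports` list standing in for the profiler 12, and the completion trace are all hypothetical, and the actual device is a hardware counter.

```python
class CompletionCounter:
    """Software model of per-thread completion counting with a per-unit-time reset."""

    def __init__(self, num_threads, unit_cycles):
        self.unit_cycles = unit_cycles      # unit time, in operating-clock cycles
        self.counts = [0] * num_threads     # one count circuit per thread
        self.elapsed = 0
        self.reports = []                   # stands in for the profiler input (assumption)

    def tick(self, completed_thread=None):
        """Advance one clock; completed_thread is the thread whose instruction
        completed in this cycle, or None if no instruction completed."""
        if completed_thread is not None:
            self.counts[completed_thread] += 1
        self.elapsed += 1
        if self.elapsed == self.unit_cycles:
            self.reports.append(list(self.counts))  # output the count values
            self.counts = [0] * len(self.counts)    # reset right after output
            self.elapsed = 0


counter = CompletionCounter(num_threads=3, unit_cycles=10)
# Hypothetical trace: thread 0 completes on even cycles,
# thread 1 completes on cycles ending in 1 or 5, thread 2 never completes.
for cycle in range(20):
    if cycle % 2 == 0:
        counter.tick(0)
    elif cycle % 10 in (1, 5):
        counter.tick(1)
    else:
        counter.tick(None)
print(counter.reports)
```

Each entry of `reports` corresponds to one unit time, so dividing a count by `unit_cycles` gives that thread's IPC for the interval.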

- Number of execution cycles of a unit process
Next, counting the number of execution cycles of a unit process is described. A unit process is, for example, a function that groups a series of operations. In C, for example, "function name ... { return; }" is one unit process. When the processor 50 executes compiled object code, compilation has placed predetermined code at the beginning and end of each function to initialize and clean up the stack pointer, respectively. When the pipeline control circuit 13 detects, through the instruction decoder 18, the code marking the beginning or the end of such a function, it outputs a High signal to the execution status monitoring device 14.

On acquiring the High signal for the beginning of a function, the execution status monitoring unit 141 requests the counter unit 142 to start counting the operating clock; on acquiring the next High signal, for the end of the function, it causes the counter unit 142 to output the count value to the profiler 12. The execution status monitoring unit 141 also resets the counter unit 142 to zero each time it acquires a High signal. The counter unit 142 can therefore count and output the number of execution cycles needed to complete the function (the number of execution cycles per unit process). A larger number of execution cycles per unit process means a lower execution speed.
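The start/end counting described above can be sketched as follows. This is a minimal software model, not the circuit itself: the event names "start" and "end" and the example trace are assumptions, and the count covers the cycles from the start marker up to, but not including, the end marker.

```python
def cycles_per_unit_process(events):
    """events holds one entry per clock cycle: 'start', 'end', or None.
    Returns the cycle count reported at each 'end' marker."""
    reported = []
    count = 0
    counting = False
    for event in events:
        if event == "start":
            count = 0                 # reset on the High signal for the function start
            counting = True
        elif event == "end" and counting:
            reported.append(count)    # output the count value
            count = 0                 # reset on the High signal for the function end
            counting = False
        if counting:
            count += 1                # count the operating clock while inside the function
    return reported


# Hypothetical trace: a 4-cycle function, one idle cycle, then a 2-cycle function.
events = ["start", None, None, None, "end", None, "start", None, "end"]
print(cycles_per_unit_process(events))
```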

Using the number of execution cycles of a unit process for feedback control makes it possible to port the programs #0 to #2 to, for example, a different type of microcomputer with a different ISA.

As described later, inserting adjustment code for every unit time over which completed instructions are counted, or for every unit process whose execution cycles are counted, makes the execution speed easy to adjust. If adjusting the execution speed once per unit process would make the adjustment too late, adjustment code is instead inserted for every basic block, using the number of execution cycles of each basic block. A basic block is a sequence of instructions that is executed consecutively as one unit, except in special situations such as interrupts; for example, the series of instructions from a branch destination up to the next branch. Because a basic block is a smaller unit than a task, the execution speed can be adjusted at appropriate times. The pipeline control circuit 13 can detect basic block boundaries from the instructions.
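As an illustration of how basic block boundaries follow from the instructions, the sketch below uses the standard "leader" rule: a block begins at the first instruction, at every branch destination, and at the instruction after every branch. The instruction tuples and the opcode names `br` and `jmp` are hypothetical; the document does not specify the detection logic of the pipeline control circuit 13.

```python
def split_basic_blocks(instructions):
    """instructions: list of (opcode, branch_target_index_or_None).
    Returns (start, end) index pairs with end exclusive."""
    if not instructions:
        return []
    leaders = {0}                               # the first instruction starts a block
    for i, (opcode, target) in enumerate(instructions):
        if opcode in ("br", "jmp"):             # hypothetical branch opcodes
            if target is not None:
                leaders.add(target)             # a branch destination starts a block
            if i + 1 < len(instructions):
                leaders.add(i + 1)              # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return list(zip(starts, ends))


program = [("add", None), ("br", 3), ("sub", None), ("mul", None), ("jmp", 0)]
print(split_basic_blocks(program))
```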

[Insertion of adjustment code by the compiler 24]
FIG. 6 is an example of a diagram schematically illustrating the insertion of adjustment code by the compiler 24. First, generation of the application execution profile by the profiler 12 is described.
FIG. 7(a) is an example of a diagram schematically showing an application execution profile. The horizontal axis is time and the vertical axis is IPC, although the number of execution cycles per unit process may be used instead. When the execution status monitoring device 14 outputs the number of instruction execution completions per unit time to the profiler 12 for each of the application execution programs #0 to #2, the profiler 12 generates an application execution profile such as that in FIG. 7(a) and stores it in the application execution profile storage unit 70.

FIG. 7(b) shows an example of an application execution profile in which the profiles of the application execution programs #0 to #2 are superimposed over time. FIG. 7(b) shows the execution ratio of each of the application execution programs #0 to #2 at a given time, and the area ratio equals the execution time ratio of the programs. A feature of the multithread execution device 100 of this embodiment is that it keeps this execution ratio of the application execution programs #0 to #2 at about the same level as before integration.

<Pre-integration execution speed information>
FIG. 7(c) shows an example of the IPC of the application execution programs #0 to #2 on the pre-integration CPUs #0 to #2. If the multithread execution device 100 can maintain the execution speeds of the three application execution programs #0 to #2 shown in FIG. 7(c), the instruction issue ratio of the application execution programs #0 to #2 is appropriate.

However, when the programs #0 to #2 are ported to the microcomputer 200, the application execution programs #0 to #2 that were executed by the pre-integration CPUs #0 to #2 are in many cases executed by vCPUs #0 to #2 with higher processing capability. If the programs are executed using the full processing capability of the vCPUs #0 to #2, the amount of computation per unit time becomes larger than before integration. In other words, the pre-integration execution speed of the application execution programs #0 to #2 cannot be guaranteed on the vCPUs #0 to #2.

A method of determining the target IPC for the vCPUs #0 to #2 from the IPC of the application execution programs #0 to #2 on the pre-integration CPUs #0 to #2 is therefore described.

FIG. 8(a) is a diagram showing an example of the procedure for calculating the target IPC. As shown, the target IPC is calculated by the following equation.
Target IPC = IPC of the pre-integration CPU ÷ (ratio of the vCPU operating frequency to the pre-integration CPU operating frequency) ... (1)
When the pre-integration CPU and the vCPUs #0 to #2 have the same ISA, the execution speed may be regarded as determined by the operating frequency, so the target IPC can be obtained from the ratio between the operating frequency of the pre-integration CPU and that of the vCPU. Both operating frequencies are known to, for example, the compiler 24.

In general, the vCPU has a higher operating frequency than the pre-integration CPU, so the ratio of the operating frequency of the vCPUs #0 to #2 to that of the pre-integration CPU is often 1 or more. For example, when the IPC of the pre-integration CPU is 0.8 and the operating frequency ratio is 2, the target IPC is 0.4. Dividing the IPC of the application execution programs #0 to #2 on the pre-integration CPUs #0 to #2, as in FIG. 7(c), by the operating frequency ratio of 2 gives the target IPC.
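Equation (1) can be checked with a few lines of code. The function name and the operating frequencies (40 MHz and 80 MHz) are assumptions chosen only to give the frequency ratio of 2 used in the text.

```python
def target_ipc(ipc_before, freq_before_hz, freq_vcpu_hz):
    """Equation (1): divide the pre-integration IPC by the operating frequency ratio."""
    ratio = freq_vcpu_hz / freq_before_hz   # 1 or more when the vCPU is faster
    return ipc_before / ratio


# Hypothetical frequencies giving a ratio of 2.
print(target_ipc(0.8, 40e6, 80e6))
```

With these inputs the instructions completed per second are unchanged: 0.8 instructions per cycle at 40 MHz and 0.4 instructions per cycle at 80 MHz both amount to 32 million instructions per second.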

The pre-integration execution speed information may also be calculated from the number of execution cycles of a unit process instead of the IPC. FIG. 8(b) is a diagram showing an example of the procedure for calculating the target number of execution cycles as pre-integration execution speed information. As shown, the target number of execution cycles is calculated by the following equation.
Target number of execution cycles = number of execution cycles of the unit process on the pre-integration CPU × (ratio of the vCPU operating frequency to the pre-integration CPU operating frequency) ... (2)
When the pre-integration CPU and the vCPUs #0 to #2 do not have the same ISA, the number of execution cycles of the same unit process differs between them, so the number of execution cycles of the unit process is adjusted by the operating frequency ratio.

For example, when the number of execution cycles of a unit process on the pre-integration CPU is 100 and the operating frequency ratio between the pre-integration CPU and the vCPUs #0 to #2 is 2, the target number of execution cycles is 200. That is, because the vCPUs #0 to #2 are faster than the pre-integration CPU, the unit process must take more execution cycles.

As with FIG. 7(c), multiplying the number of execution cycles of each unit process of the application execution programs #0 to #2 on the pre-integration CPUs #0 to #2 by the operating frequency ratio of 2 gives the target number of execution cycles.
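Equation (2) can be sketched in the same way. The function name and the frequencies are again hypothetical; the point of the example is that the scaled cycle count keeps the wall-clock duration of the unit process unchanged.

```python
def target_cycles(cycles_before, freq_before_hz, freq_vcpu_hz):
    """Equation (2): multiply the pre-integration cycle count by the frequency ratio."""
    return cycles_before * (freq_vcpu_hz / freq_before_hz)


before_hz, vcpu_hz = 40e6, 80e6                 # hypothetical, ratio of 2
cycles = target_cycles(100, before_hz, vcpu_hz)
time_before = 100 / before_hz                   # seconds on the pre-integration CPU
time_after = cycles / vcpu_hz                   # seconds on the vCPU at the target
print(cycles, time_before, time_after)
```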

<Adjustment code>
Returning to FIG. 6, the adjustment code is described. The pre-integration execution speed information of each program is held in the application source storage unit 80, and the application execution profile is stored in the application execution profile storage unit 70. In this state, the compiler 24 compares the two for each unit time or each unit process and inserts adjustment code for adjusting the instruction issue ratio into the application execution programs #0 to #2.

FIG. 9 is an example of a diagram schematically showing the relationship between the target IPC and the application execution profile. In FIG. 9, the ratio between the operating frequency of the pre-integration CPU and that of the vCPU is 2. Accordingly, where the IPC of the pre-integration CPU is 0.8 the target IPC is 0.4, and where the IPC of the pre-integration CPU is 0.75 the target IPC is given as 0.35 (equation (1) yields 0.375).

For example, when IPC is compared task by task, suppose the average IPC of task 1 (times T0 to T99) in the application execution profile is 0.28. Since the target IPC is 0.4, the compiler 24 determines that the execution speed of this program (one of #0 to #2) is too low and needs to be increased.

Likewise, suppose the average IPC of task 2 (times T100 to T199) in the application execution profile is 0.45. Since the target IPC is 0.4, the compiler 24 determines that the execution speed of this program (one of #0 to #2) per unit time is too high and needs to be decreased.

The compiler 24 compares the IPC of the application execution profile with the target IPC for each unit process (each task) or each basic block, and does so for each of the application execution programs #0 to #2. This determines, for each task or basic block, whether the execution speed of the application execution programs #0 to #2 should be increased, decreased, or left as it is.

FIG. 10 shows an example of a scheduling ratio table that registers whether the execution speed of each of the application execution programs #0 to #2 should be decreased, increased, or left as it is. A time zone in the table represents, for example, one task or one basic block. The adjustment codes inserted into the application execution programs #0 to #2 therefore do not necessarily fall at the same point in time across the programs.
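The comparison that fills the scheduling ratio table can be sketched as follows. The tolerance of 0.02 that defines "leave as it is", the function name, and task 3 are assumptions for illustration; the values for task 1 and task 2 are those given above.

```python
def speed_adjustment(measured_ipc, target_ipc, tolerance=0.02):
    """Return 'large', 'small', or 'current' for one time zone (task or basic block).
    The tolerance band is an assumption; the document does not give a threshold."""
    if measured_ipc < target_ipc - tolerance:
        return "large"      # too slow: raise the execution speed
    if measured_ipc > target_ipc + tolerance:
        return "small"      # too fast: lower the execution speed
    return "current"        # close enough: keep the present speed


# Average IPC per task from the application execution profile (task3 is hypothetical).
profile = {"task1": 0.28, "task2": 0.45, "task3": 0.41}
table = {task: speed_adjustment(ipc, 0.4) for task, ipc in profile.items()}
print(table)
```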

The compiler 24 can insert adjustment code for adjusting the execution speed into the application execution programs #0 to #2 in the following two modes.
(A) Adjustment code corresponding to the scheduling ratio table is added at the head of the code of each task.
(B) A scheduling adjustment task (the whole task serves as the adjustment code) and the entire scheduling ratio table are stored in the application source storage unit 80 and added as an object to the application execution program storage unit 60.
In either mode, the adjustment code is compiled by the compiler 24 and is therefore executed by the processor 50 together with the application execution programs #0 to #2. The compiler 24 may insert the adjustment code during compilation, or may insert it into the object code after compilation.

In mode (A), when the arithmetic circuit 15 executes the adjustment code, the processor 50 calls, for example, the OS scheduler and sets in the thread schedule device 11 speed adjustment information indicating how the execution speed of the application execution programs #0 to #2 is to be adjusted: "large", "small", or "current". In practice the speed adjustment information is implemented as a flag or the like. For example, when the speed adjustment information of application execution program #0 is "large", the thread schedule device 11 raises the instruction issue ratio of application execution program #0 above its previous value. When the speed adjustment information of program #1 is "small", the thread schedule device 11 lowers the instruction issue ratio of program #1 below its previous value. When "large" speed adjustment information is set by two or more of the application execution programs #0 to #2, the thread schedule device 11 adjusts the instruction issue ratio starting from the program with the highest priority. The processor 50 is assumed to have spare processing capacity so that the execution speed can be adjusted even in this situation; that is, the processor 50 has more processing capacity than the pre-integration CPUs combined.

Instead of the two values "large" and "small", the adjustment code may use multi-level numerical values: "+5" to "+1" for "large" and "-1" to "-5" for "small". The compiler 24 determines these values according to the divergence between the target IPC and the IPC of the application execution profile. The OS then sets the multi-valued speed adjustment information in the thread schedule device 11, and the thread schedule device 11 adjusts the issue frequency of the instructions of each of the programs #0 to #2 according to the value.
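One possible way to derive a multi-level value from the divergence is sketched below. The linear mapping and the step of 0.03 IPC per level are assumptions; the document states only that the compiler 24 chooses the value according to the divergence.

```python
def adjustment_level(measured_ipc, target_ipc, step=0.03):
    """Map the IPC divergence to an integer level from -5 to +5.
    Positive levels mean 'large' (speed up), negative levels mean 'small'.
    The step size per level is a hypothetical tuning parameter."""
    level = round((target_ipc - measured_ipc) / step)
    return max(-5, min(5, level))       # clamp to the range of levels in the text


print(adjustment_level(0.28, 0.4))   # well below target
print(adjustment_level(0.45, 0.4))   # somewhat above target
print(adjustment_level(0.10, 0.4))   # far below target, clamped
```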

In mode (A), the speed adjustment information need not be set in the thread schedule device 11 by the OS; the application execution programs #0 to #2 may set it directly, or may call a driver program to set it.

In mode (B), the scheduling adjustment task reads the scheduling ratio table when execution of each of the application execution programs #0 to #2 starts, and sets speed adjustment information in the thread schedule device 11 according to the progress of the application execution programs #0 to #2. The thread schedule device 11 adjusts the scheduling ratio in the same way as in mode (A).

In this way, adjustment code for adjusting the execution speed is inserted into one or more of the application execution programs #0 to #2.

[Adjustment of execution speed]
FIG. 11 is an example of a diagram illustrating adjustment of the execution speed. When the arithmetic circuit 15 executes the adjustment code, the arithmetic circuit 15 sets the speed adjustment information of each program in the thread schedule device 11.

The thread schedule device 11 controls the changeover switch 19 to switch the connection between the instruction buffers #0 to #2 and the instruction decoder 18. As a result, a thread executing an application execution program whose execution speed is to be increased is connected to the instruction decoder 18 more frequently, and a thread executing an application execution program whose execution speed is to be decreased is connected to the instruction decoder 18 less frequently. The processor 50 can thus bring the application execution profile closer to the pre-integration execution speed information.
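The effect of changing how often each instruction buffer is connected to the instruction decoder 18 can be modeled with a weighted round-robin. This is a sketch under assumptions: the integer weights, the step of 1 per adjustment, and the smooth weighted round-robin selection rule are illustrative choices, not the circuit described in the document.

```python
def apply_adjustment(weight, adjustment):
    """Raise or lower a thread's issue weight by one step (hypothetical step size)."""
    if adjustment == "large":
        return weight + 1
    if adjustment == "small":
        return max(1, weight - 1)       # never starve a thread completely
    return weight                       # "current": keep the present ratio


def issue_schedule(weights, cycles):
    """Smooth weighted round-robin: the thread connected to the decoder in each cycle."""
    credit = [0] * len(weights)
    total = sum(weights)
    order = []
    for _ in range(cycles):
        credit = [c + w for c, w in zip(credit, weights)]
        chosen = max(range(len(weights)), key=lambda t: credit[t])
        credit[chosen] -= total
        order.append(chosen)
    return order


# Equal initial ratio, then thread 0 "large", thread 1 "small", thread 2 "current".
weights = [apply_adjustment(2, a) for a in ("large", "small", "current")]
schedule = issue_schedule(weights, 12)
print(weights, schedule)
```

Over any window of `sum(weights)` cycles each thread is selected exactly as many times as its weight, so the issue ratio follows the weights while the selections stay interleaved.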

FIG. 12 is an example of a diagram illustrating the effect of the multithread execution device 100. The measured IPC in FIG. 12(a) is the IPC obtained when the application execution programs #0 to #2 are executed on the microcomputer 200 without adjustment code inserted. As shown, the measured IPC is higher than the target IPC.

FIG. 12(b) shows an example of the relationship between the measured IPC and the target IPC when the programs are executed on the microcomputer 200 with adjustment code inserted. As FIG. 12(b) shows, adjusting the execution speed of the application execution programs #0 to #2 reduces the difference between the target IPC and the IPC measured on the microcomputer 200.

Furthermore, as shown in FIG. 12(b), where the measured IPC is still higher than the target IPC, inserting further adjustment code can lower the execution speed, and where the measured IPC is lower than the target IPC, inserting further adjustment code can raise it.

[Operation procedure]
FIG. 13 shows an example of a flowchart of the procedure by which the multithread execution device 100 adjusts the execution speed of threads. Adjustment code has been inserted into the application execution programs #0 to #2 before the procedure of FIG. 13 begins. The flowchart of FIG. 13 starts when the multithread execution device 100 is activated.

Immediately after activation, the thread schedule device 11 connects the instruction buffers #0 to #2 to the instruction decoder 18 according to the initial instruction issue ratio of the threads (for example, an equal issue ratio).

When the arithmetic circuit 15 executes adjustment code inserted into the application execution programs #0 to #2, the OS scheduler, for example, sets speed adjustment information in the thread schedule device 11 (S10).

Because speed adjustment information is also set in the thread schedule device 11 by the other application execution programs, the thread schedule device 11 refers to the priorities of the three application execution programs #0 to #2 (S20).

The thread schedule device 11 adjusts the execution speed in order starting from the application execution program with the highest priority, so it determines, based on the priorities, the order in which the execution speeds of the application execution programs are adjusted (S30).

Because some of the application execution programs #0 to #2 may have the same priority, the thread schedule device 11 determines whether any application execution programs share a priority (S40). If none do (No in S40), it adjusts the execution speed of the application execution program with the highest priority (S50). If some do (Yes in S40), the thread schedule device 11 switches the application execution program whose execution speed is adjusted, for example each time an instruction is executed (S60). Alternatively, the application execution program whose speed adjustment information was set first may be given precedence.

Once adjustment for the higher-priority application execution program is finished, the thread schedule device 11 determines whether the application execution program with the next lower priority has set speed adjustment information (S70). If it has (Yes in S70), the processor 50 repeats the processing from step S40. In this way, execution speeds are adjusted preferentially, starting from the application execution program with the highest priority.
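The ordering logic of steps S20 to S70 can be sketched as below. The tuple layout, the convention that a larger number means a higher priority, and the "adjust"/"alternate" labels are assumptions made for the example.

```python
from itertools import groupby


def adjustment_order(requests):
    """requests: list of (program, priority, adjustment) tuples.
    A larger priority value is treated as more urgent (assumption).
    Returns the plan as (action, [programs]) entries, highest priority first."""
    pending = sorted(requests, key=lambda r: -r[1])          # S20, S30: order by priority
    plan = []
    for _, group in groupby(pending, key=lambda r: r[1]):
        programs = [program for program, _, _ in group]
        if len(programs) == 1:                               # S40: No
            plan.append(("adjust", programs))                # S50
        else:                                                # S40: Yes
            plan.append(("alternate", programs))             # S60: switch between them
    return plan                                              # S70: continue to lower priorities


requests = [("app1", 1, "small"), ("app0", 3, "large"), ("app2", 1, "large")]
print(adjustment_order(requests))
```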

As described above, in the multithread execution device 100 of this embodiment, the compiler 24 inserts adjustment code for adjusting the instruction execution speed into the application execution programs, so that the execution speed of each of the threads #0 to #2 can be adjusted. When a plurality of ECUs are integrated into one, the execution speed of each pre-integration ECU can therefore be maintained without recoding. Moreover, even if the programs #0 to #2 are ported to another CPU, the destination CPU behaves the same in terms of timing as before porting, so the programs #0 to #2 need not be redesigned and their portability improves. In particular, generating the adjustment code from the number of execution cycles of a unit process makes integration into a different type of microcomputer possible.

[Preferred modifications]
・Multithread execution device 100 with a single thread
The description so far has assumed that the multithread execution device 100 has a plurality of threads, but the multithread execution device 100 may have only one thread.
FIG. 14 shows an example of a schematic configuration diagram of the multithread execution device 100. In FIG. 14, parts identical to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted. When there is one thread, there is also only one instruction buffer 17 and one register file 16. Because thread scheduling is unnecessary, an instruction issue control circuit 28 is connected to the instruction buffer 17 in place of the thread scheduling device 11. The instruction issue control circuit 28 controls the timing at which instructions are fetched into the instruction buffer 17, in accordance with the speed adjustment information set when the arithmetic circuit 15 executes the adjustment code.
Specifically, when the speed adjustment information is "small", the instruction issue control circuit 28, for example, connects and disconnects the instruction buffer 17 and the instruction decoder 18 once every two clocks, causes a stall in the pipeline, or inserts NOP instructions. This lowers the number of instruction execution completions per unit time, or increases the number of execution cycles per unit of processing, so that the execution speed of the application execution program on the CPU before integration can be reproduced.
When the speed adjustment information is "large", the instruction issue control circuit 28 increases the connection ratio, for example by connecting the instruction buffer 17 and the instruction decoder 18 on three out of every four clocks. This raises the number of instruction execution completions per unit time, or reduces the number of execution cycles per unit of processing, so that an execution speed comparable to that of the CPU before integration can be reproduced.
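The effect of gating the path between the instruction buffer 17 and the instruction decoder 18 can be sketched in Python as follows. The function and its parameters are hypothetical, and the model assumes that exactly one instruction passes to the decoder on each clock in which the path is connected, which is a simplification of a real pipeline.

```python
def issued_instructions(total_clocks: int, connect: int, period: int) -> int:
    """Count instructions passed to the decoder over `total_clocks` clocks.

    The buffer-to-decoder path is connected on the first `connect` clocks
    of every `period` clocks and disconnected on the rest.
    """
    if period <= 0 or not 0 <= connect <= period:
        raise ValueError("require period > 0 and 0 <= connect <= period")
    return sum(1 for clk in range(total_clocks) if clk % period < connect)


# "small": connected on 1 of every 2 clocks
print(issued_instructions(1000, connect=1, period=2))  # 500
# "large": connected on 3 of every 4 clocks
print(issued_instructions(1000, connect=3, period=4))  # 750
```

Under this model the connection ratio directly scales the number of instructions completed per unit time, which is the quantity the speed adjustment information is meant to control.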

・Configuration without the execution status monitoring device 14
FIG. 15 shows an example of a schematic configuration diagram of the multithread execution device 100. In FIG. 15, parts identical to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted. The multithread execution device 100 of FIG. 15 does not have the execution status monitoring device 14. The application source storage unit 80 is connected to the compiler 24, the compiler 24 to the application execution program storage unit 60, the application execution program storage unit 60 to the simulator 29, the simulator 29 to the standard profile storage unit 71, and the standard profile storage unit 71 to the compiler 24.

In the multithread execution device 100 of FIG. 15, the simulator 29, rather than the execution status monitoring device 14, generates a standard profile in place of the application execution profile. The application execution profile is execution speed information obtained when the microcomputer 200 executes the application execution programs #0 to #2, whereas the standard profile is execution speed information set manually or mechanically. For the standard profile, for example, a designer sets a fixed IPC or a fixed number of execution cycles per unit of processing as the approximate IPC or number of execution cycles per unit of processing of the multithread execution device 100. This allows the simulator 29 to generate, for each application execution program, the IPC per unit time or the number of execution cycles per unit of processing. Alternatively, each instruction may be weighted by a clock count according to the type of operation in the application execution program, and the IPC or the number of execution cycles obtained when the CPU of the multithread execution device 100 executes the program may be calculated from those weights. Because the simulator 29 simulates an existing multithread execution device 100, the number of clocks required for each type of operation can be identified easily, and the execution result of the multithread execution device 100 can be simulated accurately.
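The clock-weighted estimate can be sketched as follows. This is a minimal Python illustration, not the simulator 29 itself: the function name, the instruction kinds, and the per-kind clock counts are hypothetical, and the model assumes instructions execute one after another without overlap.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def estimate_cycles_and_ipc(
    instruction_kinds: Iterable[str],
    clocks_per_kind: Dict[str, int],
) -> Tuple[int, float]:
    """Estimate total execution cycles and IPC for an instruction sequence.

    Each instruction is weighted by the clock count of its operation type.
    """
    counts = Counter(instruction_kinds)
    total_cycles = sum(clocks_per_kind[kind] * n for kind, n in counts.items())
    total_instructions = sum(counts.values())
    ipc = total_instructions / total_cycles if total_cycles else 0.0
    return total_cycles, ipc


# Hypothetical clock counts per operation type.
clocks = {"add": 1, "load": 3, "mul": 2}
cycles, ipc = estimate_cycles_and_ipc(["add", "add", "load", "mul"], clocks)
print(cycles)  # 7
```

Here four instructions take 7 cycles, giving an estimated IPC of 4/7.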

The configuration without the execution status monitoring device 14 can also be applied to a multithread execution device 100 with a single thread.
FIG. 16 shows an example of a schematic configuration diagram of the multithread execution device 100. In FIG. 16, parts identical to those in FIG. 15 are denoted by the same reference numerals, and their description is omitted. The multithread execution device 100 of FIG. 16 has the instruction issue control circuit 28 instead of the thread scheduling device, and has the simulator 29 instead of the profiler 12. Because the simulator 29 generates the IPC per unit time or the number of execution cycles per unit of processing of the application execution program, the execution status of the CPU before integration can be reproduced as in FIG. 15.

A configuration without the execution status monitoring device 14, as in the multithread execution devices 100 of FIG. 15 and FIG. 16, requires the simulator 29 but makes it easier to suppress cost increases.

Description of Symbols
11 Thread scheduling device
12 Profiler
13 Pipeline control circuit
14 Execution status monitoring device
15 Arithmetic circuit
16 Register file
17 Instruction buffer
18 Instruction decoder
19, 21 Changeover switch
50 Processor
60 Application execution profile storage unit
100 Multithread execution device

Claims

1. A multithread execution device comprising:
program storage means for storing an object program generated from a source program;
instruction issuing means for issuing instructions of the object program; and
instruction execution means for executing the instructions,
wherein adjustment code is inserted into the object program, the adjustment code performing adjustment so that execution completion speed information of the instructions substantially matches target execution speed information of the program read from target execution speed information storage means.

2. The multithread execution device according to claim 1,
wherein the instruction execution means, having executed the adjustment code, requests the instruction issuing means to adjust the issue frequency of instructions of the object program in accordance with the adjustment code.

3. The multithread execution device according to claim 1 or 2,
wherein the adjustment code is inserted by a compiler that compiles the source program.

4. The multithread execution device according to claim 1, further comprising
execution speed monitoring means for monitoring the execution completion speed of the instructions,
wherein the execution completion speed information is generated from the execution status, monitored by the execution speed monitoring means, when the object program is actually executed.

5. The multithread execution device according to claim 1,
wherein the execution completion speed information is generated by a simulator that simulates the execution status of the object program.

6. The multithread execution device according to claim 1,
having a multithread execution environment that executes a plurality of object programs in a time-division manner,
wherein the instruction issuing means adjusts the instruction issue ratio of the plurality of object programs by switching threads.

7. The multithread execution device according to claim 1,
wherein the instruction issuing means selects the object program whose issue frequency is to be adjusted in accordance with the priority order of a plurality of object programs.

8. The multithread execution device according to claim 1,
wherein the target execution speed information is a target number of instruction execution completions per unit time or a target number of execution cycles per unit of processing of the program.

9. A method for generating an object program, executed by an information processing apparatus having instruction issuing means for issuing instructions of an object program generated from a source program and instruction execution means for executing the instructions, the method comprising:
a step in which execution speed monitoring means or a simulator generates execution completion speed information for each source program;
a step in which a compiler reads the source program from program storage means;
a step in which the compiler reads target execution speed information for each source program from target execution speed information storage means;
a step of reading the execution completion speed information for each source program from execution completion speed information storage means; and
a step in which the compiler converts the source program into an object program and inserts, into the object program, adjustment code that performs adjustment so that the execution completion speed information substantially matches the target execution speed information.

10. A program causing a computer to execute:
a step of reading a source program from program storage means;
a step of reading target execution speed information for each source program from target execution speed information storage means;
a step of reading execution completion speed information for each source program from execution completion speed information storage means; and
a step of converting the source program into an object program and inserting, into the object program, adjustment code that performs adjustment so that the execution completion speed information substantially matches the target execution speed information.