JP2016143378A

JP2016143378A - Parallelization compilation method, parallelization compiler, and electronic device

Info

Publication number: JP2016143378A
Application number: JP2015021113A
Authority: JP
Inventors: 範幸鈴木; Noriyuki Suzuki
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2015-02-05
Filing date: 2015-02-05
Publication date: 2016-08-08
Anticipated expiration: 2035-02-05
Also published as: DE102016201612A1; JP6488739B2

Abstract

PROBLEM TO BE SOLVED: To facilitate quality maintenance and management of parallelized programs. SOLUTION: A sequential program, which is executed by a single-processor system and contains descriptions that are selected or not selected as compilation targets according to input information by means of conditional compilation, is divided into a plurality of macro tasks regardless of the input information. Macro tasks that can be executed in parallel are then assigned to different processor units to produce a parallelized program to be executed by a multiprocessor system (S110). Descriptions not selected as compilation targets under the prescribed input information are removed from the parallelized program (S130). SELECTED DRAWING: Figure 2

Description

The present invention relates to a parallelizing compilation method, a parallelizing compiler, and an electronic device.

In an in-vehicle device equipped with a multi-core processor, it is known that throughput can be improved by distributing functions across the cores (Non-Patent Document 1). Parallelizing compilers are also known that generate, from a program for a single-core processor (a sequential program), a parallelized program that a multi-core processor can process in parallel.

K. Seo, J. Yoon, J. Kim, T. Chung, K. Yi, N. Chang, "Coordinated implementation and processing of a unified chassis control algorithm with multi-central processing unit", JAUTO1346, IMechE, 2009, Vol. 224, Part D

In a sequential program, a conditional compilation switch is sometimes used to select the descriptions to be compiled according to the destination or the like (conditional compilation). When a parallelizing compiler generates a parallelized program from such a sequential program, generally the part to be compiled is first identified in the sequential program based on the conditional compilation switches, and the parallelized program is then generated from that part.

However, when the part to be compiled changes, the data dependencies and control dependencies between the macro tasks that make up the sequential program may change. As a result, parallelized programs for different destinations or the like may differ greatly in the processing assigned to each processor core, even though they are generated from the same sequential program and have equivalent functions. This makes it difficult to maintain quality and to manage the parallelized programs.

An object of the present invention is to facilitate maintaining and managing the quality of parallelized programs.

A parallelizing compilation method according to one aspect of the present invention includes: a division procedure (S210) of dividing, into a plurality of macro tasks regardless of input information, the processing described in a sequential program that is executed by a single-processor system and that contains conditional descriptions for which conditional compilation selects, according to the input information, whether or not they are compiled; an extraction procedure (S220) of extracting, based on the data dependencies between the macro tasks, macro tasks that can be executed in parallel by a plurality of processor units constituting a multiprocessor system; and a scheduling procedure (S110) of performing static scheduling, which assigns each macro task to one of the processor units, assigns all or some of the macro tasks that can be executed in parallel to different processor units, and generates a parallelized program to be executed by the multiprocessor system.

With this configuration, the parallelized program is generated from the sequential program with all conditional descriptions included. Then, by setting the input information, the parts of the conditional descriptions in the parallelized program that are not subject to compilation can be removed.

Consequently, whatever input information is set, the processing other than that corresponding to the conditional descriptions (processing that is always executed regardless of the input information) is always assigned to the same processor unit. In other words, even when parallelized programs of multiple types, differing in destination or the like, are generated by setting the input information, the commonality among the types of parallelized program can be increased further, which makes it easier to maintain and manage the quality of the parallelized programs.

The reference signs in parentheses in this section and in the claims indicate, as one aspect, the correspondence with the specific means described in the embodiment below, and do not limit the technical scope of the present invention.

A block diagram showing the configuration of a PC on which the automatic parallelizing compiler is installed.
A flowchart of the automatic parallelization process.
A flowchart of the software structure analysis process.
A block diagram showing the configuration of the in-vehicle device.
An explanatory diagram of the sequential program, assignment information, and comparison program in specific example 1.
A table showing the processing times and the like of the processes described in the sequential program in specific example 1.
A table showing the verification results for the performance of the comparison program generated in specific example 1.
An explanatory diagram of the sequential program and MTG in specific example 2.
An explanatory diagram of the MTG and assignment information in specific example 2.
A table showing the processing times and the like of the processes described in the sequential program in specific example 2.
A table showing the parallel execution times of the comparison programs generated in specific example 2.
A table showing the verification results for the performance of the comparison program generated in specific example 3.
An explanatory diagram of the sequential program in specific example 4.
A table showing the arguments and the like of the conditional compilation switches in specific example 4.
A table showing the values of the conditional compilation switch arguments corresponding to each type in specific example 4.
An explanatory diagram of the comparison program generated in specific example 4.
An explanatory diagram of the comparison program generated in specific example 4.

Embodiments of the present invention are described below with reference to the drawings. The embodiments of the present invention are not limited in any way to the embodiment below and can take various forms as long as they fall within the technical scope of the present invention.

[About this embodiment]
1. About the automatic parallelizing compiler
The automatic parallelizing compiler of this embodiment has a function of generating, from a source program for a single-processor embedded system (a sequential program), a parallelized program for a multiprocessor embedded system.

1-1. Design concept of the automatic parallelizing compiler
The automatic parallelizing compiler of this embodiment has the following functions.
(1) Multigrain parallel processing
(2) Insertion of static scheduling code at compile time
(3) Generation of run-time dynamic scheduling code
(4) Realization of hierarchical macro data flow
(5) Extraction of parallelism by macro task division/fusion, loop distribution/interchange, and the like
(6) Improvement of data transfer efficiency through data localization
(7) Power reduction by the compiler
1-2. Internal processing of the automatic parallelizing compiler
The automatic parallelizing compiler has three stages: Front End (FE), Middle Path (MP), and Back End (BE). Each stage is independent as an executable, and code is passed between stages in the intermediate language generated by the FE and the MP.

The FE performs lexical and syntactic analysis of the source code of the sequential program and generates an intermediate language that the MP can parse. The intermediate language generated by the FE is basically expressed as a parse tree with four operands; it forms a single block as a whole and is not structured.

The MP performs control dependency analysis, data dependency analysis, optimization, and the like, and uses the results to perform multigrain parallel processing that combines coarse-grain, medium-grain, and near-fine-grain parallelization.
The BE reads the parallelized intermediate language generated by the MP and generates the actual machine code. Besides a BE that generates assembler code for the target multi-core architecture, there are BEs that generate parallelized Fortran code or C code for OpenMP. There are also BEs that output code for a variety of architectures, such as a BE that uses the parallelization API described later to generate parallelized code that includes memory placement and data transfer.

1-3. Parallelism analysis by the automatic parallelizing compiler
The automatic parallelizing compiler performs macro data flow processing, which divides a sequential program into three kinds of coarse-grain tasks (macro tasks (MT)): basic blocks (BB), repetition blocks (RB), and subroutine blocks (SB).

However, macro data flow processing has the problem that, depending on the shape of the program, processor utilization does not improve and sufficient coarse-grain parallelism cannot be extracted.
The automatic parallelizing compiler therefore extends the conventional single-level macro data flow processing and adopts hierarchical macro data flow processing, which applies macro data flow processing hierarchically inside MTs. In hierarchical macro data flow processing, MTs are defined hierarchically, and the parallelism between macro tasks is analyzed for the macro tasks at each level.

<Generation of the macro flow graph (MFG)>
The automatic parallelizing compiler first analyzes the control dependencies and data dependencies between the generated macro tasks at each level. The result of this analysis is expressed as a macro flow graph (MFG).
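The data-dependence part of this analysis can be pictured with a small sketch. It is not the compiler's implementation: the `MacroTask` record, its `reads` and `writes` sets, and the function `data_dependence_edges` are hypothetical names introduced here, and only dependences over named variables are considered.

```python
from dataclasses import dataclass, field


@dataclass
class MacroTask:
    """Hypothetical macro task record: a name plus the variables it reads and writes."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def data_dependence_edges(tasks):
    """Return (earlier, later) pairs where the later task depends on the earlier one.

    `tasks` is given in sequential program order. A dependence exists when the
    later task reads or writes something the earlier task wrote (flow, output),
    or writes something the earlier task read (anti).
    """
    edges = []
    for i, early in enumerate(tasks):
        for late in tasks[i + 1:]:
            flow = early.writes & late.reads
            anti = early.reads & late.writes
            output = early.writes & late.writes
            if flow or anti or output:
                edges.append((early.name, late.name))
    return edges


tasks = [
    MacroTask("MT1", reads={"a"}, writes={"x"}),
    MacroTask("MT2", reads={"b"}, writes={"y"}),
    MacroTask("MT3", reads={"x", "y"}, writes={"z"}),
]
edges = data_dependence_edges(tasks)
```

In this example MT1 and MT2 share no variables, so no edge connects them, while MT3 depends on both.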

<Generation of the macro task graph (MTG)>
The MFG expresses the control dependencies and data dependencies between macro tasks, but not their parallelism. To extract the parallelism, an earliest executable condition analysis that considers both control and data dependencies must be performed for each macro task. The earliest executable condition is the condition under which an MT becomes executable at the earliest point in time, and it is derived from the following execution conditions.

(1) If MTi is data-dependent on MTj, MTi cannot be executed until the execution of MTj has finished.
(2) Once the conditional branch destination of MTj is determined, an MTi that is control-dependent on MTj can be executed even if the execution of MTj has not finished.

The general form of the earliest executable condition is therefore as follows.
(the MTj on which MTi is control-dependent branches to MTi)
AND
((an MTk (0 ≦ k ≦ |N|) on which MTi is data-dependent has finished) OR (it is determined that MTk will not be executed))
The earliest executable conditions of the macro tasks are represented by a macro task graph (MTG).
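As a rough illustration, the general form above can be written as a predicate. The function name and its parameters are hypothetical, and the sketch assumes the data term must hold for every MTk that MTi is data-dependent on.

```python
def earliest_executable(control_branch_taken, data_preds, finished, not_executed):
    """Evaluate the general form of the earliest executable condition for one MTi.

    control_branch_taken: True once the MTj that MTi is control-dependent on
        has branched toward MTi (pass True if MTi has no control dependence).
    data_preds: names of the MTk that MTi is data-dependent on.
    finished: set of macro tasks whose execution has ended.
    not_executed: set of macro tasks determined not to be executed.
    """
    data_ok = all(k in finished or k in not_executed for k in data_preds)
    return control_branch_taken and data_ok


# MT1 has finished and MT2 is known not to run: MTi may start.
ready_a = earliest_executable(True, ["MT1", "MT2"], {"MT1"}, {"MT2"})
# MT2 is still undecided: MTi must wait.
ready_b = earliest_executable(True, ["MT1", "MT2"], {"MT1"}, set())
# The controlling branch has not gone toward MTi.
ready_c = earliest_executable(False, [], set(), set())
```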

1-4. Multigrain parallel processing
In addition to conventional loop parallelization, the automatic parallelizing compiler realizes multigrain parallel processing (see Reference 1: Hiroki Honda, Masahiko Iwata, Hironori Kasahara, "A method for detecting parallelism between coarse-grain tasks in Fortran programs", IEICE Transactions, 1990), which effectively combines coarse-grain task parallel processing, exploiting the parallelism between coarse-grain tasks such as loops and subroutines, with near-fine-grain parallel processing, exploiting the parallelism between statements.

<Coarse-grain task parallel processing>
The automatic parallelizing compiler generates a macro flow graph (MFG) expressing the control dependencies and data dependencies between MTs such as BBs, RBs, and SBs, and then expresses the parallelism between MTs, derived from the MFG by earliest executable condition analysis, as a macro task graph (MTG) (see Reference 1 and Reference 2: Kasahara, Goda, Yoshida, Okamoto, Honda, "A macro task generation method for Fortran macro data flow processing", IEICE Transactions, 1992, Vol. J75-D-I, No. 8, pp. 511-525).

The automatic parallelizing compiler then assigns the MTs on the MTG to processor groups (PG), each of which groups one or more processor elements (PE).
<Medium-grain parallel processing>
If an MT assigned to a PG is a DOALL loop or can otherwise be processed in parallel at the iteration level, the MT is processed with medium-grain parallelism by the processors in the processor cluster. Medium-grain parallel processing exploits the parallelism between DO loop iterations and is the most common form of parallel processing on multiprocessors.
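Iteration-level parallelism of this kind can be sketched as follows. The helper `doall` is a hypothetical name, and Python threads merely stand in for the processors of a cluster; the sketch shows only how independent iterations are divided among workers, not any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def doall(body, n_iterations, n_workers):
    """Run body(i) for i in range(n_iterations), splitting iterations across workers.

    Valid only when the iterations are independent of each other (a DOALL loop).
    """
    # Worker w handles iterations w, w + n_workers, w + 2 * n_workers, ...
    chunks = [range(w, n_iterations, n_workers) for w in range(n_workers)]

    def run_chunk(chunk):
        return [(i, body(i)) for i in chunk]

    results = [None] * n_iterations
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(run_chunk, chunks):
            for i, value in part:
                results[i] = value
    return results


squares = doall(lambda i: i * i, 8, 4)
```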

<Near-fine-grain parallel processing>
Parallel processing of statement-level near-fine-grain tasks is called near-fine-grain parallel processing. It allows statements with no dependencies between them to be executed in parallel, shortening the execution time.

1-5. Macro task scheduling
In coarse-grain task parallel processing, the macro tasks generated at each level are assigned to PGs and executed. The scheduling methods that determine which PG a macro task is assigned to are dynamic scheduling and static scheduling, described below; the choice is made based on the shape of the macro task graph, run-time nondeterminism, and the like.

<Dynamic scheduling>
When run-time uncertainty such as conditional branching exists, macro tasks are assigned to PGs at run time by dynamic scheduling. The dynamic scheduling routine manipulates the macro task execution management table in response to the completion of a macro task or the determination of a branch direction, and checks the earliest executable condition of each macro task.

If a macro task is executable, it is placed in the ready queue. The macro tasks in the ready queue are sorted by priority, and the macro task at the head of the queue is assigned to an idle processor cluster.
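The ready-queue behavior can be illustrated with a small simulation. This is a sketch under stated assumptions, not the routine described in the source: `dynamic_schedule` and its parameters are hypothetical, only data dependences are modeled, a larger priority value is assumed to mean "scheduled first", and the `remaining` dictionary plays the role of the execution management table.

```python
import heapq


def dynamic_schedule(tasks, preds, priority, n_clusters):
    """Simulate run-time assignment of macro tasks to idle processor clusters.

    tasks: {name: execution time}; preds: {name: set of predecessor names};
    priority: {name: number}, larger means scheduled first (an assumption).
    Returns a list of (start_time, cluster, task) in dispatch order.
    """
    remaining = {t: set(preds.get(t, ())) for t in tasks}
    ready = [(-priority[t], t) for t in tasks if not remaining[t]]
    heapq.heapify(ready)
    idle = list(range(n_clusters))
    heapq.heapify(idle)
    running = []  # heap of (finish_time, cluster, task)
    now, log = 0, []
    while ready or running:
        # Hand the head of the ready queue to idle clusters.
        while ready and idle:
            _, task = heapq.heappop(ready)
            cluster = heapq.heappop(idle)
            log.append((now, cluster, task))
            heapq.heappush(running, (now + tasks[task], cluster, task))
        # Advance to the next completion and update the management table.
        now, cluster, done = heapq.heappop(running)
        heapq.heappush(idle, cluster)
        for t, waiting in remaining.items():
            if done in waiting:
                waiting.discard(done)
                if not waiting:
                    heapq.heappush(ready, (-priority[t], t))
    return log


log = dynamic_schedule(
    tasks={"A": 2, "B": 3, "C": 1, "D": 2},
    preds={"C": {"A"}, "D": {"A", "B"}},
    priority={"A": 4, "B": 3, "C": 1, "D": 2},
    n_clusters=2,
)
```

Here C becomes ready when A finishes at time 2, and D only after both A and B have finished at time 3.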

When dynamic scheduling code is generated, either a centralized scheduling scheme, in which one dedicated processor performs the scheduling, or a distributed scheduling scheme, in which the scheduling function is distributed across the processors, can be chosen according to the number of processors used and the synchronization overhead of the system.

<Static scheduling>
Static scheduling, on the other hand, is used when the macro task graph has only data-dependence edges; the automatic parallelizing compiler decides the assignment of macro tasks to PGs at compile time.

Because static scheduling eliminates run-time scheduling overhead and can minimize the overhead of data transfer and synchronization, it can also be used effectively for scheduling fine-grain tasks.

In static scheduling, the task costs used are the estimates produced by the automatic parallelizing compiler, but task scheduling based on actual costs is also possible by using the compiler's automatic profile feedback function.
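A compile-time assignment driven by task costs might look like the following greedy list scheduler. The source does not specify the scheduling algorithm, so this is only one possible sketch; `static_schedule` is a hypothetical name, the tasks are assumed to be listed in a valid topological order, and data transfer cost is ignored.

```python
def static_schedule(costs, preds, n_groups):
    """Greedy list scheduling: assign each macro task to a processor group (PG).

    costs: {task: estimated cost}, listed in a valid topological order
    (an assumption of this sketch); preds: {task: set of predecessor tasks}.
    Returns {task: (pg, start, finish)}.
    """
    pg_free = [0] * n_groups  # time at which each PG becomes free
    placed = {}
    for task, cost in costs.items():
        # A task may start only after every task it depends on has finished.
        data_ready = max((placed[p][2] for p in preds.get(task, ())), default=0)
        # Choose the PG that allows the earliest start (lowest index on ties).
        pg = min(range(n_groups), key=lambda g: (max(pg_free[g], data_ready), g))
        start = max(pg_free[pg], data_ready)
        placed[task] = (pg, start, start + cost)
        pg_free[pg] = start + cost
    return placed


schedule = static_schedule(
    costs={"MT1": 3, "MT2": 2, "MT3": 4, "MT4": 1},
    preds={"MT3": {"MT1"}, "MT4": {"MT2", "MT3"}},
    n_groups=2,
)
```

Replacing the estimated values in `costs` with measured ones corresponds to scheduling on actual costs.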

When the automatic profile feedback function is used, in the first phase the sequential program is decomposed into MTs and a profiler function is inserted for each MT, producing an instrumented sequential program. The profiler functions measure the task execution cost (clock cycles) and the number of task executions. Running this instrumented sequential program once on the target machine outputs a file containing the task execution costs and task execution counts on that machine.
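The first phase can be pictured with the sketch below. The decorator `profiled`, the `profile` table, and the toy tasks `mt1` and `mt2` are hypothetical; elapsed nanoseconds stand in for clock cycles, and the results are kept in a dictionary rather than written to a file.

```python
import time
from collections import defaultdict

# Per-task record of execution count and accumulated cost (nanoseconds here).
profile = defaultdict(lambda: {"count": 0, "cost": 0})


def profiled(name):
    """Wrap a macro task so each call records its execution count and elapsed time."""
    def wrap(func):
        def inner(*args, **kwargs):
            start = time.perf_counter_ns()
            try:
                return func(*args, **kwargs)
            finally:
                profile[name]["count"] += 1
                profile[name]["cost"] += time.perf_counter_ns() - start
        return inner
    return wrap


@profiled("MT1")
def mt1(x):
    return x + 1


@profiled("MT2")
def mt2(x):
    return x * 2


# Run the instrumented sequential program once.
result = mt2(mt1(mt1(0)))
```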

In the second phase, this output file and the sequential program are taken as input, and a parallelized program scheduled on the basis of the actual costs is generated.
1-6. Data localization
The automatic parallelizing compiler can perform cache optimization across the entire program. After analyzing the parallelism between loops and the like, if it finds data dependencies between loops, it attempts global cache optimization across the dependent loops (see Reference 3: Japanese Patent No. 4177681).

Specifically, it examines the arrays accessed in each loop, adjusts the loop division so that matching divided loops access the same portion of the arrays, and assigns matching divided loops to the same processor. Within matching divided loops, all array data is then reused in the cache.
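The idea of dividing dependent loops identically can be sketched as follows. The functions `aligned_chunks` and `run_localized` are hypothetical, the two loops are toy examples, and the outer pass over `p` merely stands in for processor p; cache behavior itself is not modeled.

```python
def aligned_chunks(n_elements, n_processors):
    """Split the index range [0, n_elements) into one contiguous chunk per processor."""
    base, extra = divmod(n_elements, n_processors)
    chunks, start = [], 0
    for p in range(n_processors):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def run_localized(data, n_processors):
    """Two dependent loops over the same array, divided identically.

    Chunk p of both loops is handled by processor p, so the part of `tmp`
    produced by loop 1 is the part consumed by loop 2 on the same processor.
    """
    chunks = aligned_chunks(len(data), n_processors)
    tmp = [0] * len(data)
    out = [0] * len(data)
    for p, chunk in enumerate(chunks):  # each pass stands in for processor p
        for i in chunk:                 # divided loop 1
            tmp[i] = data[i] * 2
        for i in chunk:                 # divided loop 2 reuses tmp[chunk]
            out[i] = tmp[i] + 1
    return chunks, out


chunks, out = run_localized(list(range(10)), 3)
```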

This localization technique has also evolved into local memory management and data transfer techniques such as the following (Reference 5: UK Patent No. 2478874).
(1) Given local memory or distributed shared memory of arbitrary size, data is preloaded, using DMA (DTU) (see Reference 4: Japanese Patent No. 4476267), into the local or distributed shared memory close to the processor before it is accessed, and is reused throughout the program.
(2) When the destination memory is full, data is transferred automatically into the freed memory once a synchronization flag signals that the DTU of the destination processor has evicted data from that memory to shared memory or the like according to the eviction priority.
(3) When data that will be reused in the future is not used for a while and the memory area must be freed, the DTU saves that data to the centralized shared memory in the background of task execution by the CPU, and reloads it before it is needed.
1-7. Generation of the parallelized program
The automatic parallelizing compiler can generate the parallelized program source-to-source, for example as parallelized C or parallelized Fortran, using the automatic parallelization API (see Reference 7: Waseda University, "Optimally Scheduled Advanced Multiprocessor Application Program Interface", 2008).

In this case, so that the parallelized program can run on a variety of platforms, the automatic parallelizing compiler uses the automatic parallelization API standard interpretation system described later to convert the directive portions of the C or Fortran code for each processor into run-time library calls. The code for each processor is then compiled with a sequential compiler to generate binaries, and linking these binaries makes the parallelized program executable on the target multiprocessor.

2. Procedure and method for parallelizing sequential programs for embedded systems
Next, the characteristics of sequential programs for embedded systems are described, followed by the parallelization method used by the automatic parallelizing compiler of this embodiment. The embedded system may be, for example, an in-vehicle device or an electronic device other than an in-vehicle device. The sequential program may be one generated automatically by model-based design (for example, generated automatically with MATLAB (registered trademark) or Simulink (registered trademark) from MathWorks).

For a sequential program that consists of conditional branches and assignment statements and whose processing is fine-grained, the automatic parallelizing compiler performs inline expansion and renaming to extract parallelism. To preserve real-time behavior, it fuses tasks so as to hide conditional branches, and performs static scheduling to keep overhead low. The automatic profile feedback function may also be applied so that static scheduling is based on actual costs.

In a sequential program, conditional compilation switches (preprocessor directives) are sometimes used to perform conditional compilation, which selects the descriptions to be compiled according to the type of embedded system, where types differ in destination, functions, hardware configuration, and the like. In that case, setting information corresponding to one of the types (information indicating the destination or the like) as the argument of each conditional compilation switch in the sequential program causes binary code for that type to be generated from the sequential program.

In contrast, the automatic parallelizing compiler of this embodiment ignores the selection of compilation targets by conditional compilation and generates the parallelized program by performing macro task division, parallelism extraction, static scheduling, and so on over all parts of the sequential program. It then identifies the descriptions in the parallelized program that conditional compilation excludes from compilation, and generates binary data for operating the multi-core processor with those descriptions removed.
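The effect of this ordering, scheduling first and removing unselected descriptions afterwards, can be seen in a toy model. Everything here is hypothetical: the task list, the switch name `DEST` with values `JP` and `US`, and the round-robin placement that stands in for static scheduling.

```python
def schedule_all(tasks, n_cores):
    """Assign every macro task, conditional or not, to a core.

    Round-robin placement stands in for static scheduling. The switch
    values are deliberately not consulted.
    """
    return {t["name"]: i % n_cores for i, t in enumerate(tasks)}


def strip_unselected(tasks, assignment, switches):
    """Remove conditional macro tasks whose switch condition is not satisfied."""
    kept = {}
    for t in tasks:
        cond = t.get("when")  # None means the task is unconditional
        if cond is None or switches.get(cond[0]) == cond[1]:
            kept[t["name"]] = assignment[t["name"]]
    return kept


tasks = [
    {"name": "MT1"},
    {"name": "MT2", "when": ("DEST", "JP")},
    {"name": "MT3", "when": ("DEST", "US")},
    {"name": "MT4"},
]
assignment = schedule_all(tasks, n_cores=2)
jp = strip_unselected(tasks, assignment, {"DEST": "JP"})
us = strip_unselected(tasks, assignment, {"DEST": "US"})
```

The unconditional tasks MT1 and MT4 land on the same cores in both variants, which is the commonality the embodiment aims for.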

2-1. Operating environment of the automatic parallelizing compiler
The automatic parallelizing compiler 1 is provided to the user stored on a storage medium 18 configured as an optical disk, magnetic disk, semiconductor memory, or the like, such as a DVD, CD-ROM, USB memory, or memory card (registered trademark) (see FIG. 1). It may of course also be provided to the user via a network.

A personal computer (PC) 10 on which the automatic parallelizing compiler 1 is installed operates as an automatic parallelizing compilation device. The PC 10 includes a display 11, an HDD 12, a CPU 13, a ROM 14, a RAM 15, an input device 16, a reading unit 17, and the like.

The display 11 displays the video signal received from the CPU 13 to the user as an image.
The input device 16 consists of a keyboard, a mouse, and the like; when operated by the user, it outputs a signal corresponding to the operation to the CPU 13.

The reading unit 17 reads data from the storage medium 18 on which the automatic parallelizing compiler 1 and the like are stored.
The RAM 15 is a readable and writable volatile memory, the ROM 14 is a read-only nonvolatile memory, and the HDD 12 is a readable and writable nonvolatile memory. Programs and the like that the CPU 13 reads and executes are stored in advance in the ROM 14 and the HDD 12.

The RAM 15 is also used as a storage area for temporarily holding a program while the CPU 13 executes a program stored in the ROM 14 or the HDD 12, and as a storage area for temporarily holding working data.

The CPU 13 reads the OS from the HDD 12 and executes it, and executes the various programs recorded on the HDD 12 as processes on the OS. In these processes, the CPU 13 accepts signal input from the input device 16 as needed, outputs video signals to the display 11, and controls reading and writing of data to and from the RAM 15 and the HDD 12.

The automatic parallelizing compiler 1, read from the storage medium 18 via the reading unit 17, is installed on the PC 10; it is stored on the HDD 12 and is one of the applications executed as a process on the OS.

This automatic parallelizing compilation apparatus is used to develop parallelized programs for embedded systems such as in-vehicle devices. It is not limited to this use, however: it can also be used to develop parallelized programs for embedded systems in other fields, such as information appliances, and for applications other than embedded systems.

2-2. Automatic parallelization processing
Next, the automatic parallelization processing, which generates a parallelized program from a sequential program subject to conditional compilation, is described with reference to the flowchart of FIG. 2. Through conditional compilation, the sequential program yields binary data for the type of embedded system that corresponds to the value of the argument referenced by the conditional compilation switches. The automatic parallelizing compiler 1 running on the PC 10 starts this processing in response to an instruction from the user.

In S100, based on the conditional compilation switches written in the sequential program, the automatic parallelizing compiler 1 identifies the conditional descriptions, that is, the parts of the sequential program that conditional compilation selects or excludes as compilation targets, and divides them into one or more conditional blocks. Each conditional block becomes a single macro task (a conditional macro task).

Here, a conditional block may be a single run of description that one set of conditional compilation switches selects or excludes as a whole according to the argument value, or each of several such runs may be its own conditional block. A combination of such runs may also be treated as a conditional block. For concrete examples of how conditional blocks are set, see Specific Example 4 below.
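As an illustration of S100, the following minimal sketch scans a sequential program for conditional compilation switches and returns one conditional block per branch. It assumes the switches are written as flat C-preprocessor `#if` / `#else` / `#endif` directives; the directive syntax, the function name `find_conditional_blocks`, and the sample program are all hypothetical, and nesting is not handled.

```python
def find_conditional_blocks(source: str):
    """Return (condition, lines) pairs, one per branch of each #if/#else/#endif group.

    Assumes flat (non-nested) directives; this is an illustrative simplification.
    """
    blocks = []
    current = None      # (condition text, collected lines) for the branch being read
    condition = None    # condition of the enclosing #if
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if"):
            condition = stripped[3:].strip()
            current = (condition, [])
        elif stripped.startswith("#else") and current is not None:
            blocks.append(current)
            current = (f"!({condition})", [])
        elif stripped.startswith("#endif") and current is not None:
            blocks.append(current)
            current = None
        elif current is not None:
            current[1].append(stripped)
    return blocks


# Hypothetical sequential program with one set of conditional compilation switches.
SEQUENTIAL_PROGRAM = """\
process_A();
process_B();
#if CSW == A
process_C1();
#else
process_C2();
#endif
"""

blocks = find_conditional_blocks(SEQUENTIAL_PROGRAM)
for cond, lines in blocks:
    print(cond, lines)
```

Here each branch becomes its own conditional block; merging the two returned entries would correspond to treating the whole run governed by one set of switches as a single block.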

In S105, the automatic parallelizing compiler 1 executes the software structure analysis processing (see FIG. 3) and generates an MTG. It then selects one of the types of embedded system that the sequential program operates (in other words, the conditional compilation switch argument value corresponding to one of the types) and proceeds to S110.

In S110, the automatic parallelizing compiler 1 performs the static scheduling corresponding to the currently selected type, based on the MTG generated by the software structure analysis processing. Macro tasks that can run in parallel are thereby assigned to different PGs (the information describing the result of this assignment is called the allocation information), and parallelized programs (hereinafter, comparison programs) are generated. One comparison program is generated for each type; that is, a single static scheduling, which corresponds to one type, produces several comparison programs with the same content, one per type. Hereinafter, the type that a comparison program corresponds to is called its final type.

In each comparison program, the description corresponding to each conditional macro task carries a conditional compilation switch equivalent to the one that was set on the corresponding conditional description in the sequential program. Setting the argument of the conditional compilation switches in a comparison program to the value for a given type therefore yields the parallelized program for that type.

During static scheduling, the automatic parallelizing compiler 1 may use its automatic profile feedback function to determine the actual cost (for example, the processing time) of executing each macro task, and may assign parallel-executable macro tasks so that the processing load on each PG is roughly equal. It may also use the conditional compilation switches to identify macro tasks that are not compilation targets for the selected type, and perform static scheduling on the assumption that those macro tasks impose no processing load.
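The load-balancing idea in this paragraph can be sketched as follows. This is not the compiler's actual scheduler: it is a simple greedy heuristic (largest task first, onto the least loaded core) that ignores data dependences, and the task names and costs are hypothetical. The only point it illustrates is that macro tasks excluded for the selected type are scheduled with a cost of zero.

```python
def static_schedule(costs, excluded, n_cores=2):
    """Greedy assignment: largest effective cost first, onto the least loaded core.

    Tasks in `excluded` (not compiled for the selected type) count as cost 0.
    Data dependences between tasks are ignored in this sketch.
    """
    effective = {t: (0 if t in excluded else c) for t, c in costs.items()}
    loads = [0] * n_cores
    assignment = {}
    for task in sorted(effective, key=lambda t: (-effective[t], t)):
        core = loads.index(min(loads))   # least loaded core so far
        assignment[task] = core
        loads[core] += effective[task]
    return assignment, loads


# Hypothetical processing times in microseconds (not taken from the figures).
COSTS = {"A": 40, "B": 30, "C1": 60, "C2": 20}

# Scheduling for a type in which C2 is not a compilation target.
assignment_a, loads_a = static_schedule(COSTS, excluded={"C2"})
print(assignment_a, loads_a)
```

Every task still receives a core, including the excluded one, which is what allows the same allocation to be reused for the comparison programs of the other types.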

At this point, so that the parallelized program can run on various platforms, the automatic parallelizing compiler 1 may also use the standard translator for the automatic parallelization API to convert the parallelized program annotated with the automatic parallelization API into a parallelized program in which a runtime library is implemented.

In S115, the automatic parallelizing compiler 1 determines whether static scheduling has been completed for every type of embedded system that the sequential program operates. If so (S115: Yes), it proceeds to S120; if not (S115: No), it selects another type for which static scheduling has not yet been performed and returns to S110.

In S120, the automatic parallelizing compiler 1 verifies the performance of all comparison programs generated in S110 to S115, selects an appropriate static scheduling based on the verification results, and proceeds to S125. It may select the static scheduling that produces the comparison programs with the best relative performance, or one that produces comparison programs meeting a predetermined performance level (details are given below).

In S125, the automatic parallelizing compiler 1 determines whether a static scheduling was successfully selected. For example, the selection may be deemed successful if exactly one static scheduling produces the best-performing comparison programs, and unsuccessful otherwise. Alternatively, it may be deemed successful if a static scheduling with a given level of performance exists, and unsuccessful otherwise. If the selection succeeded (S125: Yes), the compiler selects one of the types of embedded system that the sequential program operates and proceeds to S130; if it failed (S125: No), it proceeds to S145.

In S130, the automatic parallelizing compiler 1 executes preprocessing: it analyzes the compiler switches written in the comparison program that was generated by the selected static scheduling and corresponds to the currently selected type (the selected program), and performs processing according to the analysis result. In doing so, it uses the conditional compilation switches to identify the descriptions in the selected program that are not compilation targets for the selected type, and then proceeds to S135.
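The effect of the preprocessing in S130 can be pictured with a small sketch that drops the descriptions not selected for a given type. It assumes flat `#if CSW == <value>` / `#else` / `#endif` directives; that syntax, the function name `strip_unselected`, and the sample program are hypothetical and stand in for whatever form the conditional compilation switches actually take.

```python
def strip_unselected(program: str, csw: str) -> str:
    """Keep only the lines that are compilation targets when CSW equals `csw`.

    Handles flat '#if CSW == X' / '#else' / '#endif' groups only (an assumption).
    """
    kept = []
    keep = True
    for line in program.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if CSW =="):
            keep = stripped.split("==")[1].strip() == csw
        elif stripped == "#else":
            keep = not keep
        elif stripped == "#endif":
            keep = True
        elif keep:
            kept.append(line)
    return "\n".join(kept)


# Hypothetical fragment of a selected program for one core.
SELECTED_PROGRAM = """\
process_A();
#if CSW == A
process_C1();
#else
process_C2();
#endif
"""

for spec in ("A", "B"):
    print(f"--- {spec} specification ---")
    print(strip_unselected(SELECTED_PROGRAM, spec))
```

Running the same selected program through this step once per type mirrors the S130 to S140 loop, which produces one set of binary data per type from a single allocation.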

In S135, the automatic parallelizing compiler 1 generates binary data for operating the multi-core processor from the descriptions in the selected program that are compilation targets, and proceeds to S140.

In S140, the automatic parallelizing compiler 1 determines whether binary data has been generated for every type. If so (S140: Yes), it ends this processing; if not (S140: No), it selects another type for which binary data has not yet been generated and returns to S130.

In S145, on the other hand, the automatic parallelizing compiler 1 reports, via the display 11 or the like, that selection of a static scheduling failed. As the performance verification results, it may display, for example, the parallel execution time and reduction rate of each comparison program, and the maximum parallel execution time and minimum reduction rate among the comparison programs generated by each static scheduling (details are given below). The automatic parallelizing compiler 1 then ends this processing.

2-3. Software structure analysis processing
Next, the software structure analysis processing, which generates an MTG from the sequential program, is described with reference to the flowchart of FIG. 3. This processing is executed as part of the automatic parallelization processing.

In S200, the automatic parallelizing compiler 1 performs inline expansion on the sequential program (replacing each subroutine call with the description of the processing defined in that subroutine) and proceeds to S205. Sequential programs for embedded systems generally consist of fine-grained processing, which makes coarse-grained parallelization difficult; inline expansion makes it possible to exploit the parallelism inside subroutines as well.

In S205, the automatic parallelizing compiler 1 renames local variables. In a sequential program generated automatically from a Simulink model, for example, the same local variable may be reused in many places to reduce ROM usage. Parallelism analysis then detects data dependences between those uses, and sufficient parallelism cannot be extracted. The reused local variables are therefore renamed.

Specifically, within each function of the sequential program, the automatic parallelizing compiler 1 identifies the processing blocks that use a local variable of the same name, and modifies the sequential program so that each identified processing block uses a local variable with its own unique name.
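The renaming step can be sketched on a toy representation in which each processing block is a list of C-like statements held as strings. This is an assumption made for illustration: a real compiler would work on its intermediate representation rather than on text, and would check that no value is carried from one block to the next through the reused variable. The function name `rename_locals` and the sample statements are hypothetical.

```python
import re


def rename_locals(blocks, name):
    """Give each processing block that uses `name` its own copy (name_1, name_2, ...).

    `blocks` is a list of processing blocks, each a list of statement strings.
    Assumes the value of `name` is never carried across block boundaries.
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    renamed = []
    counter = 0
    for block in blocks:
        if any(pattern.search(stmt) for stmt in block):
            counter += 1
            new_name = f"{name}_{counter}"
            block = [pattern.sub(new_name, stmt) for stmt in block]
        renamed.append(block)
    return renamed


# Two hypothetical processing blocks that reuse the same local variable `tmp`.
BLOCKS = [
    ["tmp = in1 * 2;", "out1 = tmp + 1;"],
    ["tmp = in2 - 3;", "out2 = tmp * tmp;"],
]

result = rename_locals(BLOCKS, "tmp")
for block in result:
    print(block)
```

After renaming, the two blocks no longer share a variable, so a dependence analysis would no longer see a false data dependence between them.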

A processing block may be, for example, a set of descriptions consisting of a loop statement or a branch statement such as an if statement or switch-case statement, together with the assignment statements and the like that accompany it. Alternatively, the set of descriptions corresponding to each block of the Simulink model from which the sequential program was generated may be treated as a processing block.

In S210, the automatic parallelizing compiler 1 divides the sequential program processed in this way into macro tasks. As noted above, each conditional block becomes a single macro task (a conditional macro task). The compiler then analyzes the data dependences and control dependences between the macro tasks, generates an MFG, and proceeds to S215.

In S215, based on the control dependences shown in the MFG, the automatic parallelizing compiler 1 identifies each macro task that branches to different macro tasks as a start task. For each start task, it identifies as the end task the first-executed macro task among those executed in common by all of the sequences of processing that run from that start task.

The automatic parallelizing compiler 1 then fuses the identified start task, the end task of the processing that begins at that start task, and all macro tasks executed after the start task and before the end task into a single macro task (task fusion), and proceeds to S220.
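The selection of start and end tasks and the resulting fusion can be sketched on a small control-flow graph. The graph, the task names `MT1` to `MT6`, and the helper names are hypothetical. The sketch assumes an acyclic graph in which the branches rejoin, and it finds the end task by enumerating paths, which is adequate for a toy example but not how a production compiler would do it.

```python
def all_paths(succ, node):
    """Enumerate every control-flow path from `node` to an exit of an acyclic graph."""
    if not succ.get(node):
        return [[node]]
    return [[node] + rest for nxt in succ[node] for rest in all_paths(succ, nxt)]


def fuse_from(succ, start):
    """Return (end task, set of tasks to fuse) for the branch that begins at `start`.

    The end task is the first task that lies on every path leaving `start`.
    Assumes the branches rejoin somewhere; otherwise next() raises StopIteration.
    """
    paths = all_paths(succ, start)
    common = set(paths[0][1:]).intersection(*(p[1:] for p in paths[1:]))
    end = next(n for n in paths[0][1:] if n in common)
    fused = set()
    for p in paths:
        fused.update(p[: p.index(end) + 1])   # start, end, and everything between
    return end, fused


# Hypothetical control flow: MT2 branches to MT3 or MT4, which rejoin at MT5.
SUCC = {
    "MT1": ["MT2"],
    "MT2": ["MT3", "MT4"],
    "MT3": ["MT5"],
    "MT4": ["MT5"],
    "MT5": ["MT6"],
    "MT6": [],
}

start_tasks = [n for n, s in SUCC.items() if len(s) > 1]
end, fused = fuse_from(SUCC, start_tasks[0])
print(start_tasks, end, sorted(fused))
```

Once `MT2` through `MT5` are treated as one macro task, no control branch remains visible between macro tasks in this toy graph, which is the property that makes static scheduling applicable.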

To keep the macro tasks fine-grained, it is preferable, as in S215, to choose as the end task the first-executed macro task among those common to all sequences that run from the start task. However, this is not a requirement: any one of the common macro tasks executed second or later may be chosen as the end task instead.

Because sequential programs for embedded systems (in-vehicle devices in particular) contain few loop structures, either near-fine-grain parallelization or coarse-grain task parallelization could be applied; to keep execution overhead small, the automatic parallelizing compiler 1 applies coarse-grain task parallelization.

In such a sequential program, each macro task costs only on the order of tens of clock cycles, whereas dynamic scheduling by the automatic parallelizing compiler 1 normally incurs an overhead of tens to hundreds of clock cycles. Dynamic scheduling is therefore unsuitable for these programs.

However, the branch target of a macro task containing a conditional branch is determined dynamically at run time, so static scheduling, which assigns processor cores at compile time, cannot be applied to it as is.

The automatic parallelizing compiler 1 therefore uses a task fusion algorithm to fuse a macro task containing a conditional branch together with the macro tasks it branches to into a single coarse-grain task (Block task). After task fusion the MFG has no control dependences, and static scheduling becomes possible.

In S220, based on the MFG after task fusion, the automatic parallelizing compiler 1 analyzes the earliest executable condition of each macro task and generates an MTG. It then ends this processing.

3. Configuration of the in-vehicle device
Next, the configuration of the in-vehicle device 20, which operates according to a parallelized program generated by the automatic parallelizing compiler 1 of this embodiment, is described (see FIG. 4). The automatic parallelizing compiler 1 is of course not limited to the in-vehicle device 20; it can generate parallelized programs for operating various electronic devices with a similar configuration.

The in-vehicle device 20 includes a multi-core processor 21, a communication unit 22, a sensor unit 23, an input/output port 24, and the like.
The multi-core processor 21 has a ROM 21a, a RAM 21b, and a plurality of PEs 21c, 21d, and so on.

The ROM 21a stores the parallelized program 21a-1 (binary data) generated by the automatic parallelizing compiler 1. The multi-core processor 21 operates according to the parallelized program 21a-1 and performs overall control of the in-vehicle device 20.

The RAM 21b is accessed by the PEs 21c, 21d, and so on.
The communication unit 22 communicates with other ECUs connected via an in-vehicle LAN or the like.

The sensor unit 23 consists of various sensors for detecting the state of the controlled object and the like.
The input/output port 24 transmits and receives the various signals used to control the controlled object.

[Specific examples]
Next, specific examples of the processing by which the automatic parallelizing compiler 1 of this embodiment generates a parallelized program are described. The following description refers to "process A" and the like; each such term denotes the description of a series of processing consisting of various operations, assignments, branches, function calls, and so on.

1. Specific Example 1
In Specific Example 1, comparison programs are generated from a sequential program 300 subject to conditional compilation for two types of embedded system (the A specification and the B specification), and the performance of each comparison program is verified (see FIG. 5).

In Specific Example 1, a parallelized program is generated for execution on a multi-core processor with two PEs (a first core and a second core), and each macro task is assigned to one of these cores.

In the sequential program 300, "process C1 and process C2" form the conditional description, and the conditional compilation switch 301 selects either "process C1" or "process C2" as the compilation target. Specifically, process C1 is selected for the A specification ("CSW = A") and process C2 for the B specification ("CSW ≠ A") (see FIG. 6).

In S100 of the automatic parallelization processing, the automatic parallelizing compiler 1 identifies "process C1" and "process C2" of the sequential program 300 as separate conditional blocks. "Process C1 and process C2" may of course be identified together as a single conditional block instead.

In S105 (the software structure analysis processing), the automatic parallelizing compiler 1 treats each of "processes A to C2" as a macro task and generates an MTG.
In S110 to S115, based on the sequential program 300, the automatic parallelizing compiler 1 performs static scheduling A, corresponding to the A specification, and static scheduling B, corresponding to the B specification.

In FIG. 5, 310 denotes the allocation information generated by static scheduling A, and 311 denotes the allocation information generated by static scheduling B.
"Process A", "process B", "process C1", and "process C2" in the allocation information 310 and 311 denote the macro tasks corresponding to the respective processes in the sequential program. In each static scheduling, the processing time of each process is taken as its processing load, as an example. Static scheduling A treats "process C2" as imposing no processing load, and static scheduling B treats "process C1" as imposing no processing load.

The static scheduling described above produces four comparison programs, AA, AB, BA, and BB. Comparison programs AA and AB are generated by static scheduling A: the final type of AA is the A specification and that of AB is the B specification. Comparison programs BA and BB are generated by static scheduling B: the final type of BA is the A specification and that of BB is the B specification.

In FIG. 5, 320 denotes comparison programs AA and AB, and 322 denotes comparison programs BA and BB. Setting the value of "CSW", the argument of the conditional compilation switches 321 and 323 in the comparison programs 320 and 322, yields one of the comparison programs AA to BB.

In S120, the automatic parallelizing compiler 1 verifies the performance of the comparison programs AA to BB generated in S110 to S115 and, based on the results, selects the optimal static scheduling. To do so, it calculates the parallel execution time and the reduction rate of each of the comparison programs AA to BB and verifies performance on that basis.

The parallel execution time is the largest of the per-core (per-PG) sums of the processing times of the macro tasks assigned to each core in a comparison program. Naturally, when these sums are calculated, the processing time of any macro task that is not a compilation target for the comparison program's final type is taken as 0.

The reduction rate is an estimate of the performance difference between a multi-core processor running a comparison program and a single-core processor running the sequential program of the same type. It is calculated as a percentage from the parallel execution time of the comparison program and the total processing time, that is, the sum of the processing times of all processes executed by a single-core processor running the sequential program conditionally compiled for the comparison program's final type: reduction rate = (1 − parallel execution time of the comparison program / total processing time) × 100.

The table in FIG. 6 shows the processing times of "processes A to C2", and the table in FIG. 7 shows the parallel execution times and reduction rates calculated for the comparison programs AA to BB. The total processing time of the A-specification sequential program is 240 μs, and that of the B-specification sequential program is 190 μs.

For the comparison programs AA and AB generated by static scheduling A, the maximum of the parallel execution times (the maximum parallel execution time) is 120 μs and the minimum of the reduction rates (the minimum reduction rate) is 37%. For the comparison programs BA and BB generated by static scheduling B, the maximum parallel execution time is 140 μs and the minimum reduction rate is 42%.
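The stated percentages can be cross-checked against the reduction rate formula. The pairing used below is an inference from the stated values, not something read from FIG. 7: the 37% figure is consistent with a parallel execution time of 120 μs against the 190 μs B-specification total, and the 42% figure with 140 μs against the 240 μs A-specification total. The function and variable names are hypothetical.

```python
def reduction_rate(parallel_time_us: float, total_time_us: float) -> float:
    """(1 - parallel execution time / total processing time) x 100, in percent."""
    return (1 - parallel_time_us / total_time_us) * 100


# Total processing times of the sequential programs, as stated in the text.
TOTAL_A_US, TOTAL_B_US = 240, 190

# Inferred worst cases: 120 us under the B specification for static scheduling A,
# and 140 us under the A specification for static scheduling B.
worst_under_a = reduction_rate(120, TOTAL_B_US)
worst_under_b = reduction_rate(140, TOTAL_A_US)

print(round(worst_under_a), round(worst_under_b))   # prints: 37 42
```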

Consequently, if the optimal static scheduling is selected on the basis of parallel execution time, static scheduling A has the smaller maximum parallel execution time. The comparison programs generated by static scheduling A are judged to perform better, and static scheduling A is selected.

If, on the other hand, the optimal static scheduling is selected on the basis of the reduction rate, static scheduling B has the larger minimum reduction rate. The comparison programs generated by static scheduling B are judged to perform better, and static scheduling B is selected.

As a rule, static scheduling may be selected on the basis of the maximum parallel execution time, with the minimum reduction rate used as the criterion when the maximum parallel execution times of the static schedulings are equal (or nearly equal). Conversely, static scheduling may be selected primarily on the basis of the minimum reduction rate, with the maximum parallel execution time used when the minimum reduction rates are equal (or nearly equal).
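The first of these two policies (maximum parallel execution time first, minimum reduction rate as the tie-breaker) can be sketched as follows. The `tolerance` parameter is a hypothetical way of expressing "equal or nearly equal"; the text does not say how closeness is judged. The input values are the ones stated for Specific Example 1.

```python
def select_schedule(results, tolerance=0.0):
    """Pick a static scheduling by maximum parallel execution time, then reduction rate.

    `results` maps a scheduling name to
    (maximum parallel execution time in us, minimum reduction rate in %).
    Schedulings within `tolerance` us of the best time are treated as tied,
    and the tie is broken by the larger minimum reduction rate.
    """
    best_time = min(t for t, _ in results.values())
    candidates = [n for n, (t, _) in results.items() if t - best_time <= tolerance]
    return max(candidates, key=lambda n: results[n][1])


# Values stated for static scheduling A and B in Specific Example 1.
RESULTS = {"A": (120, 37), "B": (140, 42)}

chosen = select_schedule(RESULTS)                      # strict comparison of times
chosen_loose = select_schedule(RESULTS, tolerance=20)  # 120 us and 140 us treated as tied
print(chosen, chosen_loose)
```

With a strict comparison the shorter maximum parallel execution time decides; once the two times are treated as equivalent, the larger minimum reduction rate decides instead. The reverse policy would swap the roles of the two metrics.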

When there are three or more types, static scheduling may likewise be selected on the basis of the maximum parallel execution time or the minimum reduction rate.

2. Specific Example 2
In Specific Example 2, comparison programs are generated from a sequential program 400 subject to conditional compilation for two types of embedded system (the A specification and the B specification), and the performance of each comparison program is verified (see FIG. 8).

In Specific Example 2, a parallelized program is generated for execution on a multi-core processor with two PEs (a first core and a second core), and each macro task is assigned to one of these cores.

In the sequential program 400, "processes D1, D2, F, and G" are conditional descriptions, and the conditional compilation switches 401 to 403 select some of them as compilation targets. Specifically, "processes D1 and F" are selected for the A specification ("CSW = A") and "processes D2 and G" for the B specification ("CSW ≠ A") (see FIGS. 8 and 10).

In S100 of the automatic parallelization processing, the automatic parallelizing compiler 1 identifies each of "processes D1, D2, F, and G" of the sequential program 400 as a conditional block.
Of course, "processes D1 and D2", the two processes that the single set of conditional compilation switches 401 selects or excludes together according to the argument value, may be identified as one conditional block. Likewise, "processes F and G", the two processes that the two sets of conditional compilation switches 402 and 403 select or exclude according to the argument value, may be identified as one conditional block.

In S105 (the software structure analysis processing), the automatic parallelizing compiler 1 treats each of "processes A to I" as a macro task and generates an MTG. In FIG. 8, 410 is the MTG generated from the sequential program 400. Because the automatic parallelization processing extracts parallelism from all processes written in the sequential program 400 regardless of the conditional compilation switches, the MTG 410 expresses the data dependences of all macro tasks that are compilation targets for any of the types.

In S110 to S115, the automatic parallelizing compiler 1 performs, based on the sequential program 400, static scheduling A corresponding to the A specification and static scheduling B corresponding to the B specification. Each static scheduling is performed based on the processing load (processing time, as one example) of each macrotask (see FIG. 10).

Reference numeral 411 in FIG. 9 denotes an MTG based on the sequential program 400 in which macrotasks compiled under the A specification are drawn with solid lines and macrotasks not compiled are drawn with dotted lines. Reference numeral 412 denotes an MTG based on the sequential program 400 in which macrotasks compiled under the B specification are drawn with solid lines and macrotasks not compiled are drawn with dotted lines.

In static scheduling A, the macrotasks drawn with dotted lines in the MTG 411 are regarded as having no processing load. In static scheduling B, the macrotasks drawn with dotted lines in the MTG 412 are regarded as having no processing load.

In these static schedulings, groups of macrotasks linked by data dependencies are identified from the MTG. Specifically, the groups “processes A, B, D2, I”, “processes A, B, D1, I”, “processes A, C, I”, “processes A, C, F, I”, “processes A, G, H, I”, and “processes A, H, I” are identified.
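The relationship between these groups and parallel executability can be sketched as follows. The snippet assumes, as an interpretation not stated in the text, that each listed group is a dependency chain in the listed order, so that consecutive members are joined by a data-dependency edge. The function names are illustrative. Two macrotasks are then treated as executable in parallel when neither can reach the other through such edges.

```python
# Groups of macrotasks linked by data dependencies, as listed in the text.
GROUPS = [
    ["A", "B", "D2", "I"],
    ["A", "B", "D1", "I"],
    ["A", "C", "I"],
    ["A", "C", "F", "I"],
    ["A", "G", "H", "I"],
    ["A", "H", "I"],
]


def build_edges(groups):
    """Assumption: consecutive members of a group form a dependency edge."""
    edges = {}
    for group in groups:
        for src, dst in zip(group, group[1:]):
            edges.setdefault(src, set()).add(dst)
            edges.setdefault(dst, set())
    return edges


def reaches(edges, start, goal):
    """Depth-first search: is there a dependency path from start to goal?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges[node])
    return False


def parallel_executable(edges, x, y):
    """True when neither macrotask depends, directly or indirectly, on the other."""
    return not reaches(edges, x, y) and not reaches(edges, y, x)


EDGES = build_edges(GROUPS)
print(parallel_executable(EDGES, "B", "C"))  # True
print(parallel_executable(EDGES, "A", "B"))  # False
```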

Then, the total processing time of the macrotasks in each group is calculated, and the macrotasks of the group with the largest total are assigned to the first core. The remaining macrotasks are assigned to the first or second core so that the maximum processing time (the largest of the per-core totals of the processing times of the assigned macrotasks) is minimized.

In principle, macrotasks that have a data dependency are assigned to the same core, but such macrotasks may also be assigned to different cores. In that case, processing on one core must wait until processing on the other core completes. The per-core total processing time of the assigned macrotasks therefore includes such waiting time.
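A simplified sketch of this two-core assignment is shown below. It is not the compiler's actual algorithm: the processing times are hypothetical stand-ins for FIG. 10, the waiting time between cores is ignored, and the remaining macrotasks are placed by exhaustive search, which is only practical for a handful of tasks. The function and constant names are illustrative.

```python
from itertools import product

# Groups of macrotasks linked by data dependencies, as listed in the text.
GROUPS = [
    ["A", "B", "D2", "I"],
    ["A", "B", "D1", "I"],
    ["A", "C", "I"],
    ["A", "C", "F", "I"],
    ["A", "G", "H", "I"],
    ["A", "H", "I"],
]

# Hypothetical processing times in microseconds (FIG. 10 is not reproduced here).
HYPOTHETICAL_TIMES_US = {
    "A": 10, "B": 20, "C": 30, "D1": 40, "D2": 10,
    "F": 20, "G": 30, "H": 20, "I": 10,
}


def static_schedule(times, groups, excluded):
    """Return (core1, core2) sets of macrotasks for a two-core system."""
    # Macrotasks not compiled for this type are regarded as having no load.
    load = {task: (0 if task in excluded else t) for task, t in times.items()}
    # The group with the largest total processing time goes to the first core.
    heaviest = max(groups, key=lambda g: sum(load[task] for task in g))
    core1 = set(heaviest)
    rest = sorted(set(load) - core1)
    best = None
    # Try every placement of the remaining macrotasks and keep the one whose
    # larger per-core total is smallest (waiting time is ignored here).
    for choice in product((1, 2), repeat=len(rest)):
        c1 = core1 | {task for task, c in zip(rest, choice) if c == 1}
        c2 = {task for task, c in zip(rest, choice) if c == 2}
        worst = max(sum(load[t] for t in c1), sum(load[t] for t in c2))
        if best is None or worst < best[0]:
            best = (worst, c1, c2)
    return best[1], best[2]


# Static scheduling for the A specification: D2 and G are not compiled.
core1, core2 = static_schedule(HYPOTHETICAL_TIMES_US, GROUPS, excluded={"D2", "G"})
print(sorted(core1), sorted(core2))
```

With these hypothetical times, the chain A, B, D1, I is the heaviest group and stays on the first core, while C, F, and H move to the second core. The zero-load macrotasks D2 and G can land on either core without changing the maximum.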

Reference numerals 420 and 421 denote allocation information generated by static scheduling A, and 422 and 423 denote allocation information generated by static scheduling B. The allocation information 420 and 422 corresponds to the case where the final type is the A specification, and the allocation information 421 and 423 corresponds to the case where the final type is the B specification. In this allocation information, macrotasks that are compiled are drawn with solid lines and macrotasks that are not compiled are drawn with dotted lines.

The notation “wait” in the allocation information 420 to 423 indicates that, while the first core executes “process A” or “process I”, the second core waits for that process to finish.

In S120, the automatic parallelizing compiler 1 verifies the performance of the comparison programs AA to BB generated in S110 to S115 and selects the optimal static scheduling based on the verification result. At this time, the parallel execution time of each of the comparison programs AA to BB is calculated, and performance is verified based on these times.

The table of FIG. 10 shows the processing times of processes A to I, and the table of FIG. 11 shows the parallel execution times calculated for the comparison programs AA to BB.
The maximum parallel execution time of static scheduling A is 200 μs, and that of static scheduling B is 190 μs. Therefore, when the optimal static scheduling is selected based on the parallel execution time, static scheduling B is selected.
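The comparison can be sketched in a few lines of Python. The per-program times below are hypothetical, since FIG. 11 is not reproduced here; they are chosen only so that the two maxima match the 200 μs and 190 μs stated in the text. The names are illustrative.

```python
# Hypothetical parallel execution times in microseconds for each comparison
# program; only the per-scheduling maxima (200 and 190) come from the text.
PARALLEL_TIMES_US = {
    "A": {"AA": 170, "AB": 200},
    "B": {"BA": 190, "BB": 180},
}


def select_scheduling(times_by_scheduling):
    """Pick the scheduling whose worst comparison program is fastest."""
    worst = {s: max(t.values()) for s, t in times_by_scheduling.items()}
    return min(worst, key=worst.get), worst


best, worst = select_scheduling(PARALLEL_TIMES_US)
print(best, worst)  # B {'A': 200, 'B': 190}
```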

Of course, as in Specific Example 1, the optimal static scheduling may instead be selected based on the reduction rate, or on both the parallel execution time and the reduction rate.

3. Specific Example 3
In Specific Example 3, comparison programs are generated from a sequential program that undergoes conditional compilation for embedded systems of three types (A to C specifications), and the performance of each comparison program is verified. The performance verification method is described in detail below.

S100 to S115 of the automatic parallelization process generate nine comparison programs: AA to AC, BA to BC, and CA to CC.
The comparison programs AA to AC are generated by static scheduling A, which corresponds to the A specification. The final type of the comparison program AA is the A specification, that of the comparison program AB is the B specification, and that of the comparison program AC is the C specification.

The comparison programs BA to BC are generated by static scheduling B, which corresponds to the B specification. The final type of the comparison program BA is the A specification, that of the comparison program BB is the B specification, and that of the comparison program BC is the C specification.

The comparison programs CA to CC are generated by static scheduling C, which corresponds to the C specification. The final type of the comparison program CA is the A specification, that of the comparison program CB is the B specification, and that of the comparison program CC is the C specification.

In S120, the automatic parallelizing compiler 1 verifies the performance of these comparison programs and selects the optimal static scheduling based on the verification result. At this time, the parallel execution time and the reduction rate of each comparison program are calculated (see FIG. 12). It is then determined whether each static scheduling satisfies a selection condition set in terms of the parallel execution time and the reduction rate. In the example of FIG. 12, the total processing times of the sequential programs of the A, B, and C specifications are 100 μs, 50 μs, and 50 μs, respectively.

Specifically, suppose for example that selection condition 1 is set as follows: the parallel execution time of the comparison program whose final type is the A specification is less than 65 μs, the parallel execution time of the comparison program whose final type is the B specification is less than 35 μs, and the parallel execution time of the comparison program whose final type is the C specification is less than 35 μs.

In the example shown in FIG. 12, the comparison programs BA to BC generated by static scheduling B satisfy selection condition 1. Therefore, static scheduling B is selected.
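A check of selection condition 1 might look like the following sketch. The three limits come from the text; the table of parallel execution times is a hypothetical stand-in for FIG. 12, and the function names are illustrative. Returning `None` stands for the case in which no static scheduling can be selected, which the description says is reported to the user.

```python
# Limits of selection condition 1, in microseconds, per final type.
LIMITS_US = {"A": 65, "B": 35, "C": 35}

# Hypothetical stand-in for FIG. 12: TIMES_US[scheduling][final type].
TIMES_US = {
    "A": {"A": 60, "B": 40, "C": 40},
    "B": {"A": 62, "B": 30, "C": 33},
    "C": {"A": 70, "B": 34, "C": 30},
}


def satisfies_condition_1(times, limits):
    """True when every final type finishes strictly below its limit."""
    return all(times[final] < limits[final] for final in limits)


def select_by_condition_1(times_by_scheduling, limits):
    """Return the first scheduling meeting condition 1, or None if none does."""
    ok = [s for s, t in times_by_scheduling.items()
          if satisfies_condition_1(t, limits)]
    return ok[0] if ok else None


print(select_by_condition_1(TIMES_US, LIMITS_US))  # B
```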

Selection condition 1 may be used when the multi-core processors mounted in the embedded systems of the respective types differ in performance.
As another example, suppose that selection condition 2 is set as follows: the parallel execution time of the comparison program whose final type is the A specification is less than 65 μs, and the minimum of the reduction rates of the comparison programs whose final types are the B and C specifications is relatively large.

In the example shown in FIG. 12, the comparison programs generated by static scheduling B satisfy selection condition 2. Therefore, static scheduling B is selected.

Selection condition 2 may be used when the multi-core processors mounted in the embedded systems of the respective types have comparable performance and the processing load of the A-specification sequential program is larger than that of the sequential programs of the other types.
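Selection condition 2 can be sketched in the same way. This passage does not define the reduction rate, so the snippet assumes it is the fraction of the sequential processing time saved by parallel execution; that definition, the table of parallel execution times (a stand-in for FIG. 12), and all names are assumptions. The sequential totals of 100 μs, 50 μs, and 50 μs and the 65 μs limit come from the text.

```python
# Total processing times of the sequential programs (from the text), in microseconds.
SEQUENTIAL_US = {"A": 100, "B": 50, "C": 50}

# Hypothetical stand-in for FIG. 12: TIMES_US[scheduling][final type].
TIMES_US = {
    "A": {"A": 60, "B": 40, "C": 40},
    "B": {"A": 62, "B": 30, "C": 33},
    "C": {"A": 70, "B": 34, "C": 30},
}


def reduction_rate(parallel, sequential):
    """Assumed definition: share of the sequential time saved by parallelization."""
    return (sequential - parallel) / sequential


def select_by_condition_2(times_by_scheduling, sequential, limit_a_us=65):
    """Among schedulings whose A-type time is under the limit, pick the one
    whose smaller B/C reduction rate is largest; None if no candidate exists."""
    candidates = {}
    for scheduling, times in times_by_scheduling.items():
        if times["A"] < limit_a_us:
            candidates[scheduling] = min(
                reduction_rate(times[final], sequential[final])
                for final in ("B", "C")
            )
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(select_by_condition_2(TIMES_US, SEQUENTIAL_US))  # B
```

Taking the minimum over the B and C types rewards a scheduling that does not sacrifice either of the lighter types, which is one way to read “relatively large” in the condition.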

4. Specific Example 4
In Specific Example 4, comparison programs are generated from a sequential program that undergoes conditional compilation for embedded systems of three types (A to C specifications) by means of conditional compilation switches 501 to 503, which use several different arguments. The method of dividing the conditional description 500 of such a sequential program into macrotasks is described in detail below (see FIG. 13).

The conditional compilation switches 501 to 503 use the arguments “NA”, “JC”, and “TM”, respectively. FIG. 14 shows the values that can be set for each argument, and FIG. 15 shows the values to be set for each argument when selecting each type.

In S100 of the automatic parallelization process, the automatic parallelizing compiler 1 identifies each of “processes A to E” described in the conditional description 500 of the sequential program as a conditional block.
In S105 (software structure analysis processing), the automatic parallelizing compiler 1 treats each of “processes A to E” identified as conditional blocks as a macrotask and generates an MTG. The macrotasks corresponding to “processes A to E” are referred to as “macrotasks A to E”, respectively.

Then, in S110 to S115, the automatic parallelizing compiler 1 performs static scheduling for each type, assigns “macrotasks A to E” to one of the PGs, and generates comparison programs whose final types are the A to C specifications. The table of FIG. 16 shows the descriptions corresponding to “macrotasks A to E” in the comparison programs.

In the comparison programs, “processes A to E” are described in correspondence with “macrotasks A to E”, and a conditional compilation switch is set for each of these processes. The conditional compilation switch set for each of “processes A to E” is equivalent to the switch set for that process in the conditional description 500 of the sequential program.

Therefore, when the arguments of the conditional compilation switches in a comparison program are set to the values corresponding to a type, the processes corresponding to that type are selected as compilation targets.
Note that, in S100, among “processes A to E” described in the conditional description 500 of the sequential program, “processes A to C” and “processes D and E” may instead be identified as two separate conditional blocks.

In that case, “macrotasks X and Y” are generated for “processes A to C” and “processes D and E”, respectively. In the comparison programs, “processes A to C” are described in correspondence with “macrotask X” and “processes D and E” in correspondence with “macrotask Y”, and a conditional compilation switch is set for each of these processes. These switches are likewise equivalent to the switches set for the respective processes in the conditional description 500 of the sequential program (see FIG. 17).
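How a type-specific program falls out of a comparison program once the switch arguments are fixed can be sketched as follows. Everything concrete in the snippet is hypothetical: the PG assignments, the predicate attached to each process, and the argument values per type stand in for FIGS. 14 to 16, which are not reproduced here. Only the argument names “NA”, “JC”, and “TM” come from the text.

```python
# Hypothetical comparison program: (processor group, process, switch predicate).
# The predicates stand in for the conditional compilation switches 501 to 503.
COMPARISON_PROGRAM = [
    ("PG1", "A", lambda args: args["NA"] == 1),
    ("PG1", "B", lambda args: args["NA"] == 2),
    ("PG2", "C", lambda args: args["NA"] == 3),
    ("PG2", "D", lambda args: args["JC"] == 1),
    ("PG1", "E", lambda args: args["TM"] == 1),
]

# Hypothetical argument values to set when selecting each type.
TYPE_ARGS = {
    "A": {"NA": 1, "JC": 1, "TM": 0},
    "B": {"NA": 2, "JC": 0, "TM": 1},
    "C": {"NA": 3, "JC": 0, "TM": 0},
}


def strip_for_type(program, args):
    """Keep only the processes whose switch is satisfied, preserving the PG
    to which each process was assigned by static scheduling."""
    result = {}
    for pg, process, is_selected in program:
        result.setdefault(pg, [])
        if is_selected(args):
            result[pg].append(process)
    return result


for type_name, args in TYPE_ARGS.items():
    print(type_name, strip_for_type(COMPARISON_PROGRAM, args))
```

Because the switches travel with the processes into the comparison program, the same assignment to PGs serves every type; only the set of surviving processes changes.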

[Effects]
The automatic parallelizing compiler 1 of the present embodiment generates a parallelized program from the sequential program with all conditional descriptions included. It then removes from the parallelized program the parts that are not compiled under the type corresponding to that parallelized program, and generates binary code for operating the embedded system (a system equipped with a multi-core processor) of that type.

Therefore, in the parallelized programs of the respective types, processes other than those corresponding to conditional descriptions (that is, processes executed in common by the parallelized programs of all types) can always be assigned to the same PG. This further increases the commonality among the parallelized programs of the respective types and makes it easier to maintain and manage their quality.

The automatic parallelizing compiler 1 also performs static scheduling based on the actual cost (processing load) of the macrotasks. Specifically, macrotasks that can be executed in parallel are assigned to different PGs so that the processing loads of the PGs are approximately equal. This improves the performance of the multi-core processor operating according to the parallelized program.

The automatic parallelizing compiler 1 performs static scheduling in correspondence with one of the types of the sequential program, and in doing so regards macrotasks that are not compiled under that type as having no processing load. As a result, the best performance is obtained when the descriptions that are not compiled under the type corresponding to a static scheduling are removed from the comparison program generated by that static scheduling. The performance of the parallelized program generated by the automatic parallelizing compiler 1 can therefore be improved.

In addition, the automatic parallelizing compiler 1 performs static scheduling for each type and, in each static scheduling, generates a plurality of comparison programs corresponding to the respective types. It then verifies the performance of each comparison program and, based on the verification result, selects the static scheduling that can generate the best-performing comparison programs.

Furthermore, the parts that are not compiled are removed from the comparison programs generated by the selected static scheduling, and the parallelized program for each type is generated. As a result, processes common to the types are always assigned to the same PG in the parallelized programs, which further increases the commonality among the parallelized programs of the respective types.

Moreover, because the static scheduling is selected based on the result of verifying the performance of each comparison program, a parallelized program with good performance can be generated. It is therefore possible to make the quality of the parallelized programs easier to maintain and manage while limiting any loss of performance.

The automatic parallelizing compiler 1 also calculates the parallel execution time and the reduction rate of each comparison program and verifies the performance of each comparison program based on one or both of them. The performance of each comparison program can therefore be assessed accurately, and the static scheduling can be selected more appropriately.

When no static scheduling can be selected, the automatic parallelizing compiler 1 notifies the user to that effect. This improves the usability of the automatic parallelizing compiler 1.

[Other Embodiments]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can take various forms.

(1) In the automatic parallelization process of the present embodiment, static scheduling is performed for every type of the sequential program. However, the invention is not limited to this: static scheduling corresponding to any one type may be performed, and the comparison programs generated by that static scheduling may be used as the parallelized programs for the respective types. The type may be, for example, one specified by the user.

Alternatively, static scheduling may be performed for a plurality of types that form a subset of all types of the sequential program, and, as in the present embodiment, the performance of each comparison program may be verified to select one of the static schedulings.

Even in such cases, the commonality among the parallelized programs of the respective types can be further increased, and the quality of the parallelized programs becomes easier to maintain and manage.
(2) The automatic parallelizing compiler 1 of the present embodiment performs inline expansion of the sequential program in S200 of the software structure analysis processing and renames local variables in S205, but either or both of these steps may be omitted. Even then, the same effect can be obtained depending on the structure of the sequential program.

(3) A function of one constituent element in the above embodiment may be distributed among a plurality of constituent elements, and functions of a plurality of constituent elements may be integrated into one constituent element. At least part of the configuration of the above embodiment may be replaced with a known configuration having the same function. Part of the configuration of the above embodiment may be omitted. At least part of the configuration of the above embodiment may be added to, or substituted for, the configuration of another embodiment described above. All aspects included in the technical idea specified solely by the wording of the claims are embodiments of the present invention.

(4) In addition to the automatic parallelizing compiler and the automatic parallelizing compilation device described above, the present invention can be realized in various forms, such as a medium storing the automatic parallelizing compiler and a method corresponding to the processing performed by the automatic parallelizing compiler.

[Correspondence with Claims]
The correspondence between the terms used in the description of the above embodiment and the terms used in the claims is as follows.

The automatic parallelizing compiler 1 corresponds to an example of the parallelizing compiler, and the in-vehicle device 20 corresponds to an example of the electronic device.
The PG corresponds to an example of the processor unit.

S110 of the automatic parallelization process corresponds to an example of the scheduling procedure and the scheduling means, S120 corresponds to an example of the verification procedure, the verification means, the selection procedure, and the selection means, and S145 corresponds to an example of the notification procedure and the notification means.

S210 of the software structure analysis processing corresponds to an example of the division procedure and the division means, and S220 corresponds to an example of the extraction procedure and the extraction means.
The value of the argument of a conditional compilation switch corresponds to an example of the input information.

The parallel execution time corresponds to an example of the maximum processing load of a comparison program, and the reduction rate corresponds to an example of the difference in performance between the multiprocessor system operating according to the comparison program and the single-processor system operating according to the sequential program.

Description of symbols: 1 ... automatic parallelizing compiler, 10 ... PC, 11 ... display, 12 ... HDD, 13 ... CPU, 14 ... ROM, 15 ... RAM, 16 ... input device, 17 ... reading unit, 18 ... storage medium, 20 ... in-vehicle device, 21 ... multi-core processor, 21a ... ROM, 21b ... RAM, 21c, 21d ... processor element (PE), 22 ... communication unit, 23 ... sensor unit, 24 ... input/output port.

Claims

A parallelizing compilation method comprising:
a division procedure (S210) of dividing the processes described in a sequential program, which is executed by a single-processor system and includes conditional descriptions that conditional compilation selects or does not select as compilation targets according to input information, into a plurality of macrotasks regardless of the input information;
an extraction procedure (S220) of extracting, based on the data dependencies between the macrotasks, the macrotasks that can be executed in parallel by a plurality of processor units constituting a multiprocessor system; and
a scheduling procedure (S110) of performing static scheduling, which is a process of assigning each of the macrotasks to one of the processor units such that all or some of the macrotasks that can be executed in parallel are assigned to different processor units, thereby generating a parallelized program to be executed by the multiprocessor system.

The parallelizing compilation method according to claim 1, wherein,
in the scheduling procedure, the static scheduling is performed based on the processing loads of the macrotasks.

The parallelizing compilation method according to claim 2, wherein,
in the scheduling procedure, the static scheduling is performed in correspondence with predetermined input information, and, in the static scheduling, the macrotasks corresponding to the conditional descriptions that are not compiled under the corresponding input information are regarded as having no processing load.

The parallelizing compilation method according to any one of claims 1 to 3, wherein,
in the scheduling procedure, the static scheduling is performed for each of two or more predetermined types of the input information, and the parallelized program corresponding to each type of the input information is generated,
a program obtained by removing, from each parallelized program generated by the static scheduling, the conditional descriptions that predetermined input information selects as not to be compiled is treated as a comparison program associated with that input information, and
the parallelizing compilation method further comprises:
a verification procedure (S120) of verifying, for each static scheduling, the performance of the comparison programs that are generated from the parallelized program produced by that static scheduling and are associated with the respective types of the input information; and
a selection procedure (S120) of selecting one of the static schedulings based on the verification result of the verification procedure.

The parallelizing compilation method according to claim 4, wherein,
in the verification procedure, for each comparison program, the sum of the processing loads of the assigned macrotasks is calculated for each processor unit, and the maximum of these sums is taken as the maximum processing load of the comparison program, and
in the selection procedure, the static scheduling is selected based on the maximum processing load.

The parallelizing compilation method according to claim 4, wherein,
in the verification procedure, for each comparison program, the difference in performance is estimated between the multiprocessor system operating according to the comparison program and the single-processor system operating according to the sequential program from which the conditional descriptions selected as not to be compiled by the input information corresponding to the comparison program have been removed, and
in the selection procedure, the static scheduling is selected based on the estimated difference in performance.

The parallelizing compilation method according to claim 4, wherein,
in the verification procedure, for each comparison program, the sum of the processing loads of the assigned macrotasks is calculated for each processor unit, the maximum of these sums is taken as the maximum processing load of the comparison program, and the difference in performance is estimated between the multiprocessor system operating according to the comparison program and the single-processor system operating according to the sequential program from which the conditional descriptions selected as not to be compiled by the input information corresponding to the comparison program have been removed, and
in the selection procedure, the static scheduling is selected based on the maximum processing load and the estimated difference in performance.

The parallelizing compilation method according to any one of claims 4 to 7,
further comprising a notification procedure (S145) of giving notification when the static scheduling cannot be selected in the selection procedure.

A parallelizing compiler (1) that causes a computer to operate as:
division means (S210) for dividing the processes described in a sequential program, which is executed by a single-processor system and includes conditional descriptions that conditional compilation selects or does not select as compilation targets according to input information, into a plurality of macrotasks regardless of the input information;
extraction means (S220) for extracting, based on the data dependencies between the macrotasks, the macrotasks that can be executed in parallel by a plurality of processor units constituting a multiprocessor system; and
scheduling means (S110) for performing static scheduling, which is a process of assigning each of the macrotasks to one of the processor units such that all or some of the macrotasks that can be executed in parallel are assigned to different processor units, thereby generating a parallelized program to be executed by the multiprocessor system.

An electronic device (20) comprising a multiprocessor system that operates according to a parallelized program generated by a parallelizing compilation method, the method comprising:
a division procedure (S210) of dividing the processes described in a sequential program, which is executed by a single-processor system and includes conditional descriptions that conditional compilation selects or does not select as compilation targets according to input information, into a plurality of macrotasks regardless of the input information;
an extraction procedure (S220) of extracting, based on the data dependencies between the macrotasks, the macrotasks that can be executed in parallel by a plurality of processor units constituting the multiprocessor system; and
a scheduling procedure (S110) of performing static scheduling, which is a process of assigning each of the macrotasks to one of the processor units such that all or some of the macrotasks that can be executed in parallel are assigned to different processor units, thereby generating the parallelized program to be executed by the multiprocessor system.