JP5245727B2

JP5245727B2 - Design support program, design support apparatus, and design support method

Info

Publication number: JP5245727B2
Application number: JP2008282770A
Authority: JP
Inventors: 宏真山内; 真紀子伊藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-11-04
Filing date: 2008-11-04
Publication date: 2013-07-24
Anticipated expiration: 2028-11-04
Also published as: JP2010113384A

Description

The present invention relates to a design support program, a design support apparatus, and a design support method that support the optimization of an embedded system.

Embedded systems have long faced strong demand for more functions and higher performance. Raising the operating frequency drives power consumption too high, so multi-core devices, that is, system LSIs (Large Scale Integration) made up of multiple processing elements (PEs), have come into use. Some of the functions of such systems can run independently, while others depend on one another.

Program structure and required performance also differ from function to function. To cope with this, heterogeneous multi-core processors (Asymmetric Multi-Core Processors, hereinafter "AMPs") are widely used. In addition to a processor, an AMP carries several different kinds of cores, such as accelerators and dedicated hardware: for example, a DSP (Digital Signal Processor), which excels at media operations, or a Reconf (Reconfigurable Processor), which achieves high-speed processing by changing its configuration for specific instructions.

To bring out the performance of such an AMP, the software must be optimized: the characteristics of the PEs mounted on the AMP have to be understood, functions extracted from the program, each function mapped to the PE best suited to it, and parallel processing applied.

In the field of high-performance computing (hereinafter "HPC"), parallelization has been studied for homogeneous multi-core systems (Symmetric Multi-Core Processors, hereinafter "SMPs"), which carry multiple processors of the same type. In SMP parallelization, an even processing load across processors is the key to better performance, and much research has therefore focused on loop parallelism, from which evenly balanced loads are easy to extract.

The loop-based parallel processing developed for SMPs has difficulty bringing out the performance of an AMP, whose PEs differ in performance. An AMP requires software optimization in which the program is analyzed, processes are extracted from it, each process is mapped to the PE best suited to it, and parallel processing is applied. Even in a configuration of a single processor plus a hardware accelerator, heavy processing has traditionally been assigned to the hardware accelerator.

To carry over existing assets, another established approach analyzes the required functions, partitions them into software and hardware, and develops the system by integrating IP (Intellectual Property) blocks that provide those functions. However, when several different kinds of PEs that perform software processing are mounted, as in an AMP, the complexity of the analysis grows rapidly. Moreover, program sizes are said to range from several hundred Ksteps to 1000 Ksteps and are expected to keep growing, and manual analysis of embedded software that is this large and structurally complex is nearly impossible. For this reason, methods that optimize through automatic analysis have been disclosed.

JP 2007-328415 A
Ikuo Nakata, "Compiler Construction and Optimization", Asakura Shoten, September 1999, pp. 270-288

In the conventional technology described above, a program is divided into multiple tasks according to predefined task division rules, and the best core for each task is assigned by scheduling on the target AMP. As a result, the task division result does not change even when the AMP configuration changes.

In contrast, the present invention extracts from the program the program patterns that match the characteristics of each core and assigns them accordingly. The patterns differ from core to core, and when a new core is added, new patterns are added with it. This approach makes full use of the characteristics of the cores in the target AMP. It can also extract the portions that should be handled by dedicated hardware, so processing can be divided automatically between software and hardware.

An object of the present invention is thus to provide a design support program, a design support apparatus, and a design support method that optimize an embedded system according to its program while reducing the designer's design burden and shortening the design period.

To solve the problems described above and achieve this object, the design support program, design support apparatus, and design support method divide target program code into a plurality of tasks; calculate, based on the type of computing element that executes each task, an execution cost corresponding to the execution time of each task; generate combination patterns representing execution orders of the tasks; calculate, based on the execution costs, a total execution cost corresponding to the execution time of each combination pattern; determine a specific combination pattern from the group of combination patterns based on the total execution costs; and output the determination result.

According to the design support program, design support apparatus, and design support method, an embedded system can be optimized according to its program while the designer's design burden is reduced and the design period is shortened.

Preferred embodiments of the design support program, design support apparatus, and design support method are described in detail below with reference to the accompanying drawings. The technique divides a program to be executed on an embedded system into tasks, calculates their costs, and schedules the task execution order to obtain the pattern with the minimum cost, thereby optimizing the embedded system and improving its performance.

(Hardware configuration of the design support apparatus)
FIG. 1 is a block diagram showing the hardware configuration of the design support apparatus according to the present embodiment. In FIG. 1, the design support apparatus includes a CPU (Central Processing Unit) 101, a ROM (Read-Only Memory) 102, a RAM (Random Access Memory) 103, a magnetic disk drive 104, a magnetic disk 105, an optical disk drive 106, an optical disk 107, a display 108, an I/F (Interface) 109, a keyboard 110, a mouse 111, a scanner 112, and a printer 113. The components are connected to one another by a bus 100.

Here, the CPU 101 is the central processing unit that controls the design support apparatus as a whole. The ROM 102 stores programs such as a boot program. The RAM 103 is used as a work area for the CPU 101. The magnetic disk drive 104 controls the reading and writing of data on the magnetic disk 105 under the control of the CPU 101. The magnetic disk 105 stores the data written under the control of the magnetic disk drive 104.

The optical disk drive 106 controls the reading and writing of data on the optical disk 107 under the control of the CPU 101. The optical disk 107 stores data written under the control of the optical disk drive 106, and allows a computer to read the data stored on it.

The display 108 displays a cursor, icons, and toolboxes, as well as data such as documents, images, and function information. A CRT, a TFT liquid crystal display, or a plasma display, for example, can be used as the display 108.

The interface (hereinafter "I/F") 109 is connected through a communication line to a network 114 such as a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet, and is connected to other devices via the network 114. The I/F 109 serves as the interface between the network 114 and the inside of the apparatus, and controls the input and output of data to and from external devices. A modem or a LAN adapter, for example, can be used as the I/F 109.

The keyboard 110 has keys for entering characters, numbers, and various instructions, and is used to input data. A touch-panel input pad or a numeric keypad may be used instead. The mouse 111 is used to move the cursor, select a range, move a window, change its size, and so on. A trackball or a joystick may be used instead, provided it offers equivalent functions as a pointing device.

The scanner 112 optically reads an image and loads the image data into the design support apparatus. The scanner 112 may also be given an OCR (Optical Character Reader) function. The printer 113 prints image data and document data. A laser printer or an inkjet printer, for example, can be used as the printer 113.

(Functional configuration of the design support apparatus)
FIG. 2 is a block diagram showing the functional configuration of the design support apparatus according to the present embodiment. The design support apparatus 200 includes a dividing unit 201, an acquisition unit 202, an extraction unit 203, a task execution cost calculation unit 204, a generation unit 205, a total execution cost calculation unit 206, a determination unit 207, and an output unit 208. The functions of these units (the dividing unit 201 through the output unit 208), which together serve as a control unit, are realized, for example, by causing the CPU 101 to execute a program stored in a storage device such as the ROM 102, the RAM 103, the magnetic disk 105, or the optical disk 107 shown in FIG. 1, or by the I/F 109.

The dividing unit 201 divides the target program code to be executed on the embedded system into a plurality of tasks. The embedded system has multiple types of cores (computing elements), for example a processor, a reconfigurable circuit, and a DSP. The dividing unit 201 functionally divides the target program code into blocks, one per control statement. Each resulting block is called a task. The division itself is performed by a well-known technique.

FIG. 3 is an explanatory diagram showing an example of target program code. Here, target program code 300 consisting of three for blocks is used as the example. FIG. 4 is an explanatory diagram showing an example of how the target program code 300 of FIG. 3 is divided. In FIG. 4, the code is divided into tasks T1 to T3.

The acquisition unit 202 acquires setting information on the types of computing elements built into the embedded system and on the optimization processing that each computing element is to perform. The user enters the setting information on a predetermined input screen with the mouse 111 or the keyboard 110, and the setting information is saved in the storage device.

FIGS. 5 to 7 are explanatory diagrams showing an example of the input screen. The input screen 500 displays tabs 501 to 503, one per computing element. Selecting a tab displays input areas 504 to 506 for the setting information (optimization method, execution constraints, parameters) of the computing element identified by that tab. Clicking the OK button 507 stores the setting information in the storage device.

FIG. 5 shows the input screen 500 when the processor tab 501 is selected. Under optimization methods, check boxes such as "loop parallel processing" and "VLIW instruction" are displayed. The user sets the desired optimization methods by checking them with the mouse 111 or the keyboard 110. The "VLIW instruction" method also needs a degree of parallelism, which the user enters as a number with the mouse 111 or the keyboard 110. The execution constraints are checked automatically when an optimization method is selected. The parameters are entered as numbers with the mouse 111 or the keyboard 110.

FIG. 6 shows the input screen 500 when the DSP tab 502 is selected. Under optimization methods, check boxes such as "SIMD conversion" are displayed. The user sets the desired optimization methods by checking them with the mouse 111 or the keyboard 110. "SIMD conversion" also needs a degree of parallelism for each data type, which the user enters as numbers with the mouse 111 or the keyboard 110. The execution constraints are checked automatically when an optimization method is selected. The parameters are entered as numbers with the mouse 111 or the keyboard 110.

FIG. 7 shows the input screen 500 when the reconfigurable circuit tab 503 is selected. Under optimization methods, check boxes such as "pipeline processing of for blocks", "simultaneous processing of if blocks", and "simultaneous processing of switch blocks" are displayed. The user sets the desired optimization methods by checking them with the mouse 111 or the keyboard 110. The execution constraints are checked automatically when an optimization method is selected. The parameters are entered as numbers with the mouse 111 or the keyboard 110. The information entered in this way is stored in the storage device as the setting information.

FIG. 8 is an explanatory diagram showing the data structure of the setting information. The setting information consists of the computing element (PE) type, parameters, target block, optimization method, and execution constraints. The PE type field stores the PE type selected by the tab (processor, DSP, or reconfigurable circuit). The parameters field stores the numbers entered in the corresponding input area (number of PEs used, operating frequency, and so on).

The target block field stores the type of block (task) to which the optimization method applies (for, if-else, or switch-case). The target block type is determined uniquely by the optimization method. For example, when the optimization method is "loop parallelism", the target block is "for".

The optimization method field stores the optimization method selected by check box, together with the degree of parallelism if one was entered. The execution constraints field stores the execution constraints of the selected optimization method, which are determined uniquely by that method. For example, when the PE type is "reconfigurable circuit" and the optimization method is "pipeline", the execution constraints are "normalization (of the loop header)", "perfect nesting", and "loop-carried dependence".
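The setting information described for FIG. 8 could be modeled as a simple record. The following Python sketch is illustrative only: the class name `SettingInfo`, its field names, and the sample parameter values are assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SettingInfo:
    """Hypothetical record mirroring the fields listed for FIG. 8."""
    pe_type: str                       # "processor", "DSP", or "reconfigurable circuit"
    parameters: Dict[str, float]       # e.g. number of PEs used, operating frequency
    target_block: str                  # "for", "if-else", or "switch-case"
    optimization: str                  # optimization method chosen by check box
    parallelism: Optional[int] = None  # degree of parallelism, if one was entered
    constraints: List[str] = field(default_factory=list)


# Example from the text: pipeline processing on the reconfigurable circuit.
reconf_pipeline = SettingInfo(
    pe_type="reconfigurable circuit",
    parameters={"pe_count": 1, "frequency_mhz": 500},  # assumed sample values
    target_block="for",
    optimization="pipeline",
    constraints=[
        "normalization (of the loop header)",
        "perfect nesting",
        "loop-carried dependence",
    ],
)
```

Because the target block and the execution constraints are determined uniquely by the optimization method, a real tool would presumably derive them from a lookup table rather than store them by hand as done here.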

The extraction unit 203 extracts, from the plurality of tasks, the tasks on which the optimization processing can be performed by the type of computing element included in the setting information acquired by the acquisition unit 202. Specifically, for example, it selects a program pattern that matches the setting information from among the analysis routines prepared in advance for extracting program patterns. It then extracts the tasks that correspond to the selected program pattern from the plurality of tasks. The extracted tasks are stored in the storage device.

For example, when the PE type is "reconfigurable circuit" and the optimization method is "pipeline", the analysis routine for pipeline processing is selected and the target program code 300 is analyzed task by task. This routine includes analysis of the execution constraints of pipeline processing: "normalization (of the loop header)", "perfect nesting", and "loop-carried dependence". The routine extracts, from the divided tasks, the tasks to be assigned to the reconfigurable circuit. A task extracted in this way is executed by the computing element that performs the optimization method.

The task execution cost calculation unit 204 calculates an execution cost corresponding to the execution time of each task obtained by the dividing unit 201, based on the type of computing element that executes the task. Specifically, it calculates, based on the setting information, the execution cost corresponding to the execution time of each task extracted by the extraction unit 203. The execution cost is a value proportional to the execution time of the task: the higher the execution cost, the longer the task takes, and the lower the execution cost, the shorter it takes. The execution cost on a processor is calculated by equation (1) below.

C = s × c × L / f ... (1)
Here, C is the execution cost, s is the number of statements in the target task, c is the number of operations, L is the loop iteration count of the target task, and f is the clock frequency of the target computing element (the operating frequency in the setting information). c is a constant; c = 5 is assumed below.

For example, with f = 500 [MHz], the execution cost C(T1) when a processor executes task T1 shown in FIG. 4 follows from
s = 1
c = 5
L = 20 × 10
as C(T1) = 2 [μs]. Similarly, for tasks T2 and T3, C(T2) = 0.5 [μs] and C(T3) = 0.5 [μs].
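The arithmetic of equation (1) can be checked with a few lines of Python. This is a minimal sketch: the function and parameter names are invented for illustration, and the loop count of 50 for T2 is an assumption inferred from the stated result of 0.5 [μs], since FIG. 4 is not reproduced here.

```python
def processor_cost(statements: int, loop_count: int, freq_mhz: float,
                   ops: int = 5) -> float:
    """Equation (1): C = s * c * L / f.

    With f given in MHz (cycles per microsecond), the result is in microseconds.
    """
    return statements * ops * loop_count / freq_mhz


# Task T1 from the text: s = 1, c = 5, L = 20 * 10, f = 500 MHz.
c_t1 = processor_cost(statements=1, loop_count=20 * 10, freq_mhz=500)

# Task T2: L = 50 is an assumed value that reproduces the stated 0.5 us.
c_t2 = processor_cost(statements=1, loop_count=50, freq_mhz=500)
```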

The execution cost on a DSP/processor is calculated by equation (2) below.

C = s × c × L / (f × p) ... (2)
Here, C is the execution cost of SIMD/VLIW parallel processing on the DSP/processor, s is the number of statements in the target task, and p is the degree of parallelism of the SIMD/VLIW parallel processing.

For example, with f = 500 [MHz], the execution cost C(T1) when a DSP/processor executes task T1 shown in FIG. 4 using SIMD/VLIW processing follows from
s = 1
c = 5
L = 20 × 10
p = 8
as C(T1) = 0.25 [μs]. Similarly, for tasks T2 and T3, C(T2) = 0.0625 [μs] and C(T3) = 0.0625 [μs].

When the target task uses both SIMD and VLIW parallel processing, the cost is divided by both degrees of parallelism.

For example, when the DSP performs SIMD processing and the processor performs VLIW processing for task T1 shown in FIG. 4, with a SIMD degree of parallelism p = 8 and a VLIW degree of parallelism p = 4, the processor cost of 2 [μs] is divided by 8 × 4, giving C(T1) = 0.0625 [μs].
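Equation (2) and the combined SIMD/VLIW case can be sketched in the same way. The code below is an illustration under assumptions: the function name and the idea of passing the degrees of parallelism as a list are not from the patent; only the formula and the numbers for T1 are.

```python
import math
from typing import Sequence


def parallel_cost(statements: int, loop_count: int, freq_mhz: float,
                  degrees: Sequence[int], ops: int = 5) -> float:
    """Equation (2): C = s * c * L / (f * p), in microseconds for f in MHz.

    When both SIMD and VLIW apply, the cost is divided by both degrees
    of parallelism, so p is taken as the product of all entries.
    """
    p = math.prod(degrees)
    return statements * ops * loop_count / (freq_mhz * p)


# T1 with a single degree of parallelism p = 8.
c_t1_p8 = parallel_cost(1, 20 * 10, 500, [8])

# T1 with SIMD p = 8 and VLIW p = 4 combined.
c_t1_both = parallel_cost(1, 20 * 10, 500, [8, 4])
```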

Because VLIW processing parallelizes at the instruction level, a detailed estimate requires the use of a native compiler. When rough accuracy is sufficient, however, this estimation method is applied.

The execution cost on the reconfigurable circuit is calculated by equation (3) below.

C = L / f + r ... (3)
Here, C is the execution cost on the reconfigurable circuit and r is the dynamic reconfiguration cost. When r is between one and a few clocks, it is negligible relative to L / f. In the following, r = 0.

For example, with f = 500 [MHz], the execution cost C(T1) when the reconfigurable circuit executes task T1 shown in FIG. 4 follows from
L = 20 × 10
as C(T1) = 0.4 [μs]. Similarly, for tasks T2 and T3, C(T2) = 0.1 [μs] and C(T3) = 0.1 [μs].
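A short Python sketch of equation (3) reproduces these values. The function name is hypothetical, and the loop count of 50 for T2 is again an assumption inferred from the stated 0.1 [μs].

```python
def reconfig_cost(loop_count: int, freq_mhz: float,
                  reconfig_overhead_us: float = 0.0) -> float:
    """Equation (3): C = L / f + r, in microseconds for f in MHz.

    The statement and operation counts do not appear in this formula;
    r defaults to 0, as in the text.
    """
    return loop_count / freq_mhz + reconfig_overhead_us


c_t1 = reconfig_cost(20 * 10, 500)  # task T1 from the text
c_t2 = reconfig_cost(50, 500)       # assumed L = 50 for task T2
```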

The task execution cost calculation unit 204 also calculates a transfer cost corresponding to the data transfer time between computing elements. Data is transferred when a task runs on the DSP or the reconfigurable circuit, or when multiple processors execute in parallel, so a transfer cost is incurred. The transfer cost is calculated by equation (4) below.

Ct = d / bt ... (4)
Here, d is the amount of data transferred and bt is the bus throughput. The data transfer amount d is the loop iteration count L of the task multiplied by the degree of parallelism p. When neither SIMD nor VLIW is used, p = 1. For a 64-bit, 166 MHz bus, for example, the bus throughput is bt = 64 × 166 = 10624.
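Equation (4) can be sketched as follows. The function names are illustrative assumptions, and because the text does not state the units of d or Ct, the sketch treats the result as a relative cost rather than a time in a specific unit.

```python
def bus_throughput(bus_width_bits: int, bus_freq_mhz: float) -> float:
    """Bus throughput bt as bus width times bus frequency."""
    return bus_width_bits * bus_freq_mhz


def transfer_cost(loop_count: int, throughput: float, parallelism: int = 1) -> float:
    """Equation (4): Ct = d / bt, with d = L * p (p = 1 without SIMD/VLIW)."""
    data_amount = loop_count * parallelism
    return data_amount / throughput


bt = bus_throughput(64, 166)                              # 64-bit, 166 MHz bus
ct_t1 = transfer_cost(20 * 10, bt)                        # T1 without SIMD/VLIW
ct_t1_simd = transfer_cost(20 * 10, bt, parallelism=8)    # T1 with p = 8
```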

A task extracted as matching a program pattern is stored in the storage device as allocation information, together with the computing element that executes it and its execution cost (including the transfer cost when one applies). FIG. 9 is an explanatory diagram showing the data structure of the allocation information.

In FIG. 2, the generation unit 205 generates combination patterns that represent task execution orders. Specifically, for example, the allocation information identifies each computing element and the tasks assigned to it, so the generation unit 205 constructs combination patterns in which those tasks are executed by the corresponding computing elements in time series (serially) or in parallel. The combination patterns are generated so as to cover as many of the possible patterns as it can. An example of generating combination patterns is described later.

The total execution cost calculation unit 206 calculates, based on the execution costs calculated by the task execution cost calculation unit 204, a total execution cost corresponding to the execution time of each combination pattern generated by the generation unit 205. Specifically, for example, it adds up the execution costs of the tasks that make up the combination pattern. Tasks executed serially have their execution costs summed. Where execution branches in parallel, the branch with the largest summed execution cost is used in the total. A transfer cost is also added wherever a data transfer occurs. A concrete example of calculating the total execution cost is described later.
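The aggregation rule (sum for serial execution, maximum over parallel branches, plus transfer costs) can be sketched as a small recursive function. The tree representation with `Task`, `Serial`, and `Parallel` nodes is an assumption made for illustration; the patent does not describe how combination patterns are stored, and the sample pattern below is invented.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Task:
    name: str
    cost: float                 # execution cost on its assigned computing element
    transfer_cost: float = 0.0  # added only where a data transfer occurs


@dataclass
class Serial:
    parts: List["Node"]         # executed one after another


@dataclass
class Parallel:
    branches: List["Node"]      # executed at the same time


Node = Union[Task, Serial, Parallel]


def total_cost(node: Node) -> float:
    """Sum serial parts; take the most expensive branch of a parallel split."""
    if isinstance(node, Task):
        return node.cost + node.transfer_cost
    if isinstance(node, Serial):
        return sum(total_cost(part) for part in node.parts)
    if isinstance(node, Parallel):
        return max(total_cost(branch) for branch in node.branches)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Invented pattern: T1 first, then T2 and T3 side by side.
pattern = Serial([
    Task("T1", 0.4),
    Parallel([Task("T2", 0.5), Task("T3", 0.1)]),
])
total = total_cost(pattern)
```

In this sample the parallel split contributes the cost of T2, the slower branch, so the total is the cost of T1 plus the cost of T2.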

The determination unit 207 determines a specific combination pattern from the group of combination patterns based on the total execution costs calculated by the total execution cost calculation unit 206. Specifically, for example, it selects the combination pattern with the minimum total execution cost as the optimal combination pattern. This gives the embedded system its shortest execution time, so the embedded system can be optimized.

Alternatively, a combination pattern whose total execution cost is at or below a predetermined threshold may be selected as the optimal combination pattern. This yields the combination patterns within the user's tolerance, so there are more combination patterns to choose from. In either case, when total execution costs are equal, the pattern that uses fewer computing elements may be taken as the optimal one, which reduces power consumption. The optimal combination pattern is stored in the storage device.
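The two selection rules and the tie-break can be expressed compactly. This is a sketch under assumptions: the `Pattern` tuple, its field names, and the candidate values are invented for illustration.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    name: str
    total_cost: float  # total execution cost of the combination pattern
    pe_count: int      # number of computing elements the pattern uses


def choose_optimal(patterns: List[Pattern]) -> Pattern:
    """Minimum total cost; on equal cost, prefer the pattern using fewer PEs."""
    return min(patterns, key=lambda p: (p.total_cost, p.pe_count))


def within_threshold(patterns: List[Pattern], threshold: float) -> List[Pattern]:
    """All patterns whose total cost is at or below the threshold."""
    return [p for p in patterns if p.total_cost <= threshold]


candidates = [
    Pattern("A", 0.9, 3),
    Pattern("B", 0.9, 2),
    Pattern("C", 2.5, 1),
]
best = choose_optimal(candidates)
acceptable = within_threshold(candidates, 1.0)
```

Sorting on the pair (total cost, PE count) applies the tie-break only when the costs are exactly equal, which matches the wording of the text; a tolerance could be added if costs are computed in floating point.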

The output unit 208 outputs the determination result produced by the determination unit 207. Specifically, for example, it passes the optimal combination pattern to the display 108, which shows it on its display screen. It may also pass the pattern to the I/F 109 to send it to another computer.

(Design support processing procedure)
FIG. 10 is a flowchart showing the design support processing procedure according to the present embodiment. First, the acquisition unit 202 acquires the setting information and stores it in the storage device (step S1001), and the target program code 300 is read and stored in the storage device (step S1002). The target program code 300 may instead be read before the setting information is acquired (step S1001).

Next, the dividing unit 201 analyzes the target program code 300, divides it into tasks, and stores the divided tasks in the storage device (step S1003). The program pattern assignment process (step S1004) and the scheduling process (step S1005) are then executed, and finally the output unit 208 outputs the optimal combination pattern (step S1006).

FIG. 11 is a flowchart showing the detailed procedure of the program pattern assignment process (step S1004). First, it is determined whether there is an unprocessed PE type (step S1101). If there is (step S1101: Yes), one unprocessed PE type is selected (step S1102). Next, the setting information for the selected PE type is read (step S1103), and the analysis routines are called (step S1104).

An analysis routine is a program pattern extraction algorithm that is called according to the applicable optimization method. More than one analysis routine may be called. Finally, the analysis execution process (step S1105) is performed, which executes the called analysis routines; when several have been called, they are executed one after another. The process then returns to step S1101. If there is no unprocessed PE type in step S1101 (step S1101: No), the process proceeds to step S1005.

FIG. 12 is a flowchart showing the detailed procedure of the analysis execution process (step S1105). First, it is determined whether there is an unprocessed analysis routine (step S1201). If there is (step S1201: Yes), one unprocessed analysis routine is selected (step S1202) and executed (step S1203), and the process returns to step S1201. If there is no unprocessed analysis routine in step S1201 (step S1201: No), the process returns to step S1101.
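The two nested loops of FIGS. 11 and 12 can be sketched as a dispatch over function pointers. This is only an illustration under assumptions: the `PeSetting` struct, the `AnalysisRoutine` type, the limit `MAX_ROUTINES`, and the demonstration routine `count_invocation` are hypothetical and do not appear in the original description.

```c
#include <stddef.h>

#define MAX_ROUTINES 4

/* An analysis routine receives a caller-supplied context. */
typedef void (*AnalysisRoutine)(void *ctx);

/* Hypothetical setting information for one PE type. */
typedef struct {
    const char *pe_type;                    /* e.g. "processor", "DSP" */
    AnalysisRoutine routines[MAX_ROUTINES]; /* routines called for this type */
    size_t routine_count;
} PeSetting;

/* Analysis execution (steps S1201 to S1203): run each routine in order. */
static void run_analysis(const PeSetting *setting, void *ctx)
{
    for (size_t r = 0; r < setting->routine_count; r++) {
        setting->routines[r](ctx);
    }
}

/* Program pattern assignment (steps S1101 to S1105): visit each PE type once. */
void assign_program_patterns(const PeSetting *settings, size_t n, void *ctx)
{
    for (size_t p = 0; p < n; p++) {
        run_analysis(&settings[p], ctx);
    }
}

/* Demonstration routine: counts how many times it is invoked. */
void count_invocation(void *ctx)
{
    (*(int *)ctx)++;
}
```

Registering two routines for one PE type and one routine for another would invoke `count_invocation` three times in total, one call per routine per PE type.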

Next, specific examples of the analysis routines are described. FIGS. 13 to 20 are flowcharts showing the execution procedures of the various analysis routines. Each is described below.

FIG. 13 is a flowchart showing the program pattern extraction procedure for loop-parallel processing of for blocks. This analysis routine is used to assign for-block tasks to a processor.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1301). If the search has not finished (step S1301: No), the next block found is selected as the target block (step S1302), and it is determined whether it is a for block (step S1303). If it is not a for block (step S1303: No), the process returns to step S1301. If it is a for block (step S1303: Yes), it is determined whether the loop can be normalized (step S1304).

Here, normalization is the process of converting the loop header into the form "for (i = 0; i op constant; i++)", where op is a comparison operator. Step S1304 determines whether the loop can be converted into this form before it is executed. If it cannot, the number of loop iterations is unknown at execution time, so the block is rejected (NG).
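As an illustration of why normalization fixes the iteration count, the following sketch computes the trip count of a loop of the form "for (i = start; i < end; i += step)" whose three values are known constants. The function name `normalized_trip_count` and the restriction to a positive step with a "<" comparison are assumptions of this sketch; the original does not specify how normalization is carried out.

```c
/*
 * Trip count of "for (i = start; i < end; i += step)" when all three
 * values are known constants. The loop can then be rewritten as
 * "for (k = 0; k < count; k++)" with i = start + k * step.
 * Returns -1 when this sketch cannot normalize the loop (step <= 0).
 */
long normalized_trip_count(long start, long end, long step)
{
    if (step <= 0) {
        return -1; /* not handled by this sketch */
    }
    if (start >= end) {
        return 0;  /* the loop body never executes */
    }
    return (end - start + step - 1) / step; /* ceiling division */
}
```

For example, a loop running i = 2, 5, 8 (start 2, end 11, step 3) has a trip count of 3 and can be rewritten with a counter running from 0 up to, but not including, 3.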

If the loop cannot be normalized (step S1304: No), the process returns to step S1301. If it can be normalized (step S1304: Yes), it is determined whether there is a loop-carried dependency (step S1305). A loop-carried dependency exists when the same data is defined and referenced in different iterations, for example when the result of one iteration (i = i0) is used in another iteration (i = i1).

This matters because, when iterations are assigned to different cores (processing elements), the execution order is not guaranteed, so the result may differ from that of sequential execution. Therefore, if there is no loop-carried dependency (step S1305: No), the process returns to step S1301. If there is a loop-carried dependency (step S1305: Yes), the task execution cost calculation unit 204 estimates the cost (step S1306); that is, it calculates the execution cost and transfer cost of the target block.
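The definition of a loop-carried dependency given above can be illustrated with two small loops. Both functions are hypothetical examples written for this purpose and are not taken from the target program code 300.

```c
#include <stddef.h>

/* No loop-carried dependency: iteration i reads and writes only element i,
 * so the iterations could run in any order with the same result. */
void scale(int *out, const int *in, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * factor;
    }
}

/* Loop-carried dependency: iteration i reads data[i - 1], which the
 * previous iteration defined, so the result depends on execution order. */
void prefix_sum(int *data, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        data[i] = data[i] + data[i - 1];
    }
}
```

In `prefix_sum`, running iteration i = 2 before iteration i = 1 would read a stale value of `data[1]`, which is the kind of order-dependent result the paragraph above describes.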

It is then determined whether the cost justifies the assignment (step S1307). Specifically, if "loop execution cost + transfer cost + call overhead" exceeds "the cost of sequential execution", assigning the processing would degrade performance rather than improve it. The cost is therefore judged to justify the assignment when "loop execution cost + transfer cost + call overhead" < "cost of sequential execution". For example, when the target block's loop iterates only a few times, its execution cost is so small that the assignment is not worthwhile.
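The inequality above translates directly into a predicate. The function name `worth_assigning` and the use of `double` for costs are assumptions made for this sketch; the original does not state the units or representation of the costs.

```c
#include <stdbool.h>

/*
 * Cost check: assigning the loop to a processing element pays off only
 * when the assigned loop, including data transfer and call overhead,
 * is strictly cheaper than executing the loop sequentially.
 */
bool worth_assigning(double loop_cost, double transfer_cost,
                     double call_overhead, double sequential_cost)
{
    return loop_cost + transfer_cost + call_overhead < sequential_cost;
}
```

Because the comparison is strict, a block whose assigned cost merely equals the sequential cost is not treated as worth assigning.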

If the cost does not justify the assignment (step S1307: No), the process returns to step S1301. If it does (step S1307: Yes), the target block is determined to be a task that can be assigned to the processor (step S1308), for example by setting an allocation flag on the target block. The process then returns to step S1301. When the search has finished in step S1301 (step S1301: Yes), this analysis routine ends.

FIG. 14 is a flowchart showing the program pattern extraction procedure for pipeline processing of for blocks. This analysis routine is used to assign for-block tasks to a processor or a reconfigurable circuit (written as "reconf" in FIG. 14).

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1401). If the search has not finished (step S1401: No), the next block found is selected as the target block (step S1402), and it is determined whether it is a for block (step S1403). If it is not a for block (step S1403: No), the process returns to step S1401. If it is a for block (step S1403: Yes), it is determined whether the loop can be normalized (step S1404).

Here, normalization is the process of converting the loop header into the form "for (i = 0; i op constant; i++)", where op is a comparison operator. Step S1404 determines whether the loop can be converted into this form before it is executed. If it cannot, the number of loop iterations is unknown at execution time, so the block is rejected (NG).

If the loop cannot be normalized (step S1404: No), the process returns to step S1401. If it can be normalized (step S1404: Yes), it is determined whether the loop is nested (step S1405). This determination is a perfect-nest analysis. If the loop is not nested (step S1405: No), the process returns to step S1401. If it is nested (step S1405: Yes), it is determined whether there is a loop-carried dependency (step S1406). A loop-carried dependency exists when the same data is defined and referenced in different iterations, for example when the result of one iteration (i = i0) is used in another iteration (i = i1).

This matters because, when iterations are assigned to different cores (processing elements), the execution order is not guaranteed, so the result may differ from that of sequential execution. Therefore, if there is no loop-carried dependency (step S1406: No), the process returns to step S1401. If there is a loop-carried dependency (step S1406: Yes), the task execution cost calculation unit 204 estimates the cost (step S1407); that is, it calculates the execution cost and transfer cost of the target block.

It is then determined whether the cost justifies the assignment (step S1408). Specifically, if "loop execution cost + transfer cost + call overhead" exceeds "the cost of sequential execution", assigning the processing would degrade performance rather than improve it. The cost is therefore judged to justify the assignment when "loop execution cost + transfer cost + call overhead" < "cost of sequential execution". For example, when the target block's loop iterates only a few times, its execution cost is so small that the assignment is not worthwhile.

If the cost does not justify the assignment (step S1408: No), the process returns to step S1401. If it does (step S1408: Yes), the target block is determined to be a task that can be assigned to the processor (step S1409), for example by setting an allocation flag on the target block. The process then returns to step S1401. When the search has finished in step S1401 (step S1401: Yes), this analysis routine ends.

FIG. 15 is a flowchart showing the program pattern extraction procedure for concurrent processing of switch-case blocks. This analysis routine is used to assign switch-case block tasks to a processor or a DSP.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1501). If the search has not finished (step S1501: No), the next block found is selected as the target block (step S1502), and it is determined whether it is a switch-case block (step S1503). If it is not a switch-case block (step S1503: No), the process returns to step S1501.

If it is a switch-case block (step S1503: Yes), it is determined whether the switch-case block contains an unselected case block (step S1504). If there is none (step S1504: No), the process returns to step S1501. If there is one (step S1504: Yes), an unselected case block is selected (step S1505), and it is determined whether it contains a break statement (step S1506).

If there is no break statement (step S1506: No), the process returns to step S1501. If there is a break statement (step S1506: Yes), the process returns to step S1504. When the search has finished in step S1501 (step S1501: Yes), it is determined whether every case block contains a break statement (step S1507). If so (step S1507: Yes), each case block is determined to be a task that can be assigned to a core (the target processing element; here, a processor or a DSP) (step S1508), for example by setting an allocation flag on each case block, and this analysis routine ends. If not (step S1507: No), this analysis routine ends.
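The final check of FIG. 15 (steps S1507 and S1508) can be sketched as follows. In C, a case block without a break statement falls through into the next case, so such blocks are not independent of one another. The `CaseBlock` struct and the function `flag_cases_if_independent` are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one case block inside a switch-case block. */
typedef struct {
    bool has_break;  /* true when the case block contains a break statement */
    bool assignable; /* allocation flag set by the analysis */
} CaseBlock;

/*
 * Set the allocation flag on every case block only when all of them
 * contain a break statement. Returns true when the flags were set.
 */
bool flag_cases_if_independent(CaseBlock *cases, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!cases[i].has_break) {
            return false; /* a case without break: leave flags untouched */
        }
    }
    for (size_t i = 0; i < n; i++) {
        cases[i].assignable = true;
    }
    return true;
}
```

The flags are set in a second pass so that a single case block without a break statement leaves every block unflagged, matching the all-or-nothing decision in step S1507.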

FIG. 16 is a flowchart showing the program pattern extraction procedure for concurrent processing of switch-case blocks. This analysis routine is used to assign switch-case block tasks to a reconfigurable circuit.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1601). If the search has not finished (step S1601: No), the next block found is selected as the target block (step S1602), and it is determined whether it is a switch-case block (step S1603). If it is not a switch-case block (step S1603: No), the process returns to step S1601.

If it is a switch-case block (step S1603: Yes), it is determined whether the switch-case block contains an unselected case block (step S1604). If there is none (step S1604: No), the process returns to step S1601. If there is one (step S1604: Yes), an unselected case block is selected (step S1605), and it is determined whether circuit configuration information can be generated for it (step S1606). If it cannot (step S1606: No), the process returns to step S1604. If it can (step S1606: Yes), circuit configuration information for the selected case block is generated (step S1607), and the process returns to step S1604.

In determining whether circuit configuration information can be generated, the items analyzed include, for example, whether the case block contains a break statement, whether floating-point arithmetic is used, whether pointer variables or structures are used, and whether the divisor in a division or remainder operation is a constant. Circuit configuration information may be treated as generatable when at least one of these items is satisfied, or only when all of them are satisfied.
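The "at least one" and "all" policies described above can be sketched as a small predicate. The original lists the items but does not say which outcome of each item counts as satisfied, so this sketch assumes that a block is favorable when it contains a break statement, avoids floating point, avoids pointers and structures, and divides only by constants. The names `BlockTraits`, `MatchPolicy`, and `can_generate_config` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical analysis results for one case block. */
typedef struct {
    bool has_break;              /* the block contains a break statement */
    bool uses_float;             /* floating-point arithmetic is used */
    bool uses_pointer_or_struct; /* pointer variables or structures are used */
    bool divisor_is_constant;    /* divisions/remainders use constant divisors */
} BlockTraits;

typedef enum { MATCH_ANY, MATCH_ALL } MatchPolicy;

/*
 * Decide whether circuit configuration information may be generated.
 * Assumption of this sketch: an item is "satisfied" when the block has a
 * break, avoids floating point, avoids pointers/structures, and divides
 * only by constants.
 */
bool can_generate_config(const BlockTraits *t, MatchPolicy policy)
{
    int matches = 0;
    matches += t->has_break ? 1 : 0;
    matches += t->uses_float ? 0 : 1;
    matches += t->uses_pointer_or_struct ? 0 : 1;
    matches += t->divisor_is_constant ? 1 : 0;
    return policy == MATCH_ALL ? matches == 4 : matches >= 1;
}
```

A block that uses floating point but satisfies the other three items is accepted under `MATCH_ANY` and rejected under `MATCH_ALL`.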

Circuit configuration information is generated by connecting typical arithmetic-unit-based coarse-grained components, each consisting of an ALU (Arithmetic and Logic Unit) that performs 4-bit to 32-bit arithmetic and logical operations, a bit-shift and mask circuit, and a register file, through data selectors that switch the data flow between them. The generation of circuit configuration information is itself well known (reference: "Reconfigurable System", edited by Toshinori Sueyoshi and Hideharu Amano, p. 144, section 5.2.1, "Use of coarse-grained components").

FIG. 17 is an explanatory diagram showing a switch-case block selected as a target block and its circuit configuration information. Partial circuit configuration information 1711 to 1713 is generated for each case block of the switch-case block 1700 that contains a break statement, and these are finally integrated to obtain the circuit configuration information 1710. The circuit represented by the circuit configuration information 1710 is mapped onto the reconfigurable circuit.

In FIG. 16, when the search has finished in step S1601 (step S1601: Yes), it is determined whether circuit configuration information was generated for every case block (step S1608). If it was (step S1608: Yes), each case block is determined to be an assignable task (step S1609), for example by setting an allocation flag on every case block, and this analysis routine ends. If it was not (step S1608: No), this analysis routine ends.

FIG. 18 is a flowchart showing the program pattern extraction procedure for concurrent processing of if blocks and else blocks. This analysis routine is used to assign if-block and else-block tasks to a processor or a DSP.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1801). If the search has not finished (step S1801: No), the next block found is selected as the target block (step S1802), and it is determined whether it is an if/else block (step S1803). If it is not an if/else block (step S1803: No), the process returns to step S1801.

If it is an if/else block (step S1803: Yes), it is determined whether it contains an unselected if block or else block (step S1804). If there is none (step S1804: No), the process returns to step S1801.

If there is an unselected if block or else block (step S1804: Yes), it is selected (step S1805), and the target block is determined to be a task that can be assigned to the target processing element (here, a processor or a DSP) (step S1806), for example by setting an allocation flag on the target block. The process then returns to step S1804. When the search has finished in step S1801 (step S1801: Yes), this analysis routine ends.

FIG. 19 is a flowchart showing the program pattern extraction procedure for concurrent processing of if blocks and else blocks. This analysis routine is used to assign if-block and else-block tasks to a reconfigurable circuit.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S1901). If the search has not finished (step S1901: No), the next block found is selected as the target block (step S1902), and it is determined whether it is an if/else block (step S1903). If it is not an if/else block (step S1903: No), the process returns to step S1901.

If it is an if/else block (step S1903: Yes), it is determined whether it contains an unselected if block or else block (step S1904). If there is none (step S1904: No), the process returns to step S1901. If there is one (step S1904: Yes), an unselected if block or else block is selected (step S1905).

It is then determined whether circuit configuration information can be generated (step S1906). If it cannot (step S1906: No), the process returns to step S1904. If it can (step S1906: Yes), circuit configuration information for the selected if block or else block is generated (step S1907), and the process returns to step S1904.

In determining whether circuit configuration information can be generated, the items analyzed include, for example, whether floating-point arithmetic is used, whether pointer variables or structures are used, and whether the divisor in a division or remainder operation is a constant. Circuit configuration information may be treated as generatable when at least one of these items is satisfied, or only when all of them are satisfied.

When the search has finished in step S1901 (step S1901: Yes), it is determined whether circuit configuration information was generated for every if block and else block (step S1908). If it was (step S1908: Yes), each if block and else block is determined to be an assignable task (step S1909), for example by setting an allocation flag on every if block and else block, and this analysis routine ends. If it was not (step S1908: No), this analysis routine ends.

FIG. 20 is a flowchart showing the program pattern extraction procedure for SIMD operation processing. This analysis routine is used to assign for-block tasks to a DSP.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S2001). If the search has not finished (step S2001: No), the next block found is selected as the target block (step S2002), and it is determined whether it is a for block (step S2003). If it is not a for block (step S2003: No), the process returns to step S2001. If it is a for block (step S2003: Yes), it is determined whether it contains a location to which SIMD operations can be applied (step S2004).

Specifically, a loop that repeatedly accesses data of the same type, such as array elements, performs the same kind of operation on each piece of data and can execute those operations at once, so it is a location to which SIMD operations can be applied. When there is a data dependency, however, this is not possible, so the native compiler analyzes the location and determines whether SIMD operations can be applied.

If there is no location to which SIMD operations can be applied (step S2004: No), the process returns to step S2001. If there is one (step S2004: Yes), optimization by the native compiler is executed (step S2005).

Each core is provided with a compiler, called a native compiler, that builds the executable binary for that core, and the native compiler can apply various optimizations. For example, it converts
    for (i = 0; i < 4; i++)
        a[i] = b[i] + c[i];
into
    a[i] = SIMD_add(b[i], c[i]);
so that multiple data operations can be executed at once.
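The effect of the conversion above can be illustrated with a portable emulation. The `Vec4` struct and the function `simd_add_emulated` are hypothetical stand-ins for a four-lane SIMD register and a SIMD add; they are not the `SIMD_add` of the original or the intrinsic of any particular DSP, and plain C can only emulate the single-instruction behavior with a loop.

```c
#define LANES 4

/* Four packed integers standing in for one SIMD register. */
typedef struct {
    int lane[LANES];
} Vec4;

/* Scalar form: one addition per loop iteration. */
void add_scalar(int *a, const int *b, const int *c)
{
    for (int i = 0; i < LANES; i++) {
        a[i] = b[i] + c[i];
    }
}

/* Emulated SIMD add: conceptually all four lanes are added by a single
 * instruction; this portable version loops over the lanes instead. */
Vec4 simd_add_emulated(Vec4 b, Vec4 c)
{
    Vec4 a;
    for (int i = 0; i < LANES; i++) {
        a.lane[i] = b.lane[i] + c.lane[i];
    }
    return a;
}
```

Both forms produce the same four sums; on hardware with SIMD support the second form would be a single operation rather than four.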

After the optimization, the task execution cost calculation unit 204 estimates the cost (step S2006); that is, it calculates the execution cost and transfer cost of the optimized target block.

It is then determined whether the cost justifies the assignment (step S2007). Specifically, if "loop execution cost + transfer cost + call overhead" exceeds "the cost of sequential execution", assigning the processing would degrade performance rather than improve it. The cost is therefore judged to justify the assignment when "loop execution cost + transfer cost + call overhead" < "cost of sequential execution". For example, when the target block's loop iterates only a few times, its execution cost is so small that the assignment is not worthwhile.

If the cost does not justify the assignment (step S2007: No), the process returns to step S2001. If it does (step S2007: Yes), the optimized target block is determined to be a task that can be assigned to the DSP and executed with SIMD operations (step S2008), for example by setting an allocation flag on the target block. The process then returns to step S2001. When the search has finished in step S2001 (step S2001: Yes), this analysis routine ends.

FIG. 21 is a flowchart showing the program pattern extraction procedure for VLIW instruction execution processing. This analysis routine is used to assign for-block tasks to a processor.

First, it is determined whether all of the divided blocks (tasks) have been searched (step S2101). If the search has not finished (step S2101: No), the next unselected block found is selected as the target block (step S2102), and it is determined whether it is a for block (step S2103). If it is not a for block (step S2103: No), the process returns to step S2101. If it is a for block (step S2103: Yes), optimization by the native compiler is executed (step S2104). This optimization converts the for block into a block that can execute multiple operations at once.

そして、最適化後、タスク実行コスト算出部２０４により、コスト見積もりをおこなう（ステップＳ２１０５）。すなわち、最適化後の対象ブロックの実行コストや転送コストを算出する。 Then, after optimization, the task execution cost calculation unit 204 performs cost estimation (step S2105). That is, the execution cost and transfer cost of the target block after optimization are calculated.

そして、割り当てに見合うコストか否かを判断する（ステップＳ２１０６）。具体的には、「ループの実行コスト＋転送コスト＋呼び出しオーバーヘッド」が「逐次実行した場合のコスト」よりも大きくなってしまった場合、処理を割り当てることはかえって性能劣化の原因となる。このため、「ループの実行コスト＋転送コスト＋呼び出しオーバーヘッド」＜「逐次実行した場合のコスト」となる場合、割り当てに見合うコストとする。たとえば、対象ブロックのループ回転数が非常に小さいと実行コストも小さくなるため、割り当てに見合わなくなる。 Next, it is determined whether the cost justifies the allocation (step S2106). Specifically, if “loop execution cost + transfer cost + call overhead” exceeds “the cost of sequential execution”, allocating the process would instead degrade performance. Therefore, the cost is regarded as justifying the allocation only when “loop execution cost + transfer cost + call overhead” < “cost of sequential execution”. For example, when the loop iteration count of the target block is very small, the execution cost is also small, so the transfer cost and call overhead outweigh the gain and the allocation is not worthwhile.

そして、割り当てに見合わない場合（ステップＳ２１０６：Ｎｏ）、ステップＳ２１０１に戻る。一方、割り当てに見合う場合（ステップＳ２１０６：Ｙｅｓ）、最適化後の対象ブロックをプロセッサに割当可能でかつＶＬＩＷ命令実行可能なタスクに決定する（ステップＳ２１０７）。たとえば、対象ブロックに割当フラグを設定する。そして、ステップＳ２１０１に戻る。ステップＳ２１０１において、探索が終了した場合（ステップＳ２１０１：Ｙｅｓ）、この解析ルーチンを終了する。このように、様々な解析ルーチンを用意しておくことで、どのような設定情報が取得されても、最適化をおこなうことができる。 If the allocation is not met (step S2106: NO), the process returns to step S2101. On the other hand, if it matches the assignment (step S2106: Yes), the optimized target block is determined as a task that can be assigned to the processor and can execute the VLIW instruction (step S2107). For example, an allocation flag is set for the target block. Then, the process returns to step S2101. In step S2101, when the search is finished (step S2101: Yes), this analysis routine is finished. In this way, by preparing various analysis routines, optimization can be performed no matter what setting information is acquired.
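The routine of FIG. 21 can be sketched as a loop over the divided blocks. This is an illustration under assumptions: the `Block` class, its field names, and the sample costs are hypothetical, and the native-compiler optimization (step S2104) and the cost estimation (step S2105) are assumed to have already filled in the cost fields.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    kind: str             # "for" marks a for block; other labels are hypothetical
    loop_cost: int        # execution cost after optimization (assumed precomputed)
    transfer_cost: int
    call_overhead: int
    sequential_cost: int
    vliw_flag: bool = False


def extract_vliw_tasks(blocks):
    """Set the allocation flag on every for block whose cost justifies allocation."""
    for block in blocks:                       # S2101, S2102: visit each block once
        if block.kind != "for":                # S2103: only for blocks qualify
            continue
        offloaded = block.loop_cost + block.transfer_cost + block.call_overhead
        if offloaded < block.sequential_cost:  # S2106: cost justifies allocation
            block.vliw_flag = True             # S2107: set the allocation flag
    return [b.name for b in blocks if b.vliw_flag]


blocks = [
    Block("B1", "for", loop_cost=20, transfer_cost=10, call_overhead=5, sequential_cost=80),
    Block("B2", "if", loop_cost=0, transfer_cost=0, call_overhead=0, sequential_cost=10),
    Block("B3", "for", loop_cost=3, transfer_cost=10, call_overhead=5, sequential_cost=9),
]
flagged = extract_vliw_tasks(blocks)
print(flagged)
```

Only `B1` is flagged here: `B2` is not a for block, and `B3` is a for block whose overhead exceeds its sequential cost.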

図２２は、図１０に示したスケジューリング処理の詳細な処理手順を示すフローチャートである。まず、生成部２０５により、組み合わせパターンを生成する（ステップＳ２２０１）。つぎに、未選択の組み合わせパターンがあるか否かを判断する（ステップＳ２２０２）。 FIG. 22 is a flowchart showing the detailed processing procedure of the scheduling process shown in FIG. 10. First, the generation unit 205 generates combination patterns (step S2201). Next, it is determined whether there is an unselected combination pattern (step S2202).

未選択の組み合わせパターンがある場合（ステップＳ２２０２：Ｙｅｓ）、未選択の組み合わせパターンを１つ選択し（ステップＳ２２０３）、総実行コスト算出部２０６により、選択組み合わせパターンの総実行コストを算出する（ステップＳ２２０４）。そして、ステップＳ２２０２に戻る。一方、未選択の組み合わせパターンがない場合（ステップＳ２２０２：Ｎｏ）、決定部２０７により、最適な組み合わせパターンを決定する（ステップＳ２２０５）。そして、ステップＳ１００６に移行する。 If there is an unselected combination pattern (step S2202: Yes), one unselected combination pattern is selected (step S2203), and the total execution cost calculation unit 206 calculates the total execution cost of the selected combination pattern (step S2204). Then, the process returns to step S2202. On the other hand, when there is no unselected combination pattern (step S2202: No), the determination unit 207 determines the optimal combination pattern (step S2205). Then, the process proceeds to step S1006.

つぎに、実施例１について説明する。実施例１では、図３に示した対象プログラムコード３００を組み込みシステムにマッピングする。実施例１は、プロセッサを２個、リコンフィグ回路を１個使用し、プロセッサはｆｏｒブロックのループ並列性を最適化手法とし、リコンフィグ回路は、パイプライン処理を最適化手法として選択した場合の適用例である。 Next, Example 1 will be described. In the first embodiment, the target program code 300 shown in FIG. 3 is mapped to the embedded system. In the first embodiment, two processors and one reconfiguration circuit are used. The processor uses the loop parallelism of the for block as the optimization method, and the reconfiguration circuit selects the pipeline processing as the optimization method. This is an application example.

図２３は、実施例１における設定情報を示す説明図である。設定情報はこのようにテーブル化されて記憶装置に記憶される。図２４は、実施例１におけるプログラムパターン抽出結果を示す割当テーブルを示す説明図である。割当テーブルは、図９に示した割当情報の一部を構成する。この割当テーブルでは、タスクを実行可能な演算要素について割当フラグが立てられる。図２４の場合は、タスクＴ１〜Ｔ３がいずれもプロセッサおよびリコンフィグ回路で実行されることを示している。すなわち、タスクＴ１〜Ｔ３は、図１３に示した解析ルーチンによりプロセッサに割当可能なタスクに決定されたブロックである。同様に、タスクＴ１〜Ｔ３は、図１４に示した解析ルーチンによりプロセッサに割当可能なタスクに決定されたブロックでもある。 FIG. 23 is an explanatory diagram showing the setting information in Example 1. The setting information is tabulated in this way and stored in the storage device. FIG. 24 is an explanatory diagram of the allocation table showing the program pattern extraction result in Example 1. The allocation table constitutes part of the allocation information shown in FIG. 9. In this allocation table, an allocation flag is set for each computation element that can execute a task. FIG. 24 shows that tasks T1 to T3 can all be executed by the processor and by the reconfiguration circuit. That is, tasks T1 to T3 are blocks determined to be tasks allocatable to the processor by the analysis routine shown in FIG. 13. Similarly, tasks T1 to T3 are also blocks determined to be tasks allocatable to the processor by the analysis routine shown in FIG. 14.

図２５は、実施例１における最適化するための組み合わせパターンを示す説明図である。図２５中、「Ｔａｓｋ♯」とは、タスクＴ♯を示す。実施例１では、（１）〜（１２）の１２個の組み合わせパターンとして生成部２０５により生成される。また、図２５中、黒抜きのタスクはプロセッサに割り当てられたタスクを示しており、網掛けのタスクは、リコンフィグ回路に割り当てられたタスクを示している。 FIG. 25 is an explanatory diagram illustrating combination patterns for optimization in the first embodiment. In FIG. 25, “Task #” indicates a task T #. In the first embodiment, the generation unit 205 generates 12 combination patterns (1) to (12). In FIG. 25, the black tasks indicate tasks assigned to the processor, and the shaded tasks indicate tasks assigned to the reconfiguration circuit.

また、各タスクＴ１〜Ｔ３は纏めて実行することができる。それらのタスクをタスクＴ４〜Ｔ７とする。タスクＴ４は、タスクＴ２とタスクＴ３を１つのプロセッサで実行するタスクである。タスクＴ５は、タスクＴ１とタスクＴ３を１つのプロセッサで実行するタスクである。タスクＴ６は、タスクＴ１とタスクＴ２を１つのプロセッサで実行するタスクである。タスクＴ７は、タスクＴ１〜Ｔ３を１つのプロセッサで実行するタスクである。図２６〜図２９は、タスクを纏めた例を示す説明図である。 In addition, tasks T1 to T3 can be grouped and executed together. The grouped tasks are referred to as tasks T4 to T7. Task T4 executes tasks T2 and T3 on one processor. Task T5 executes tasks T1 and T3 on one processor. Task T6 executes tasks T1 and T2 on one processor. Task T7 executes tasks T1 to T3 on one processor. FIGS. 26 to 29 are explanatory diagrams showing examples of grouped tasks.
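One way to arrive at twelve patterns for Example 1 is to enumerate every assignment of T1 to T3 to either the reconfiguration circuit or the processors (2^3 = 8 patterns), then add one grouped variant for each assignment that places two or more tasks on processors (4 more). This is an inferred sketch, not necessarily how the generation unit 205 works; the labels `"R"` and `"P"` and the function name are assumptions.

```python
from itertools import product

TASKS = ("T1", "T2", "T3")


def generate_patterns():
    """Return a list of (assignment, grouped) pairs.

    assignment maps each task to "R" (reconfiguration circuit) or "P" (processor);
    grouped is True when the processor tasks are merged onto a single processor.
    """
    base = [dict(zip(TASKS, combo)) for combo in product("RP", repeat=len(TASKS))]
    patterns = [(assignment, False) for assignment in base]
    for assignment in base:
        on_processor = [t for t in TASKS if assignment[t] == "P"]
        if len(on_processor) >= 2:   # grouping needs at least two processor tasks
            patterns.append((assignment, True))
    return patterns


patterns = generate_patterns()
print(len(patterns))
for assignment, grouped in patterns:
    print(assignment, "grouped" if grouped else "separate")
```

With this enumeration order, the eight separate assignments line up with patterns (1) to (8) as the text describes them, and the four grouped variants correspond to the grouped tasks T4, T5, T6, and T7 in patterns (9) to (12).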

図３０は、実施例１における実行コストおよび転送コストの算出結果テーブルを示す説明図である。算出結果テーブルは、図２４に示した割当テーブルとともに、図９に示した割当情報の一部を構成する。タスクＴ１〜Ｔ３の実行コストは、上述したように、図１３〜図２１の解析ルーチンで算出される。なお、タスクＴ４〜Ｔ７は、あらたに纏められたタスクであるが、これらの実行コストはタスクＴ１〜Ｔ３の実行コストの組み合わせとなる。 FIG. 30 is an explanatory diagram showing the calculation result table of execution costs and transfer costs in Example 1. The calculation result table, together with the allocation table shown in FIG. 24, constitutes part of the allocation information shown in FIG. 9. As described above, the execution costs of tasks T1 to T3 are calculated by the analysis routines of FIGS. 13 to 21. Tasks T4 to T7 are newly grouped tasks, and their execution costs are combinations of the execution costs of tasks T1 to T3.

図３１は、図２５に示した組み合わせパターン（１）のスケジューリング結果を示す説明図である。図３１では、実行主体がリコンフィグ回路のみであるため、リコンフィグ回路がタスクＴ１〜Ｔ３の順に実行する。この場合、リコンフィグ回路へのマッピングが生じるため、タスクごとに転送コストが加算される。転送コストは、送信および受信があるため２倍となる。したがって、タスクＴ１の総実行コストは、７０（＝４０＋１５＋１５）となる。同様に、タスクＴ２の総実行コストは、２０（＝１０＋５＋５）、タスクＴ３の総実行コストは、２０（＝１０＋５＋５）となる。よって、組み合わせパターン（１）の総実行コストは、１１０（＝７０＋２０＋２０）となる。 FIG. 31 is an explanatory diagram showing a scheduling result of the combination pattern (1) shown in FIG. In FIG. 31, since the execution subject is only the reconfiguration circuit, the reconfiguration circuit executes tasks T1 to T3 in this order. In this case, since the mapping to the reconfiguration circuit occurs, the transfer cost is added for each task. The transfer cost is doubled due to transmission and reception. Therefore, the total execution cost of the task T1 is 70 (= 40 + 15 + 15). Similarly, the total execution cost of task T2 is 20 (= 10 + 5 + 5), and the total execution cost of task T3 is 20 (= 10 + 5 + 5). Therefore, the total execution cost of the combination pattern (1) is 110 (= 70 + 20 + 20).

図３２は、図２５に示した組み合わせパターン（２）のスケジューリング結果を示す説明図である。図３２では、タスクＴ１，Ｔ２をリコンフィグ回路が実行し、タスクＴ３をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０とタスクＴ２の総実行コスト：２０との和：９０である。一方、タスクＴ３は、２つのプロセッサのうちメインのプロセッサに実行させればよいため、転送コストが発生しない。したがって、タスクＴ３の総実行コストは５０である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（２）の総実行コストは、９０（＝７０＋２０）となる。 FIG. 32 is an explanatory diagram showing the scheduling result of combination pattern (2) shown in FIG. 25. In FIG. 32, the reconfiguration circuit executes tasks T1 and T2, and the processor executes task T3. When different computation elements are used in this way, the tasks can be executed in parallel. The total execution cost of the reconfiguration circuit is 90, the sum of the total execution cost of task T1 (70) and that of task T2 (20). Task T3, on the other hand, only needs to be executed by the main processor of the two processors, so no transfer cost is incurred. The total execution cost of task T3 is therefore 50. Because the reconfiguration circuit and the processor run in parallel, the larger of the two total execution costs is taken. Hence, the total execution cost of combination pattern (2) is 90 (= 70 + 20).
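The rule applied to the combination patterns in Example 1 can be summarized as a formula. The symbols are introduced here for illustration and do not appear in the patent: $E_p$ is the set of computation elements used by pattern $p$, $A_p(e)$ is the set of tasks assigned to element $e$, $x_{t,e}$ is the execution cost of task $t$ on $e$, $\tau_t$ is the one-way transfer cost of $t$, $m$ is the main processor, and $[e \neq m]$ equals 1 when $e$ is not the main processor and 0 otherwise.

```latex
C(p) = \max_{e \in E_p} \; \sum_{t \in A_p(e)} \bigl( x_{t,e} + 2\,\tau_t\,[e \neq m] \bigr)
```

For pattern (2), the reconfiguration circuit contributes $(40 + 2 \cdot 15) + (10 + 2 \cdot 5) = 90$ and the main processor contributes $50$, so $C = \max(90, 50) = 90$.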

図３３は、図２５に示した組み合わせパターン（３）のスケジューリング結果を示す説明図である。図３３では、タスクＴ１，Ｔ３をリコンフィグ回路が実行し、タスクＴ２をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０とタスクＴ３の総実行コスト：２０との和：９０である。一方、タスクＴ２は、２つのプロセッサのうちメインのプロセッサに実行させればよいため、転送コストが発生しない。したがって、タスクＴ２の総実行コストは５０である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（３）の総実行コストは、９０（＝７０＋２０）となる。 FIG. 33 is an explanatory diagram showing a scheduling result of the combination pattern (3) shown in FIG. In FIG. 33, the reconfiguration circuit executes tasks T1 and T3, and the processor executes task T2. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is 90: the sum of the total execution cost of task T1: 70 and the total execution cost of task T3: 20. On the other hand, since the task T2 may be executed by the main processor of the two processors, there is no transfer cost. Therefore, the total execution cost of task T2 is 50. Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (3) is 90 (= 70 + 20).

図３４は、図２５に示した組み合わせパターン（４）のスケジューリング結果を示す説明図である。図３４では、タスクＴ１をリコンフィグ回路が実行し、タスクＴ２，Ｔ３をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０である。一方、タスクＴ２，Ｔ３は、２つのプロセッサがそれぞれ実行することとなるが、メインのプロセッサに対しては、データ転送が発生しない。したがって、タスクＴ３を実行するプロセッサをメインとすると、タスクＴ２の総実行コストは、６０（＝５０＋５＋５）、タスクＴ３の総実行コストは５０である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（４）の総実行コストは、７０（＝４０＋１５＋１５）となる。 FIG. 34 is an explanatory diagram showing a scheduling result of the combination pattern (4) shown in FIG. In FIG. 34, the reconfiguration circuit executes task T1, and the processor executes tasks T2 and T3. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T1: 70. On the other hand, the tasks T2 and T3 are executed by two processors, respectively, but no data transfer occurs to the main processor. Therefore, if the processor that executes task T3 is the main, the total execution cost of task T2 is 60 (= 50 + 5 + 5), and the total execution cost of task T3 is 50. Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (4) is 70 (= 40 + 15 + 15).

図３５は、図２５に示した組み合わせパターン（５）のスケジューリング結果を示す説明図である。図３５では、タスクＴ１をプロセッサが実行し、タスクＴ２，Ｔ３をリコンフィグ回路が実行する。このように、異なる演算要素を用いる場合は、並列実行できる。プロセッサでは、タスクＴ１の総実行コストは、２００である。一方、リコンフィグ回路では、タスクＴ２の総実行コストが２０（＝１０＋５＋５）で、タスクＴ３の総実行コストは２０（＝１０＋５＋５）である。リコンフィグ回路の総実行コストは、タスクＴ２の総実行コスト：２０とタスクＴ３の総実行コスト：２０との和：４０である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（５）の総実行コストは、２００となる。 FIG. 35 is an explanatory diagram showing a scheduling result of the combination pattern (5) shown in FIG. In FIG. 35, the processor executes task T1, and the reconfiguration circuit executes tasks T2 and T3. In this way, when different calculation elements are used, they can be executed in parallel. In the processor, the total execution cost of the task T1 is 200. On the other hand, in the reconfiguration circuit, the total execution cost of task T2 is 20 (= 10 + 5 + 5), and the total execution cost of task T3 is 20 (= 10 + 5 + 5). The total execution cost of the reconfiguration circuit is 40: the sum of the total execution cost of task T2: 20 and the total execution cost of task T3: 20. Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (5) is 200.

図３６は、図２５に示した組み合わせパターン（６）のスケジューリング結果を示す説明図である。図３６では、タスクＴ２をリコンフィグ回路が実行し、タスクＴ１，Ｔ３をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ２の総実行コスト：２０である。一方、タスクＴ１，Ｔ３は、２つのプロセッサがそれぞれ実行することとなるが、メインのプロセッサに対しては、データ転送が発生しない。したがって、タスクＴ１を実行するプロセッサをメインとすると、タスクＴ１の総実行コストは２００、タスクＴ３の総実行コストは６０（＝５０＋５＋５）である。転送コストが高い方のプロセッサをメインとすることにより、総実行コストを抑制することができる。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（６）の総実行コストは、２００となる。 FIG. 36 is an explanatory diagram showing a scheduling result of the combination pattern (6) shown in FIG. In FIG. 36, the reconfiguration circuit executes task T2, and the processor executes tasks T1 and T3. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T2: 20. On the other hand, the tasks T1 and T3 are executed by two processors, respectively, but no data transfer occurs to the main processor. Therefore, assuming that the processor that executes the task T1 is the main, the total execution cost of the task T1 is 200, and the total execution cost of the task T3 is 60 (= 50 + 5 + 5). By using the processor with the higher transfer cost as the main, the total execution cost can be suppressed. Since the reconfiguration circuit and the processor are executed in parallel, select the one with the highest total execution cost. Therefore, the total execution cost of the combination pattern (6) is 200.

図３７は、図２５に示した組み合わせパターン（７）のスケジューリング結果を示す説明図である。図３７では、タスクＴ３をリコンフィグ回路が実行し、タスクＴ１，Ｔ２をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ３の総実行コスト：２０である。一方、タスクＴ１，Ｔ２は、２つのプロセッサがそれぞれ実行することとなるが、メインのプロセッサに対しては、データ転送が発生しない。したがって、タスクＴ１を実行するプロセッサをメインとすると、タスクＴ１の総実行コストは２００、タスクＴ２の総実行コストは６０（＝５０＋５＋５）である。転送コストが高い方のプロセッサをメインとすることにより、総実行コストを抑制することができる。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（７）の総実行コストは、２００となる。 FIG. 37 is an explanatory diagram showing a scheduling result of the combination pattern (7) shown in FIG. In FIG. 37, the reconfiguration circuit executes task T3, and the processor executes tasks T1 and T2. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T3: 20. On the other hand, tasks T1 and T2 are executed by two processors, respectively, but no data transfer occurs to the main processor. Therefore, assuming that the processor that executes the task T1 is the main, the total execution cost of the task T1 is 200, and the total execution cost of the task T2 is 60 (= 50 + 5 + 5). By using the processor with the higher transfer cost as the main, the total execution cost can be suppressed. Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (7) is 200.

図３８は、図２５に示した組み合わせパターン（８）のスケジューリング結果を示す説明図である。図３８では、いずれのタスクＴ１〜Ｔ３もプロセッサが実行する。ここでは、プロセッサの個数は２個であるため、タスクＴ１をメインとなる一方のプロセッサ、タスクＴ２，Ｔ３を他方のプロセッサが実行する。転送コストが高い方のプロセッサをメインとすることにより、総実行コストを抑制することができる。タスクＴ２，Ｔ３を実行するプロセッサはメインのプロセッサではないため、それぞれ転送コストが発生する。したがって、タスクＴ１の総実行コストは２００、タスクＴ２の総実行コストは６０、タスクＴ３の総実行コストは６０となる。タスクＴ１を実行するプロセッサとタスクＴ２，Ｔ３を実行するプロセッサとの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（８）の総実行コストは、２００となる。 FIG. 38 is an explanatory diagram showing the scheduling result of combination pattern (8) shown in FIG. 25. In FIG. 38, all of tasks T1 to T3 are executed by processors. Since there are two processors, task T1 is executed by one processor, which serves as the main processor, and tasks T2 and T3 are executed by the other processor. Making the processor with the higher transfer cost the main processor keeps the total execution cost down. Because the processor that executes tasks T2 and T3 is not the main processor, a transfer cost is incurred for each of them. Therefore, the total execution cost of task T1 is 200, that of task T2 is 60, and that of task T3 is 60. Because the processor executing task T1 and the processor executing tasks T2 and T3 run in parallel, the larger total execution cost is taken (200 versus 60 + 60 = 120). Hence, the total execution cost of combination pattern (8) is 200.

図３９は、図２５に示した組み合わせパターン（９）のスケジューリング結果を示す説明図である。図３９では、タスクＴ１をリコンフィグ回路が実行し、タスクＴ４（Ｔ２＋Ｔ３）をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０である。一方、タスクＴ４は、メインのプロセッサのみでの実行となるため、転送コストは発生しない。したがって、タスクＴ４の総実行コストは１００（＝５０＋５０）である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（９）の総実行コストは、１００となる。 FIG. 39 is an explanatory diagram showing a scheduling result of the combination pattern (9) shown in FIG. In FIG. 39, the reconfiguration circuit executes task T1, and the processor executes task T4 (T2 + T3). In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T1: 70. On the other hand, since the task T4 is executed only by the main processor, there is no transfer cost. Therefore, the total execution cost of the task T4 is 100 (= 50 + 50). Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (9) is 100.

図４０は、図２５に示した組み合わせパターン（１０）のスケジューリング結果を示す説明図である。図４０では、タスクＴ２をリコンフィグ回路が実行し、タスクＴ５（Ｔ１＋Ｔ３）をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ２の総実行コスト：２０である。一方、タスクＴ５は、メインのプロセッサのみでの実行となるため、転送コストは発生しない。したがって、タスクＴ５の総実行コストは２５０（２００＋５０）である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０）の総実行コストは、２５０となる。 FIG. 40 is an explanatory diagram showing a scheduling result of the combination pattern (10) shown in FIG. In FIG. 40, the reconfiguration circuit executes task T2, and the processor executes task T5 (T1 + T3). In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T2: 20. On the other hand, since the task T5 is executed only by the main processor, there is no transfer cost. Therefore, the total execution cost of task T5 is 250 (200 + 50). Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (10) is 250.

図４１は、図２５に示した組み合わせパターン（１１）のスケジューリング結果を示す説明図である。図４１では、タスクＴ３をリコンフィグ回路が実行し、タスクＴ６（Ｔ１＋Ｔ２）をプロセッサが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ３の総実行コスト：２０である。一方、タスクＴ６は、メインのプロセッサのみでの実行となるため、転送コストは発生しない。したがって、タスクＴ６の総実行コストは２５０（２００＋５０）である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１１）の総実行コストは、２５０となる。 FIG. 41 is an explanatory diagram showing a scheduling result of the combination pattern (11) shown in FIG. In FIG. 41, the reconfiguration circuit executes task T3, and the processor executes task T6 (T1 + T2). In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T3: 20. On the other hand, since the task T6 is executed only by the main processor, there is no transfer cost. Therefore, the total execution cost of task T6 is 250 (200 + 50). Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (11) is 250.

図４２は、図２５に示した組み合わせパターン（１２）のスケジューリング結果を示す説明図である。図４２では、タスクＴ７（Ｔ１＋Ｔ２＋Ｔ３）をプロセッサが実行する。タスクＴ７は、メインのプロセッサのみでの実行となるため、転送コストは発生しない。したがって、タスクＴ７の総実行コストは３００（２００＋５０＋５０）である。よって、組み合わせパターン（１２）の総実行コストは、３００となる。 FIG. 42 is an explanatory diagram showing a scheduling result of the combination pattern (12) shown in FIG. In FIG. 42, the processor executes task T7 (T1 + T2 + T3). Since task T7 is executed only by the main processor, there is no transfer cost. Therefore, the total execution cost of task T7 is 300 (200 + 50 + 50). Therefore, the total execution cost of the combination pattern (12) is 300.

決定部２０７では、これらの組み合わせパターン（１）〜（１２）の総実行コストの中からその総実行コストが最小となる組み合わせパターンを最適な組み合わせパターンに決定する。したがって、図３４に示した組み合わせパターン（４）の総実行コストが最小であるため、組み合わせパターン（４）が最適な組み合わせパターンに決定される。すなわち、タスクＴ１をリコンフィグ回路に割り当てて、タスクＴ２，Ｔ３を２個のプロセッサにそれぞれ割り当てるのが、最適な組み合わせとなる。 The determination unit 207 determines a combination pattern having the minimum total execution cost as an optimal combination pattern from the total execution costs of the combination patterns (1) to (12). Therefore, since the total execution cost of the combination pattern (4) shown in FIG. 34 is the minimum, the combination pattern (4) is determined as the optimal combination pattern. That is, the optimal combination is that the task T1 is assigned to the reconfiguration circuit and the tasks T2 and T3 are assigned to the two processors, respectively.
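The twelve totals of Example 1 and the selection made by the determination unit 207 can be reproduced with a short sketch. The per-task costs are the ones quoted in the worked examples above; the element labels `"reconf"`, `"main"`, and `"sub"`, the function names, and the assumption that tasks sharing one element run back to back are illustrative choices, not taken from the patent.

```python
# Costs read from the worked examples in the text (Example 1).
RECONF_EXEC = {"T1": 40, "T2": 10, "T3": 10}
PROC_EXEC = {"T1": 200, "T2": 50, "T3": 50}
TRANSFER = {"T1": 15, "T2": 5, "T3": 5}   # one-way; send plus receive doubles it


def element_cost(element, tasks):
    """Sum the total execution cost of tasks run one after another on an element."""
    if element == "reconf":
        return sum(RECONF_EXEC[t] + 2 * TRANSFER[t] for t in tasks)
    if element == "main":                  # main processor: no transfer cost
        return sum(PROC_EXEC[t] for t in tasks)
    if element == "sub":                   # second processor: transfer both ways
        return sum(PROC_EXEC[t] + 2 * TRANSFER[t] for t in tasks)
    raise ValueError(f"unknown element: {element}")


def pattern_cost(pattern):
    """Elements run in parallel, so the slowest element sets the pattern's cost."""
    return max(element_cost(e, tasks) for e, tasks in pattern.items())


PATTERNS = {
    1: {"reconf": ["T1", "T2", "T3"]},
    2: {"reconf": ["T1", "T2"], "main": ["T3"]},
    3: {"reconf": ["T1", "T3"], "main": ["T2"]},
    4: {"reconf": ["T1"], "main": ["T3"], "sub": ["T2"]},
    5: {"reconf": ["T2", "T3"], "main": ["T1"]},
    6: {"reconf": ["T2"], "main": ["T1"], "sub": ["T3"]},
    7: {"reconf": ["T3"], "main": ["T1"], "sub": ["T2"]},
    8: {"main": ["T1"], "sub": ["T2", "T3"]},
    9: {"reconf": ["T1"], "main": ["T2", "T3"]},    # grouped task T4
    10: {"reconf": ["T2"], "main": ["T1", "T3"]},   # grouped task T5
    11: {"reconf": ["T3"], "main": ["T1", "T2"]},   # grouped task T6
    12: {"main": ["T1", "T2", "T3"]},               # grouped task T7
}

costs = {n: pattern_cost(p) for n, p in PATTERNS.items()}
best = min(costs, key=costs.get)
print(costs)
print("optimal pattern:", best, "cost:", costs[best])
```

The computed totals agree with the values stated for patterns (1) to (12), and the minimum, 70, is reached only by pattern (4).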

つぎに、実施例２について説明する。実施例２では、図３に示した対象プログラムコード３００を組み込みシステムにマッピングする。実施例２は、全体を制御するプロセッサを１個、ＤＳＰを２個、リコンフィグ回路を１個使用し、ＤＳＰはｆｏｒブロックのＳＩＭＤ演算処理（４並列）を最適化手法とし、リコンフィグ回路は、パイプライン処理を最適化手法として選択した場合の適用例である。 Next, Example 2 will be described. In Example 2, the target program code 300 shown in FIG. 3 is mapped to an embedded system. Example 2 is an application example in which one processor for overall control, two DSPs, and one reconfiguration circuit are used, SIMD arithmetic processing (4-way parallel) of for blocks is selected as the optimization method for the DSPs, and pipeline processing is selected as the optimization method for the reconfiguration circuit.

図４３は、実施例２における設定情報を示す説明図である。設定情報はこのようにテーブル化されて記憶装置に記憶される。図４４は、実施例２におけるプログラムパターン抽出結果を示す割当テーブルである。この割当テーブルは、図９に示した割当情報の一部を構成する。この割当テーブルでは、タスクを実行可能な演算要素について割当フラグが立てられる。図４４の場合は、タスクＴ１〜Ｔ３がいずれもＤＳＰおよびリコンフィグ回路で実行されることを示している。すなわち、タスクＴ１〜Ｔ３は、図２０に示した解析ルーチンによりＤＳＰに割当可能なタスクに決定されたブロックである。同様に、タスクＴ１〜Ｔ３は、図１４に示した解析ルーチンによりプロセッサに割当可能なタスクに決定されたブロックでもある。 FIG. 43 is an explanatory diagram showing the setting information in Example 2. The setting information is tabulated in this way and stored in the storage device. FIG. 44 is an allocation table showing the program pattern extraction result in Example 2. This allocation table constitutes part of the allocation information shown in FIG. 9. In this allocation table, an allocation flag is set for each computation element that can execute a task. FIG. 44 shows that tasks T1 to T3 can all be executed by the DSP and by the reconfiguration circuit. That is, tasks T1 to T3 are blocks determined to be tasks allocatable to the DSP by the analysis routine shown in FIG. 20. Similarly, tasks T1 to T3 are also blocks determined to be tasks allocatable to the processor by the analysis routine shown in FIG. 14.

図４５は、実施例２における最適化するための組み合わせパターンを示す説明図である。図４５中、「Ｔａｓｋ♯」とは、タスクＴ♯を示す。実施例２では、（１０１）〜（１１２）の１２個の組み合わせパターンとして生成部２０５により生成される。また、図４５中、ハッチングが施されているタスクはＤＳＰに割り当てられたタスクを示しており、網掛けのタスクは、リコンフィグ回路に割り当てられたタスクを示している。 FIG. 45 is an explanatory diagram illustrating combination patterns for optimization in the second embodiment. In FIG. 45, “Task #” indicates a task T #. In the second embodiment, the generation unit 205 generates 12 combination patterns (101) to (112). In FIG. 45, hatched tasks indicate tasks assigned to the DSP, and shaded tasks indicate tasks assigned to the reconfiguration circuit.

また、各タスクＴ１〜Ｔ３は纏めて実行することができる。それらのタスクをタスクＴ４〜Ｔ７とする。タスクＴ４は、タスクＴ２とタスクＴ３を１つのＤＳＰで実行するタスクである。タスクＴ５は、タスクＴ１とタスクＴ３を１つのＤＳＰで実行するタスクである。タスクＴ６は、タスクＴ１とタスクＴ２を１つのＤＳＰで実行するタスクである。タスクＴ７は、タスクＴ１〜Ｔ３を１つのＤＳＰで実行するタスクである。タスクの纏め方については、図２６〜図２９に示したとおりである。 In addition, tasks T1 to T3 can be grouped and executed together. The grouped tasks are referred to as tasks T4 to T7. Task T4 executes tasks T2 and T3 on one DSP. Task T5 executes tasks T1 and T3 on one DSP. Task T6 executes tasks T1 and T2 on one DSP. Task T7 executes tasks T1 to T3 on one DSP. The tasks are grouped as shown in FIGS. 26 to 29.

図４６は、実施例２における実行コストおよび転送コストの算出結果テーブルを示す説明図である。算出結果テーブルは、図４４に示した割当テーブルとともに、図９に示した割当情報の一部を構成する。タスクＴ１〜Ｔ３の実行コストは、上述したように、図１３〜図２１の解析ルーチンで算出される。なお、タスクＴ４〜Ｔ７は、あらたに纏められたタスクであるが、これらの実行コストはタスクＴ１〜Ｔ３の実行コストの組み合わせとなる。 FIG. 46 is an explanatory diagram showing the calculation result table of execution costs and transfer costs in Example 2. The calculation result table, together with the allocation table shown in FIG. 44, constitutes part of the allocation information shown in FIG. 9. As described above, the execution costs of tasks T1 to T3 are calculated by the analysis routines of FIGS. 13 to 21. Tasks T4 to T7 are newly grouped tasks, and their execution costs are combinations of the execution costs of tasks T1 to T3.

図４７は、図４５に示した組み合わせパターン（１０１）のスケジューリング結果を示す説明図である。図４７では、実行主体がリコンフィグ回路のみであるため、リコンフィグ回路がタスクＴ１〜Ｔ３の順に実行する。この場合、リコンフィグ回路へのマッピングが生じるため、タスクごとに転送コストが加算される。転送コストは、送信および受信があるため２倍となる。したがって、タスクＴ１の総実行コストは、７０（＝４０＋１５＋１５）となる。同様に、タスクＴ２の総実行コストは、２０（＝１０＋５＋５）、タスクＴ３の総実行コストは、２０（＝１０＋５＋５）となる。よって、組み合わせパターン（１０１）の総実行コストは、１１０（＝７０＋２０＋２０）となる。 FIG. 47 is an explanatory diagram showing a scheduling result of the combination pattern (101) shown in FIG. In FIG. 47, since the execution subject is only the reconfiguration circuit, the reconfiguration circuit executes tasks T1 to T3 in this order. In this case, since the mapping to the reconfiguration circuit occurs, the transfer cost is added for each task. The transfer cost is doubled due to transmission and reception. Therefore, the total execution cost of the task T1 is 70 (= 40 + 15 + 15). Similarly, the total execution cost of task T2 is 20 (= 10 + 5 + 5), and the total execution cost of task T3 is 20 (= 10 + 5 + 5). Therefore, the total execution cost of the combination pattern (101) is 110 (= 70 + 20 + 20).

図４８は、図４５に示した組み合わせパターン（１０２）のスケジューリング結果を示す説明図である。図４８では、タスクＴ１，Ｔ２をリコンフィグ回路が実行し、タスクＴ３をＤＳＰが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０とタスクＴ２の総実行コスト：２０との和：９０である。一方、タスクＴ３は、２つのＤＳＰのうちメインのＤＳＰに実行させればよいため、転送コストが発生しない。したがって、タスクＴ３の総実行コストは１３である。リコンフィグ回路とＤＳＰの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０２）の総実行コストは、９０（＝７０＋２０）となる。 FIG. 48 is an explanatory diagram showing a scheduling result of the combination pattern (102) shown in FIG. In FIG. 48, the reconfiguration circuit executes tasks T1 and T2, and the DSP executes task T3. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfigurable circuit is 90: the sum of the total execution cost of the task T1: 70 and the total execution cost of the task T2: 20. On the other hand, since the task T3 may be executed by the main DSP of the two DSPs, no transfer cost is generated. Therefore, the total execution cost of task T3 is 13. Since the reconfiguration circuit and the DSP are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (102) is 90 (= 70 + 20).

図４９は、図４５に示した組み合わせパターン（１０３）のスケジューリング結果を示す説明図である。図４９では、タスクＴ１，Ｔ３をリコンフィグ回路が実行し、タスクＴ２をＤＳＰが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０とタスクＴ３の総実行コスト：２０との和：９０である。一方、タスクＴ２は、２つのＤＳＰのうちメインのＤＳＰに実行させればよいため、転送コストが発生しない。したがって、タスクＴ２の総実行コストは１３である。リコンフィグ回路とＤＳＰの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０３）の総実行コストは、９０（＝７０＋２０）となる。 FIG. 49 is an explanatory diagram showing a scheduling result of the combination pattern (103) shown in FIG. In FIG. 49, tasks T1 and T3 are executed by the reconfiguration circuit, and task T2 is executed by the DSP. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is 90: the sum of the total execution cost of task T1: 70 and the total execution cost of task T3: 20. On the other hand, since the task T2 may be executed by the main DSP of the two DSPs, no transfer cost is generated. Therefore, the total execution cost of task T2 is 13. Since the reconfiguration circuit and the DSP are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (103) is 90 (= 70 + 20).

図５０は、図４５に示した組み合わせパターン（１０４）のスケジューリング結果を示す説明図である。図５０では、タスクＴ１をリコンフィグ回路が実行し、タスクＴ２，Ｔ３をＤＳＰが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ１の総実行コスト：７０である。一方、タスクＴ２，Ｔ３は、２つのＤＳＰがそれぞれ実行することとなるが、メインのＤＳＰに対しては、データ転送が発生しない。したがって、タスクＴ３を実行するＤＳＰをメインとすると、タスクＴ２の総実行コストは、２３（＝１３＋５＋５）、タスクＴ３の総実行コストは１３である。リコンフィグ回路とＤＳＰの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０４）の総実行コストは、７０（＝４０＋１５＋１５）となる。 FIG. 50 is an explanatory diagram showing a scheduling result of the combination pattern (104) shown in FIG. In FIG. 50, the reconfiguration circuit executes task T1, and the DSP executes tasks T2 and T3. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T1: 70. On the other hand, the tasks T2 and T3 are executed by two DSPs, respectively, but no data transfer occurs to the main DSP. Therefore, if the DSP that executes task T3 is the main, the total execution cost of task T2 is 23 (= 13 + 5 + 5), and the total execution cost of task T3 is 13. Since the reconfiguration circuit and the DSP are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (104) is 70 (= 40 + 15 + 15).

図５１は、図４５に示した組み合わせパターン（１０５）のスケジューリング結果を示す説明図である。図５１では、タスクＴ１をＤＳＰが実行し、タスクＴ２，Ｔ３をリコンフィグ回路が実行する。このように、異なる演算要素を用いる場合は、並列実行できる。ＤＳＰでは、タスクＴ１の総実行コストは、５０である。一方、リコンフィグ回路では、タスクＴ２の総実行コストが２０（＝１０＋５＋５）で、タスクＴ３の総実行コストは２０（＝１０＋５＋５）である。リコンフィグ回路の総実行コストは、タスクＴ２の総実行コスト：２０とタスクＴ３の総実行コスト：２０との和：４０である。リコンフィグ回路とプロセッサの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０５）の総実行コストは、５０となる。 FIG. 51 is an explanatory diagram showing a scheduling result of the combination pattern (105) shown in FIG. In FIG. 51, the task T1 is executed by the DSP, and the tasks T2 and T3 are executed by the reconfiguration circuit. In this way, when different calculation elements are used, they can be executed in parallel. In the DSP, the total execution cost of the task T1 is 50. On the other hand, in the reconfiguration circuit, the total execution cost of task T2 is 20 (= 10 + 5 + 5), and the total execution cost of task T3 is 20 (= 10 + 5 + 5). The total execution cost of the reconfiguration circuit is 40: the sum of the total execution cost of task T2: 20 and the total execution cost of task T3: 20. Since the reconfiguration circuit and the processor are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (105) is 50.

図５２は、図４５に示した組み合わせパターン（１０６）のスケジューリング結果を示す説明図である。図５２では、タスクＴ２をリコンフィグ回路が実行し、タスクＴ１，Ｔ３をＤＳＰが実行する。このように、異なる演算要素を用いる場合は、並列実行できる。リコンフィグ回路の総実行コストは、タスクＴ２の総実行コスト：２０である。一方、タスクＴ１，Ｔ３は、２つのＤＳＰがそれぞれ実行することとなるが、メインのＤＳＰに対しては、データ転送が発生しない。したがって、タスクＴ１を実行するＤＳＰをメインとすると、タスクＴ１の総実行コストは５０、タスクＴ３の総実行コストは２３（＝１３＋５＋５）である。転送コストが高い方のＤＳＰをメインとすることにより、総実行コストを抑制することができる。リコンフィグ回路とＤＳＰの並列実行であるため、総実行コストの多い方を選択する。よって、組み合わせパターン（１０６）の総実行コストは、５０となる。 FIG. 52 is an explanatory diagram showing a scheduling result of the combination pattern (106) shown in FIG. In FIG. 52, the reconfiguration circuit executes task T2, and the DSP executes tasks T1 and T3. In this way, when different calculation elements are used, they can be executed in parallel. The total execution cost of the reconfiguration circuit is the total execution cost of task T2: 20. On the other hand, the tasks T1 and T3 are executed by the two DSPs, respectively, but no data transfer occurs to the main DSP. Therefore, if the DSP that executes the task T1 is the main, the total execution cost of the task T1 is 50, and the total execution cost of the task T3 is 23 (= 13 + 5 + 5). By using the DSP having the higher transfer cost as the main, the total execution cost can be suppressed. Since the reconfiguration circuit and the DSP are executed in parallel, the one with the higher total execution cost is selected. Therefore, the total execution cost of the combination pattern (106) is 50.

FIG. 53 is an explanatory diagram showing the scheduling result of the combination pattern (107) shown in FIG. 45. In FIG. 53, the reconfiguration circuit executes task T3, and the DSPs execute tasks T1 and T2. Because different calculation elements are used, they can execute in parallel. The total execution cost of the reconfiguration circuit is that of task T3, which is 20. Tasks T1 and T2 are each executed by one of the two DSPs, and no data transfer occurs for the main DSP. Therefore, if the DSP that executes task T1 is the main DSP, the total execution cost of task T1 is 50 and that of task T2 is 23 (= 13 + 5 + 5). Making the DSP with the higher transfer cost the main DSP keeps the total execution cost down. Because the reconfiguration circuit and the DSPs execute in parallel, the largest total execution cost is selected. Therefore, the total execution cost of the combination pattern (107) is 50.

FIG. 54 is an explanatory diagram showing the scheduling result of the combination pattern (108) shown in FIG. 45. In FIG. 54, all of the tasks T1 to T3 are executed by DSPs. Because there are two DSPs, task T1 is executed by one DSP, which serves as the main DSP, and tasks T2 and T3 are executed by the other DSP. Making the DSP with the higher transfer cost the main DSP keeps the total execution cost down. Because the DSP that executes tasks T2 and T3 is not the main DSP, a transfer cost arises for each of those tasks. Therefore, the total execution cost of task T1 is 50, that of task T2 is 23, and that of task T3 is 23. Because the DSP that executes task T1 and the DSP that executes tasks T2 and T3 execute in parallel, the larger total execution cost is selected. Therefore, the total execution cost of the combination pattern (108) is 50.
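
The effect of the main DSP on the cost of pattern (108) can be checked with a short sketch. It assumes, as a reading of the figure 23 (= 13 + 5 + 5) quoted for FIGS. 52 and 53, that 13 is the execution cost of the task on a DSP and that the two 5s are transfer terms that apply only on the DSP that is not the main DSP. The names `TRANSFER_COST` and `task_total_cost` are hypothetical.

```python
# Illustrative sketch; the split of 23 into 13 + (5 + 5) is an assumption
# drawn from the description, and all names are hypothetical.

TRANSFER_COST = 5 + 5  # the two transfer terms in 23 = 13 + 5 + 5


def task_total_cost(exec_cost, on_main_dsp, transfer_cost=TRANSFER_COST):
    """Total execution cost of one task on a DSP.

    No data transfer occurs for the main DSP, so only a task placed on
    the other DSP pays the transfer cost.
    """
    return exec_cost if on_main_dsp else exec_cost + transfer_cost


# Combination pattern (108): T1 on the main DSP, T2 and T3 on the other DSP.
main_dsp = [task_total_cost(50, on_main_dsp=True)]            # T1
other_dsp = [task_total_cost(13, on_main_dsp=False),          # T2
             task_total_cost(13, on_main_dsp=False)]          # T3

# The two DSPs run in parallel, so the larger per-DSP sum is the pattern cost.
total = max(sum(main_dsp), sum(other_dsp))
print(other_dsp, total)  # [23, 23] 50
```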

FIG. 55 is an explanatory diagram showing the scheduling result of the combination pattern (109) shown in FIG. 45. In FIG. 55, the reconfiguration circuit executes task T1, and the DSP executes task T4 (T2 + T3). Because different calculation elements are used, they can execute in parallel. The total execution cost of the reconfiguration circuit is that of task T1, which is 70. Task T4 is executed only on the main DSP, so no transfer cost arises. Therefore, the total execution cost of task T4 is 26 (= 13 + 13). Because the reconfiguration circuit and the DSP execute in parallel, the larger total execution cost is selected. Therefore, the total execution cost of the combination pattern (109) is 70.

FIG. 56 is an explanatory diagram showing the scheduling result of the combination pattern (110) shown in FIG. 45. In FIG. 56, the reconfiguration circuit executes task T2, and the DSP executes task T5 (T1 + T3). Because different calculation elements are used, they can execute in parallel. The total execution cost of the reconfiguration circuit is that of task T2, which is 20. Task T5 is executed only on the main DSP, so no transfer cost arises. Therefore, the total execution cost of task T5 is 63 (= 50 + 13). Because the reconfiguration circuit and the DSP execute in parallel, the larger total execution cost is selected. Therefore, the total execution cost of the combination pattern (110) is 63.

FIG. 57 is an explanatory diagram showing the scheduling result of the combination pattern (111) shown in FIG. 45. In FIG. 57, the reconfiguration circuit executes task T3, and the DSP executes task T6 (T1 + T2). Because different calculation elements are used, they can execute in parallel. The total execution cost of the reconfiguration circuit is that of task T3, which is 20. Task T6 is executed only on the main DSP, so no transfer cost arises. Therefore, the total execution cost of task T6 is 63 (= 50 + 13). Because the reconfiguration circuit and the DSP execute in parallel, the larger total execution cost is selected. Therefore, the total execution cost of the combination pattern (111) is 63.

FIG. 58 is an explanatory diagram showing the scheduling result of the combination pattern (112) shown in FIG. 45. In FIG. 58, the DSP executes task T7 (T1 + T2 + T3). Task T7 is executed only on the main DSP, so no transfer cost arises. Therefore, the total execution cost of task T7 is 76 (= 50 + 13 + 13), and the total execution cost of the combination pattern (112) is 76.
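
The arithmetic for the merged tasks in patterns (109) to (112) can be reproduced with the sketch below. It is a worked check rather than the method of the embodiment; the table names `DSP_EXEC` and `RECONF_TOTAL` and the helper functions are hypothetical, and the values are the ones quoted in the descriptions of FIGS. 55 to 58.

```python
# Illustrative sketch; names are assumptions, numbers are taken from the text.

# Execution cost of each task on the main DSP (no transfer cost arises there).
DSP_EXEC = {"T1": 50, "T2": 13, "T3": 13}
# Total execution cost of each task on the reconfiguration circuit.
RECONF_TOTAL = {"T1": 70, "T2": 20, "T3": 20}


def merged_cost(tasks):
    """Cost of a merged task that runs entirely on the main DSP."""
    return sum(DSP_EXEC[t] for t in tasks)


def pattern_cost(reconf_task, dsp_tasks):
    """Larger of the reconfiguration-circuit cost and the merged DSP cost."""
    reconf = RECONF_TOTAL[reconf_task] if reconf_task is not None else 0
    return max(reconf, merged_cost(dsp_tasks))


results = {
    109: pattern_cost("T1", ["T2", "T3"]),        # T4 = T2 + T3
    110: pattern_cost("T2", ["T1", "T3"]),        # T5 = T1 + T3
    111: pattern_cost("T3", ["T1", "T2"]),        # T6 = T1 + T2
    112: pattern_cost(None, ["T1", "T2", "T3"]),  # T7 = T1 + T2 + T3
}
print(results)  # {109: 70, 110: 63, 111: 63, 112: 76}
```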

The determination unit 207 determines, from among the total execution costs of the combination patterns (101) to (112), the combination pattern with the smallest total execution cost as the optimal combination pattern. Because the combination patterns (105) to (108) shown in FIGS. 51 to 54 have the smallest total execution cost, they are determined as the optimal combination patterns. Since more than one combination pattern has been determined here, the combination patterns (105) and (108) of FIG. 51 and FIG. 54, which use the fewest calculation elements among them, are determined as the optimal combination patterns.

That is, as in the combination pattern (105), assigning task T1 to the DSP and tasks T2 and T3 to the reconfiguration circuit gives an optimal combination. Similarly, as in the combination pattern (108), assigning task T1 to one DSP, which serves as the main DSP, and tasks T2 and T3 to the other DSP gives an optimal combination.
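
The two-stage choice made by the determination unit 207 (smallest total execution cost first, then fewest calculation elements) can be sketched as follows. The sketch covers only patterns (105) to (112), whose costs are quoted in the descriptions of FIGS. 51 to 58; patterns (101) to (104) are omitted. The element counts are a reading of those descriptions (for example, pattern (106) uses the reconfiguration circuit and two DSPs), and the name `decide` is hypothetical.

```python
# Illustrative sketch; `decide` and the data layout are assumptions.

# pattern number -> (total execution cost, number of calculation elements used)
PATTERNS = {
    105: (50, 2), 106: (50, 3), 107: (50, 3), 108: (50, 2),
    109: (70, 2), 110: (63, 2), 111: (63, 2), 112: (76, 1),
}


def decide(patterns):
    """Pick the patterns with minimal cost, then those using the fewest elements."""
    min_cost = min(cost for cost, _ in patterns.values())
    cheapest = {p: n for p, (cost, n) in patterns.items() if cost == min_cost}
    min_elements = min(cheapest.values())
    return sorted(p for p, n in cheapest.items() if n == min_elements)


print(decide(PATTERNS))  # [105, 108]
```

Note that pattern (112) uses the fewest calculation elements overall but is excluded because its total execution cost is not minimal; the element count acts only as a tie-breaker.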

As described above, in the present embodiment, for optimization on an AMP, where conventional automation techniques have difficulty drawing out performance effectively, the target program code 300 is divided into a plurality of functions, and each process (task) is assigned to the calculation element (core) suited to it. Sufficient performance can therefore be obtained with the minimum necessary configuration of calculation elements. In addition, process assignment has conventionally been decided manually from years of experience and intuition and was therefore not necessarily optimal, whereas the present embodiment decides the assignment of processes (tasks) quantitatively. Furthermore, dynamically reconfiguring tasks makes optimal assignment possible even for portions that cannot be analyzed statically. If no optimal combination pattern is obtained, the input screen 500 can be called again and the setting information set again, so that tuning and rescheduling can be performed.

Because the embodiment thus supports software optimization and the determination of the calculation element configuration in system design, it helps reduce resources and development time and can greatly improve development efficiency. Accordingly, the present embodiment has the effect of reducing the designer's design burden and shortening the design period while optimizing the embedded system according to the program.

The design support method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be distributed via a network such as the Internet.

The following appendices are further disclosed with respect to the embodiment described above.

(Appendix 1) A design support program that causes a computer to function as:
dividing means for dividing a target program code into a plurality of tasks;
task execution cost calculation means for calculating, based on the type of calculation element that executes each task obtained by the dividing means, an execution cost according to the execution time of each task;
generating means for generating combination patterns representing the execution order of the tasks;
total execution cost calculation means for calculating, based on the execution costs calculated by the task execution cost calculation means, a total execution cost according to the execution time of each combination pattern generated by the generating means;
determination means for determining a specific combination pattern from among the combination patterns based on the total execution costs calculated by the total execution cost calculation means; and
output means for outputting the determination result of the determination means.

(Appendix 2) The design support program according to appendix 1, further causing the computer to function as:
acquisition means for acquiring setting information regarding the type of the calculation element and the optimization process to be executed by the calculation element; and
extraction means for extracting, from among the plurality of tasks, a task on which the optimization process can be executed according to the type of calculation element included in the setting information acquired by the acquisition means,
wherein the task execution cost calculation means calculates, based on the setting information, an execution cost according to the execution time of the task extracted by the extraction means.

(Appendix 3) The design support program according to appendix 1 or 2, wherein
the task execution cost calculation means further calculates a transfer cost according to the data transfer time between the calculation elements, and
the total execution cost calculation means calculates the total execution cost according to the execution time of each combination pattern based on the execution cost and the transfer cost calculated by the task execution cost calculation means.

(Appendix 4) The design support program according to any one of appendices 1 to 3, wherein
the determination means determines the combination pattern with the smallest total execution cost as the specific combination pattern.

(Appendix 5) The design support program according to any one of appendices 1 to 3, wherein
the determination means determines, as the specific combination pattern, the combination pattern with the smallest total execution cost and the smallest number of calculation elements used.

(Appendix 6) The design support program according to any one of appendices 1 to 3, wherein
the determination means determines, as the specific combination pattern, a combination pattern whose total execution cost is equal to or less than a predetermined threshold.

(Appendix 7) The design support program according to any one of appendices 1 to 3, wherein
the determination means determines, as the specific combination pattern, the combination pattern whose total execution cost is equal to or less than a predetermined threshold and whose number of calculation elements used is the smallest.
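
The threshold-based variants of appendices 6 and 7 can be illustrated with the sketch below. It is a hedged example, not the disclosed implementation: the function names, the data layout, and the threshold value of 65 are all hypothetical, and the costs are those quoted for combination patterns (105) to (112) of Example 2, with element counts read from the descriptions of FIGS. 51 to 58.

```python
# Illustrative sketch; function names and THRESHOLD are assumptions.

# pattern number -> (total execution cost, number of calculation elements used)
PATTERNS = {
    105: (50, 2), 106: (50, 3), 107: (50, 3), 108: (50, 2),
    109: (70, 2), 110: (63, 2), 111: (63, 2), 112: (76, 1),
}

THRESHOLD = 65  # hypothetical "predetermined threshold"


def within_threshold(patterns, threshold):
    """Appendix 6: patterns whose total execution cost does not exceed the threshold."""
    return sorted(p for p, (cost, _) in patterns.items() if cost <= threshold)


def within_threshold_fewest_elements(patterns, threshold):
    """Appendix 7: among the patterns within the threshold, those using the fewest elements."""
    candidates = within_threshold(patterns, threshold)
    fewest = min(patterns[p][1] for p in candidates)
    return [p for p in candidates if patterns[p][1] == fewest]


print(within_threshold(PATTERNS, THRESHOLD))
# [105, 106, 107, 108, 110, 111]
print(within_threshold_fewest_elements(PATTERNS, THRESHOLD))
# [105, 108, 110, 111]
```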

(Appendix 8) A design support apparatus comprising:
dividing means for dividing a target program code into a plurality of tasks;
task execution cost calculation means for calculating, based on the type of calculation element that executes each task obtained by the dividing means, an execution cost according to the execution time of each task;
generating means for generating combination patterns representing the execution order of the tasks;
total execution cost calculation means for calculating, based on the execution costs calculated by the task execution cost calculation means, a total execution cost according to the execution time of each combination pattern generated by the generating means;
determination means for determining a specific combination pattern from among the combination patterns based on the total execution costs calculated by the total execution cost calculation means; and
output means for outputting the determination result of the determination means.

(Appendix 9) A design support method in which a computer including a central processing unit and a storage device executes:
a dividing step of dividing, by the central processing unit, a target program code into a plurality of tasks and storing the plurality of tasks in the storage device;
a task execution cost calculation step of accessing the storage device by the central processing unit, calculating, based on the type of calculation element that executes each task obtained in the dividing step, an execution cost according to the execution time of each task, and storing the execution cost in the storage device;
a generating step of generating, by the central processing unit, combination patterns representing the execution order of the tasks and storing the combination patterns in the storage device;
a total execution cost calculation step of accessing the storage device by the central processing unit, calculating, based on the execution costs calculated in the task execution cost calculation step, a total execution cost according to the execution time of each combination pattern generated in the generating step, and storing the total execution cost in the storage device;
a determination step of accessing the storage device by the central processing unit, determining a specific combination pattern from among the combination patterns based on the total execution costs calculated in the total execution cost calculation step, and storing the determination result in the storage device; and
an output step of accessing the storage device by the central processing unit and outputting the determination result of the determination step.

FIG. 1 is a block diagram showing the hardware configuration of the design support apparatus according to the present embodiment.
FIG. 2 is a block diagram showing the functional configuration of the design support apparatus according to the present embodiment.
FIG. 3 is an explanatory diagram showing a description example of the target program code.
FIG. 4 is an explanatory diagram showing an example of dividing the target program code shown in FIG. 3.
FIG. 5 is an explanatory diagram (part 1) showing an example of the input screen.
FIG. 6 is an explanatory diagram (part 2) showing an example of the input screen.
FIG. 7 is an explanatory diagram (part 3) showing an example of the input screen.
FIG. 8 is an explanatory diagram showing the data structure of the setting information.
FIG. 9 is an explanatory diagram showing the data structure of the allocation information.
FIG. 10 is a flowchart showing the design support processing procedure according to the present embodiment.
FIG. 11 is a flowchart showing the detailed processing procedure of the program pattern allocation process (step S1004).
FIG. 12 is a flowchart showing the detailed processing procedure of the analysis execution process (step S1105).
FIG. 13 is a flowchart showing the program pattern extraction procedure for loop parallel processing of a for block.
FIG. 14 is a flowchart showing the program pattern extraction procedure for pipeline processing of a for block.
FIG. 15 is a flowchart showing the program pattern extraction procedure for simultaneous processing of a switch-case block.
FIG. 16 is a flowchart showing the program pattern extraction procedure for simultaneous processing of a switch-case block.
FIG. 17 is an explanatory diagram showing a switch-case block taken as the target block and its circuit configuration information.
FIG. 18 is a flowchart showing the program pattern extraction procedure for simultaneous processing of an if-else block.
FIG. 19 is a flowchart showing the program pattern extraction procedure for simultaneous processing of an if block and an else block.
FIG. 20 is a flowchart showing the program pattern extraction procedure for SIMD arithmetic processing.
FIG. 21 is a flowchart showing the program pattern extraction procedure for VLIW instruction execution processing.
FIG. 22 is a flowchart showing the detailed processing procedure of the scheduling process shown in FIG. 10.
FIG. 23 is an explanatory diagram showing the setting information in Example 1.
FIG. 24 is an explanatory diagram showing the allocation table that holds the program pattern extraction result in Example 1.
FIG. 25 is an explanatory diagram showing the combination patterns for optimization in Example 1.
FIG. 26 is an explanatory diagram (part 1) showing an example in which tasks are merged.
FIG. 27 is an explanatory diagram (part 2) showing an example in which tasks are merged.
FIG. 28 is an explanatory diagram (part 3) showing an example in which tasks are merged.
FIG. 29 is an explanatory diagram (part 4) showing an example in which tasks are merged.
FIG. 30 is an explanatory diagram showing the calculation result table of execution costs and transfer costs in Example 1.
FIG. 31 is an explanatory diagram showing the scheduling result of the combination pattern (1) shown in FIG. 25.
FIG. 32 is an explanatory diagram showing the scheduling result of the combination pattern (2) shown in FIG. 25.
FIG. 33 is an explanatory diagram showing the scheduling result of the combination pattern (3) shown in FIG. 25.
FIG. 34 is an explanatory diagram showing the scheduling result of the combination pattern (4) shown in FIG. 25.
FIG. 35 is an explanatory diagram showing the scheduling result of the combination pattern (5) shown in FIG. 25.
FIG. 36 is an explanatory diagram showing the scheduling result of the combination pattern (6) shown in FIG. 25.
FIG. 37 is an explanatory diagram showing the scheduling result of the combination pattern (7) shown in FIG. 25.
FIG. 38 is an explanatory diagram showing the scheduling result of the combination pattern (8) shown in FIG. 25.
FIG. 39 is an explanatory diagram showing the scheduling result of the combination pattern (9) shown in FIG. 25.
FIG. 40 is an explanatory diagram showing the scheduling result of the combination pattern (10) shown in FIG. 25.
FIG. 41 is an explanatory diagram showing the scheduling result of the combination pattern (11) shown in FIG. 25.
FIG. 42 is an explanatory diagram showing the scheduling result of the combination pattern (12) shown in FIG. 25.
FIG. 43 is an explanatory diagram showing the setting information in Example 2.
FIG. 44 is an explanatory diagram showing the allocation table that holds the program pattern extraction result in Example 2.
FIG. 45 is an explanatory diagram showing the combination patterns for optimization in Example 2.
FIG. 46 is an explanatory diagram showing the calculation result table of execution costs and transfer costs in Example 2.
FIG. 47 is an explanatory diagram showing the scheduling result of the combination pattern (101) shown in FIG. 45.
FIG. 48 is an explanatory diagram showing the scheduling result of the combination pattern (102) shown in FIG. 45.
FIG. 49 is an explanatory diagram showing the scheduling result of the combination pattern (103) shown in FIG. 45.
FIG. 50 is an explanatory diagram showing the scheduling result of the combination pattern (104) shown in FIG. 45.
FIG. 51 is an explanatory diagram showing the scheduling result of the combination pattern (105) shown in FIG. 45.
FIG. 52 is an explanatory diagram showing the scheduling result of the combination pattern (106) shown in FIG. 45.
FIG. 53 is an explanatory diagram showing the scheduling result of the combination pattern (107) shown in FIG. 45.
FIG. 54 is an explanatory diagram showing the scheduling result of the combination pattern (108) shown in FIG. 45.
FIG. 55 is an explanatory diagram showing the scheduling result of the combination pattern (109) shown in FIG. 45.
FIG. 56 is an explanatory diagram showing the scheduling result of the combination pattern (110) shown in FIG. 45.
FIG. 57 is an explanatory diagram showing the scheduling result of the combination pattern (111) shown in FIG. 45.
FIG. 58 is an explanatory diagram showing the scheduling result of the combination pattern (112) shown in FIG. 45.

Explanation of symbols

200 Design support apparatus
201 Division unit
202 Acquisition unit
203 Extraction unit
204 Task execution cost calculation unit
205 Generation unit
206 Total execution cost calculation unit
207 Determination unit
208 Output unit
300 Target program code

Claims

A design support program that causes a computer to function as:
dividing means for dividing a target program code into a plurality of tasks;
acquisition means for acquiring setting information regarding the type of calculation element that executes each task obtained by the dividing means and the optimization process to be executed by the calculation element;
extraction means for extracting, from among the plurality of tasks, a task on which the optimization process can be executed according to the type of calculation element included in the setting information acquired by the acquisition means;
task execution cost calculation means for calculating, based on the setting information, an execution cost according to the execution time of the task extracted by the extraction means;
generating means for generating combination patterns of the types of calculation elements that execute the tasks;
total execution cost calculation means for calculating, based on the execution costs calculated by the task execution cost calculation means, a total execution cost according to the execution time of each combination pattern generated by the generating means;
determination means for determining a specific combination pattern from among the combination patterns based on the total execution costs calculated by the total execution cost calculation means; and
output means for outputting the determination result of the determination means.

The design support program according to claim 1, further causing the computer to function as
judgment means for judging, for each combination pattern generated by the generating means, whether the tasks can be executed in parallel based on the type of calculation element that executes each task,
wherein the total execution cost calculation means
calculates the total execution cost according to the execution time of each combination pattern based on the execution costs calculated by the task execution cost calculation means and on the result, judged by the judgment means, of whether the tasks can be executed in parallel.

The design support program according to claim 1 or 2, wherein
the task execution cost calculation means further calculates a transfer cost according to the data transfer time between the calculation elements, and
the total execution cost calculation means
calculates the total execution cost according to the execution time of each combination pattern based on the execution cost and the transfer cost calculated by the task execution cost calculation means.

A design support apparatus comprising:
dividing means for dividing a target program code into a plurality of tasks;
acquisition means for acquiring setting information regarding the type of calculation element that executes each task obtained by the dividing means and the optimization process to be executed by the calculation element;
extraction means for extracting, from among the plurality of tasks, a task on which the optimization process can be executed according to the type of calculation element included in the setting information acquired by the acquisition means;
task execution cost calculation means for calculating, based on the setting information, an execution cost according to the execution time of the task extracted by the extraction means;
generating means for generating combination patterns of the types of calculation elements that execute the tasks;
total execution cost calculation means for calculating, based on the execution costs calculated by the task execution cost calculation means, a total execution cost according to the execution time of each combination pattern generated by the generating means;
determination means for determining a specific combination pattern from among the combination patterns based on the total execution costs calculated by the total execution cost calculation means; and
output means for outputting the determination result of the determination means.

A design support method in which a computer comprising a central processing unit and a storage device executes:
a dividing step of dividing, by the central processing unit, a target program code into a plurality of tasks and storing the plurality of tasks in the storage device;
an acquisition step of accessing the storage device by the central processing unit and acquiring setting information regarding the type of calculation element that executes each task obtained in the dividing step and the optimization process to be executed by the calculation element;
an extraction step of accessing the storage device by the central processing unit and extracting, from among the plurality of tasks, a task on which the optimization process can be executed according to the type of calculation element included in the setting information acquired in the acquisition step;
a task execution cost calculation step of accessing the storage device by the central processing unit, calculating, based on the setting information, an execution cost according to the execution time of the task extracted in the extraction step, and storing the execution cost in the storage device;
a generating step of generating, by the central processing unit, combination patterns of the types of calculation elements that execute the tasks and storing the combination patterns in the storage device;
a total execution cost calculation step of accessing the storage device by the central processing unit, calculating, based on the execution costs calculated in the task execution cost calculation step, a total execution cost according to the execution time of each combination pattern generated in the generating step, and storing the total execution cost in the storage device;
a determination step of accessing the storage device by the central processing unit, determining a specific combination pattern from among the combination patterns based on the total execution costs calculated in the total execution cost calculation step, and storing the determination result in the storage device; and
an output step of accessing the storage device by the central processing unit and outputting the determination result of the determination step.