JP2013020580A

JP2013020580A - Parallelization method, system and program

Info

Publication number: JP2013020580A
Application number: JP2011155616A
Authority: JP
Inventors: Takero Yoshizawa; 武朗吉澤; Shuichi Shimizu; 周一清水
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2011-07-14
Filing date: 2011-07-14
Publication date: 2013-01-31
Anticipated expiration: 2031-07-14
Also published as: JP5775386B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for dividing into regions a block diagram that contains non-coding-target blocks (blocks excluded from code conversion), placing those blocks so as to optimize execution speed after code conversion. SOLUTION: A block diagram is converted and abstracted into a task graph that is a DAG. The structure of the task graph is analyzed to obtain a series-parallel tree (SPT). The SPT includes S nodes, from which serially executed nodes branch, and P nodes, from which parallel executed nodes branch. The SPT is transformed into another SPT until no P node remains above any non-coding-target block. The block diagram is then divided accordingly, and code is generated and compiled for each region and assigned to different processors or cores so that the blocks under each P node of the resulting SPT execute in parallel. Preferably, the DAG task graph is first transformed so that scattered non-coding-target blocks are merged.

Description

The present invention relates to a technique for speeding up program execution in a simulation system through parallelization.

In recent years, so-called multiprocessor systems, which have a plurality of processors, have been used in fields such as scientific computing and simulation. One simulation field that has recently seen particularly active development is simulation software for mechatronic plants such as robots, automobiles, and airplanes. Thanks to advances in electronic components and software technology, most of the control in robots, automobiles, airplanes, and the like is performed electronically, using wiring and wireless LANs that extend throughout the machine like a nervous system.

Although these are essentially mechatronic devices, they also contain a large amount of control software. Developing such products therefore requires a long time, enormous cost, and many personnel for developing and testing the control programs.

A technique conventionally used for such testing is HILS (Hardware In the Loop Simulation). In particular, an environment for testing the electronic control units (ECUs) of an entire automobile is called full-vehicle HILS. In full-vehicle HILS, real ECUs are connected in a laboratory to dedicated hardware devices that emulate the engine, the transmission mechanism, and so on, and tests are performed according to predetermined scenarios. The outputs of the ECUs are fed to a monitoring computer and shown on a display, and a tester watches the display to check for abnormal behavior.

However, HILS is laborious to set up, because it requires dedicated hardware devices that must be physically wired to the real ECUs. Testing a different ECU is also time-consuming, because it must be physically reconnected. Furthermore, because the test uses real ECUs, it must run in real time, so testing many scenarios takes an enormous amount of time. In addition, hardware devices for HILS emulation are generally very expensive.

For this reason, methods have recently been proposed that build the entire simulation in software, without expensive emulation hardware. In these methods, continuous simulation is used for plant parts such as the engine and transmission, while the controller part is represented either by a state chart or by actual software code. Depending on how the controller is simulated, the former is called MIL (Model-in-the-Loop) simulation and the latter SIL (Software-in-the-Loop) simulation. With MIL and SIL, tests can be run without ECU hardware.

One system that supports building such MIL/SIL environments is MATLAB(R)/Simulink(R), a simulation modeling system available from MathWorks. With MATLAB(R)/Simulink(R), a simulation program can be created by placing functional blocks on the screen through a graphical interface and specifying the flow of processing between them. Functional blocks come in various types, including basic operations such as sums and products, integration, conditional branching, and blocks that call software code; each has inputs, outputs, or both. A simulation that includes software blocks as the representation of the controller is an SIL simulation.

Once functional blocks have been arranged on MATLAB(R)/Simulink(R) to create a block diagram, the Real-Time Workshop(R) function can be used to speed up the simulation further. Specifically, blocks other than software code are converted into source code with equivalent functionality in a known computer language such as C, so that sequential interpretive processing is replaced with processing optimized for the host. With Real-Time Workshop(R), the unit of code conversion is either the entire model or a sub-block, such as those indicated by reference numerals 102 and 104 in FIG. 1. A sub-block is a special boundary representing a hierarchical structure introduced for model readability; for example, sub-block 108 has the internal structure indicated by reference numeral 108a.

However, control systems for automobiles, airplanes, and the like are complex and commonly contain thousands to hundreds of thousands of functional blocks. Designing a simulation model and arriving at the desired model therefore requires many iterations of design, compilation, and execution.

In FIG. 1, for example, an operator edits sub-block 110, one of sub-blocks 102 to 110, and compiles it to obtain executable binary code. The operator then runs the executable binary code on the computer system to check its behavior; if the desired behavior is not obtained, the operator edits sub-block 110 again and repeats the compile-and-run cycle.

As the number of functional blocks grows, the generated source code grows with it, and so does the time needed to fully compile the entire source code.

In general, when a large model is edited, the sub-blocks being edited are usually limited to a small part of it, such as sub-block 110 in FIG. 1. In composite models in particular, sub-blocks outside the part the designer is responsible for are often left unchanged and treated as black boxes. Therefore, by leaving the sub-blocks under edit unconverted and converting everything else to code in advance, fast execution and convenient editing can be achieved at the same time. In addition, in a multiprocessor or multicore environment, execution time can be shortened by assigning separately compiled executable code to different processors or cores and running it in parallel. For the latter, the following patent publications describe related techniques.

Japanese Patent Application Laid-Open No. 7-114486 relates to a method of parallelizing a sequential loop that contains output statements for debugging. It discloses detecting the I/O statements in the sequential loop, detecting that an I/O statement is contained in a conditional clause governed by a loop-invariant expression, analyzing the dependencies involving the I/O statements, inserting synchronization statements to preserve the execution order, and converting the loop into a loop for parallel processing.

Japanese Patent Laid-Open No. 2011-96107 discloses, for a program represented by a block diagram or the like, determining for each functional block the number of use-block sets and definition-block sets on the basis of the connections between functional blocks that have internal state and those that do not, and assigning a plurality of strands on the basis of those numbers, thereby dividing the block diagram into strands so that the processing can be parallelized.

FIG. 2 shows an example of simple division aligned with sub-blocks; in FIG. 2, the diagram is divided into regions 202, 204, 206, and so on. However, in a time-step-based simulation such as MATLAB(R)/Simulink(R), code conversion cannot always be performed in whole sub-block units. For example, scheduling the execution order of blocks requires analysis down into the sub-blocks (the lower hierarchy levels), and if that analysis determines that computation must start partway through a sub-block, the sub-block must in effect be split into a front part and a back part. Also, blocks that display results, such as the Scope block of MATLAB(R)/Simulink(R), cannot be converted to code, so a sub-block containing such a functional block cannot be converted as a single unit. FIG. 3 shows an example of division attempted with these constraints in mind. As shown in FIG. 3, to speed up the simulation it is desirable to exclude the non-coding-target blocks 314 and 316 and to apply code conversion through division and reconfiguration, so that the execution start codes 310 and 312, which follow the schedule, come first.

However, such a division and reconfiguration method is not obvious. In particular, the division procedure for parallelization is a combinatorial problem, and computing the optimal combination in a short time is generally very difficult. There is therefore a strong need to find a near-optimal solution within a limited time.

Patent Literature 1: JP-A-7-114486
Patent Literature 2: JP 2011-96107 A

An object of the present invention is to divide a block diagram representing a flow of basic processing so as to optimize its execution speed.
Another object of the present invention is to provide a technique for placing non-coding-target blocks on division boundaries so as not to impair execution speed.

The present invention has been made to solve the above problems. A system according to the present invention first converts a block diagram into a task graph that is a DAG (directed acyclic graph).

Next, the system converts the task graph into a series-parallel graph (SPG) according to a known algorithm and then parses its structure to obtain a unique series-parallel parse tree (SPT). The SPT includes S nodes, from which serially executed nodes branch, and P nodes, from which parallel executed nodes branch. The leaf nodes represent the nodes of the task graph.

Next, the system transforms the SPT into another SPT, repeating until no P node exists above any non-coding-target block.

Once an optimal SPT has been obtained in this way, the system generates and compiles code so that the blocks under each P node of the resulting SPT run in parallel, assigns that code to different processors or cores, and executes it in parallel.
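To make the S/P semantics concrete, the following Python sketch walks an SPT, running the children of an S node one after another and the children of a P node concurrently. It is a hypothetical illustration, not the code generation or execution environment of the embodiment: the tuple encoding of the tree, the function `execute`, and the `tasks` table are assumptions, and one thread per branch stands in for assigning that branch to a separate processor or core.

```python
import threading
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding: a leaf is a task name; an internal node is ("S" | "P", [children]).
Tree = Union[str, Tuple[str, list]]


def execute(tree: Tree, tasks: Dict[str, Callable[[], None]]) -> None:
    """Run an SPT: S children strictly in order, P children concurrently."""
    if isinstance(tree, str):
        tasks[tree]()  # a leaf stands for one task-graph node
        return
    kind, children = tree
    if kind == "S":
        for child in children:
            execute(child, tasks)
    else:
        # One thread per branch stands in for a separate processor or core.
        # Exceptions raised inside a thread are not propagated in this sketch.
        threads = [threading.Thread(target=execute, args=(c, tasks)) for c in children]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # the end of a P node is a synchronization point


# Usage: each task appends its name to a shared log.
log: List[str] = []
lock = threading.Lock()


def make_task(name: str) -> Callable[[], None]:
    def run() -> None:
        with lock:
            log.append(name)
    return run


tasks = {n: make_task(n) for n in ["a", "b", "c", "d"]}
execute(("S", ["a", ("P", ["b", "c"]), "d"]), tasks)
```

Joining all threads before returning means that whatever follows a P node in the enclosing S node starts only after every parallel branch has finished; here `a` always runs first and `d` last, while `b` and `c` may run in either order.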

In general, a given DAG task graph admits multiple SPTs that are consistent with the precedence relations of the node execution order. According to a preferred aspect of the present invention, therefore, the DAG task graph is transformed before conversion into an SPT so that scattered non-coding-target blocks are merged as far as possible. This reduces the number of divisions of the resulting task graph. In general, when the divisions are of similar size, the overall critical path becomes shorter and parallel processing yields a greater speedup. Even with sequential processing, fewer divisions mean less call overhead, which contributes substantially to faster execution.
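The effect of merging can be illustrated with a simple count. Under a simplifying assumption made only for this sketch, namely that blocks are laid out in one serial order and each maximal run of consecutive coding-target blocks becomes one compiled region, the number of regions equals the number of such runs. The hypothetical functions below count the runs and check that a reordering still respects the dependencies; the embodiment's own transformation procedure (FIG. 7) is not reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_regions(order: List[str], non_coding: Set[str]) -> int:
    """Count maximal runs of coding-target blocks in a serial block order."""
    regions = 0
    in_run = False
    for block in order:
        if block in non_coding:
            in_run = False  # a non-coding-target block closes the current region
        elif not in_run:
            regions += 1  # first coding-target block after a gap opens a new region
            in_run = True
    return regions


def respects(order: List[str], deps: Iterable[Tuple[str, str]]) -> bool:
    """True if every dependency u -> v places u before v in the order."""
    pos: Dict[str, int] = {block: i for i, block in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in deps)


# Hypothetical example: a, b, c, d are coding-target blocks; x1 and x2 are not.
non_coding = {"x1", "x2"}
deps = [("a", "x1"), ("b", "x2"), ("x1", "c"), ("x2", "d")]
scattered = ["a", "x1", "b", "x2", "c", "d"]
merged = ["a", "b", "x1", "x2", "c", "d"]
```

Both orders satisfy the dependencies, but moving `b` ahead of `x1` brings the two non-coding-target blocks together and reduces the number of regions from three to two.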

According to the present invention, a technique is provided that, for program code represented by a block diagram, takes into account the presence of non-coding-target blocks, reduces the number of code divisions, and yields divisions whose parallel execution costs are balanced.

FIG. 1 is a diagram for explaining compilation and execution of a task graph.
FIG. 2 is a diagram showing simple division aligned with sub-blocks.
FIG. 3 is a diagram showing non-coding-target blocks and maximal division matched to the execution schedule.
FIG. 4 is a block diagram of hardware for implementing the present invention.
FIG. 5 is a functional block diagram for implementing the present invention.
FIG. 6 is a schematic flowchart of the overall processing of the present invention.
FIG. 7 is a flowchart of the task graph transformation process.
FIG. 8 is a diagram showing an example of list scheduling in the task graph transformation process.
FIG. 9 is a flowchart of the process of converting a task graph into an SPT.
FIG. 10 is a diagram showing an example of the process of converting a task graph into an SPT.
FIG. 11 is a diagram showing an example of the process of converting a task graph into an SPT.
FIG. 12 is a flowchart of the SPT transformation process.
FIG. 13 is a diagram showing an example of the SPT transformation process.
FIG. 14 is a diagram showing an example of the SPT transformation process.
FIG. 15 is a diagram for explaining the main processing in the SPT transformation process.
FIG. 16 is a diagram for explaining the main processing in the SPT transformation process.
FIG. 17 is a flowchart of the process for generating code to be assigned to individual processors or cores.

The configuration and processing of an embodiment of the present invention are described below with reference to the drawings. In the following description, the same elements are denoted by the same reference numerals throughout the drawings unless otherwise noted. It should be understood that the configuration and processing described here are presented as one embodiment, and that the technical scope of the present invention is not intended to be limited to this embodiment.

First, the computer hardware used to implement the present invention is described with reference to FIG. 4. In FIG. 4, a plurality of CPUs, CPU1 404a, CPU2 404b, CPU3 404c, ..., CPUn 404n, are connected to a host bus 402. A main memory 406, used for the arithmetic processing of CPU1 404a through CPUn 404n, is also connected to the host bus 402.

A keyboard 410, a mouse 412, a display 414, and a hard disk drive 416 are connected to an I/O bus 408. The I/O bus 408 is connected to the host bus 402 through an I/O bridge 418. The keyboard 410 and the mouse 412 are used by the operator to operate the system, for example by typing commands or clicking menus. The display 414 is used, as needed, to display menus for operating the program according to the present invention through a GUI.

A preferred example of computer system hardware for this purpose is IBM(R) System X. In that case, CPU1 404a, CPU2 404b, CPU3 404c, ..., CPUn 404n are, for example, Intel(R) Xeon(R) processors, and the operating system is Windows(TM) Server 2003. The operating system is stored on the hard disk drive 416 and is loaded from the hard disk drive 416 into the main memory 406 when the computer system starts.

A multiprocessor system is required to implement the present invention. Here, a multiprocessor system generally means a system that uses processors having a plurality of cores, each capable of performing arithmetic processing independently; it should therefore be understood that a multicore single-processor system, a single-core multiprocessor system, or a multicore multiprocessor system may be used.

The computer system hardware usable for implementing the present invention is not limited to IBM(R) System X; any computer system capable of running the simulation program of the present invention can be used. Likewise, any operating system can be used, such as Windows(R) XP, Windows(R) 7, Linux(R), or Mac OS(R). Furthermore, to run the simulation program at high speed, a computer system such as IBM(R) System P, based on POWER(TM) 6 and running the AIX(TM) operating system, may be used.

The hard disk drive 416 further stores MATLAB(R)/Simulink(R), a C or C++ compiler, modules (described later) for task graph generation, task graph transformation, series-parallel tree (SPT) conversion, SPT transformation, and the like, and a code generation module for CPU assignment. These are loaded into the main memory 406 and executed in response to the operator's keyboard or mouse operations. The modules for task graph generation, task graph transformation, SPT conversion, SPT transformation, and the like can be written in any existing programming language, such as Java(R), C, C++, or C#.

Note that the usable simulation modeling tool is not limited to MATLAB(R)/Simulink(R); any simulation modeling tool whose models are expressed as time-step-based task graphs, such as the open-source Scilab/Scicos, can be used.

Alternatively, in some cases, the source code of a simulation system can be written directly in C, C++, or the like without a simulation modeling tool. The present invention is applicable in that case as well, provided that the individual functions can be described as separate functional blocks that depend on one another.

FIG. 5 is a functional block diagram according to the embodiment of the present invention. Each block basically corresponds to a module stored on the hard disk drive 416.

In FIG. 5, the simulation modeling tool 502 is preferably MATLAB(R)/Simulink(R), but any other simulation modeling tool whose models may include non-coding-target blocks, such as Scilab/Scicos, can be used.

The model data 504 is created with the simulation modeling tool 502, is preferably described as a block diagram, and is stored on the hard disk drive 416. Among the blocks in the block diagram, the non-coding-target blocks are marked in advance.

The main routine 506 is a program that calls and executes the various modules of the processing according to the present invention. It preferably provides an interface, such as a GUI, through which the operator carries out the processing with the keyboard 410 or the mouse 412 while viewing the display 414.

The DAG conversion module 508 reads the model data 504 and converts it into a DAG task graph. Preferably, the data of the converted task graph is placed in an area allocated as needed in the main memory 406. Matrix and list representations are known data structures for representing a graph in computer memory; any known data structure can be used in this embodiment.
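As an example of the list representation mentioned above, the sketch below stores the task graph as an adjacency list (block name to successor names) and uses Kahn's algorithm to confirm that the graph is acyclic while producing one valid execution order. The names `build_task_graph` and `topological_order` are hypothetical, and the sketch assumes that any feedback loops in the block diagram have already been cut before this step (for example at blocks with internal state); how the embodiment handles loops is not described in this section.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def build_task_graph(blocks: Iterable[str], wires: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Adjacency-list representation: block -> list of successor blocks."""
    graph: Dict[str, List[str]] = {b: [] for b in blocks}
    for src, dst in wires:
        graph[src].append(dst)
    return graph


def topological_order(graph: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm; raises ValueError if the graph is not a DAG."""
    indegree = {v: 0 for v in graph}
    for succs in graph.values():
        for v in succs:
            indegree[v] += 1
    ready = deque(v for v in graph if indegree[v] == 0)
    order: List[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v are now scheduled
                ready.append(v)
    if len(order) != len(graph):
        raise ValueError("cycle detected: not a DAG")
    return order


graph = build_task_graph(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
order = topological_order(graph)
```

An adjacency list uses memory proportional to the number of blocks plus wires, which suits block diagrams with many blocks but few connections per block; a matrix representation would need space quadratic in the number of blocks.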

The task graph transformation module 510 merges, as far as possible, the non-coding-target blocks contained in the task graph produced by the DAG conversion module 508. The resulting transformed task graph is preferably placed in an area allocated as needed in the main memory 406. This processing is described in more detail later with reference to the flowchart of FIG. 7.

The SPT conversion module 512 converts the task graph obtained by the task graph transformation module 510 into a series-parallel tree (SPT). The SPT includes S nodes, from which serially executed nodes branch, and P nodes, from which parallel executed nodes branch. The resulting SPT is preferably placed in an area allocated as needed in the main memory 406. This processing is described in more detail later with reference to the flowchart of FIG. 9.
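The description refers only to a "known algorithm" for this conversion. One well-known approach, shown here as an assumption rather than as the algorithm of the embodiment, splits every task node v into an edge v.in -> v.out, adds a super source and sink, and repeatedly applies series reductions (an inner vertex with one incoming and one outgoing edge) and parallel reductions (two edges with the same endpoints) until a single edge remains; the label built up on that edge is the SPT. This simple version runs in quadratic rather than linear time, and it rejects a task graph that is not already series-parallel instead of first transforming it into an SPG as the embodiment does. The names `to_spt` and `combine` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is a task name; an internal node is ("S" | "P", [children]).
Tree = Union[str, Tuple[str, list]]


def combine(kind: str, left: Optional[Tree], right: Optional[Tree]) -> Optional[Tree]:
    """Join two subtrees under an S or P node, dropping empty (None) parts and
    splicing in the children of a same-kind node so that S and P nodes alternate."""
    parts: list = []
    for t in (left, right):
        if t is None:
            continue
        if not isinstance(t, str) and t[0] == kind:
            parts.extend(t[1])
        else:
            parts.append(t)
    if not parts:
        return None
    return parts[0] if len(parts) == 1 else (kind, parts)


def to_spt(nodes: List[str], deps: Iterable[Tuple[str, str]]) -> Optional[Tree]:
    deps = list(deps)
    src, snk = ("<src>", ""), ("<snk>", "")
    # Each task node v becomes an edge labelled v; each dependency u -> v
    # becomes an unlabelled edge u.out -> v.in.
    edges = [((v, "in"), (v, "out"), v) for v in nodes]
    edges += [((u, "out"), (v, "in"), None) for u, v in deps]
    has_pred = {v for _, v in deps}
    has_succ = {u for u, _ in deps}
    # A super source and sink make the graph two-terminal.
    edges += [(src, (v, "in"), None) for v in nodes if v not in has_pred]
    edges += [((v, "out"), snk, None) for v in nodes if v not in has_succ]

    changed = True
    while changed and len(edges) > 1:
        changed = False
        # Parallel reduction: two edges with the same endpoints become one P edge.
        seen = {}
        for i, (u, v, t) in enumerate(edges):
            if (u, v) in seen:
                j = seen[(u, v)]
                edges[j] = (u, v, combine("P", edges[j][2], t))
                del edges[i]
                changed = True
                break
            seen[(u, v)] = i
        if changed:
            continue
        # Series reduction: an inner vertex with exactly one in-edge and one out-edge.
        ins, outs = defaultdict(list), defaultdict(list)
        for i, (u, v, _) in enumerate(edges):
            outs[u].append(i)
            ins[v].append(i)
        for _, w, _ in edges:
            if w != snk and len(ins[w]) == 1 and len(outs[w]) == 1:
                i, j = ins[w][0], outs[w][0]
                u, _, t1 = edges[i]
                _, v, t2 = edges[j]
                edges = [e for k, e in enumerate(edges) if k not in (i, j)]
                edges.append((u, v, combine("S", t1, t2)))
                changed = True
                break

    if len(edges) != 1 or edges[0][:2] != (src, snk):
        raise ValueError("task graph is not series-parallel")
    return edges[0][2]
```

For the diamond-shaped graph a -> {b, c} -> d the result is an S node over a, a P node over b and c, and d. An unlabelled edge that ends up parallel to a labelled one corresponds to a redundant (transitive) dependency and is simply dropped.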

The SPT transformation module 514 transforms the SPT obtained by the SPT conversion module 512 into another series-parallel tree (SPT). The transformation is repeated until no P node exists above any non-coding-target block. The resulting SPT is preferably placed in an area allocated as needed in the main memory 406. This processing is described in more detail later with reference to the flowchart of FIG. 12.
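The stopping condition of this transformation can be checked with a single traversal. The Python sketch below, in which the tuple encoding and both function names are hypothetical, lists every non-coding-target leaf that has a P node among its ancestors; the transformation is finished when that list is empty. The transformation steps themselves (FIG. 12) are not reproduced.

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a task name; an internal node is ("S" | "P", [children]).
Tree = Union[str, Tuple[str, list]]


def x_leaves_under_p(tree: Tree, x_nodes: Set[str], under_p: bool = False) -> List[str]:
    """List the non-coding-target (x) leaves that have a P node among their ancestors."""
    if isinstance(tree, str):
        return [tree] if under_p and tree in x_nodes else []
    kind, children = tree
    found: List[str] = []
    for child in children:
        found += x_leaves_under_p(child, x_nodes, under_p or kind == "P")
    return found


def transformation_done(tree: Tree, x_nodes: Set[str]) -> bool:
    """Termination test for the SPT transformation: no P node above any x leaf."""
    return not x_leaves_under_p(tree, x_nodes)


# x sits inside a parallel branch, so the transformation must continue.
pending = ("S", ["a", ("P", ["b", ("S", ["c", "x"])]), "d"])
# x is a direct child of the root S node, so no P node lies above it.
finished = ("S", ["a", ("P", ["b", "c"]), "x", "d"])
```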

Based on the SPT obtained by the SPT transformation module 514, the CPU allocation module 516 extracts the code to be assigned to individual processors or cores. The extracted code is preferably written to the hard disk drive 416. This code is source code corresponding to the functional blocks, preferably in C or C++. The processing of the CPU allocation module 516 is described in more detail later with reference to the flowchart of FIG. 17.

The compiler 518 compiles the code written out by the CPU allocation module 516 to generate executable code. The execution environment 520 assigns the executable code to individual processors or cores and executes it in parallel, preferably referring to auxiliary information generated by the CPU allocation module 516 when making the assignment.

When MATLAB(R)/Simulink(R) is used as part of the execution environment 520, it distinguishes uncompiled (sub)blocks from compiled sub-blocks, executing the former with its interpreter and running the latter on the host as binary code.

FIG. 6 is a schematic flowchart of the overall flow of processing of the present invention. The detailed description that follows proceeds along this flow.

In step 602, the DAG conversion module 508 converts the model data into a DAG task graph.

In step 604, the task graph transformation module 510 transforms the task graph so as to merge the non-coding-target blocks as far as possible.

In step 606, the SPT conversion module 512 converts the task graph into an SPT.

In step 608, the SPT transformation module 514 converts the SPT into another SPT such that no P node exists above any non-coding-target block.

In step 610, the CPU allocation module 516 generates the code to be assigned to each CPU, based on the SPT produced by the SPT transformation module 514.

In step 612, the compiler 518 compiles the generated code, and in step 614, the compiled code is executed.

Each step is described in detail below. First, terms relating to nodes on the DAG and the SPT are defined.
x node: a non-coding-target node
non-x node: a coding-target node
Terms relating to nodes on the SPT
Node order: determined by the order in which nodes are visited in a left-first depth-first search of the tree. Of two nodes in the SPT, the one visited earlier is said to be before the other, and the one visited later is said to be after it.
Root node: as generally defined.
Leaf node: a node that has no child nodes; it corresponds to a node on the DAG (same as the general definition).
Internal node: a node that is not a leaf node (same as the general definition).
Parent node: as generally defined.
Child node: as generally defined.
Elder brother node: among the nodes that share the target node's parent, a node that comes before the target node.
Younger brother node: among the nodes that share the target node's parent, a node that comes after the target node.
Brother nodes: the elder brother nodes and younger brother nodes.
Grandfather node: the parent node of the parent node.
Great-grandfather node: the parent node of the grandfather node.
S node: an internal node representing serial execution of its child nodes.
P node: an internal node representing parallel execution of its child nodes.

As a supplement on the structure of the SPT: on the SPT, an S node is never the parent of another S node, and a P node is never the parent of another P node. If two internal nodes of the same kind do end up in a parent-child relationship, the following processing is applied automatically: the child node is detached, and its own child nodes are moved to the parent node in its place with their order preserved (each moved node takes all nodes below it along with it). This processing does not change the serial/parallel execution semantics expressed by the tree.
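The normalization rule above, together with the left-first depth-first node order, can be sketched in Python as follows. This is an illustrative sketch only, assuming a hypothetical tuple representation in which a leaf is a string (a DAG node name) and an internal node is a pair ("S" or "P", list of children); the names `normalize` and `dfs_order` are not from the source.

```python
def normalize(tree):
    """Splice same-kind children into their parent, keeping their order.

    Assumed representation: a leaf is a str; an internal node is
    (kind, children) with kind "S" (serial) or "P" (parallel).
    """
    if isinstance(tree, str):
        return tree
    kind, children = tree
    flat = []
    for child in map(normalize, children):   # children are normalized first
        if not isinstance(child, str) and child[0] == kind:
            flat.extend(child[1])             # move grandchildren up, in order
        else:
            flat.append(child)
    return (kind, flat)


def dfs_order(tree):
    """Leaves in left-first depth-first order ("before" = visited earlier)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in dfs_order(child)]
```

For example, `normalize(("S", ["a", ("S", ["b", "c"]), "d"]))` yields `("S", ["a", "b", "c", "d"])`: the nested S node disappears, but the serial order a, b, c, d is unchanged.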

Based on the above definitions and supplement, the steps of FIG. 6 are described in order. First, in step 602 of converting the model data into a DAG task graph, the DAG conversion module 508 reads the model data 504 and converts it into a DAG task graph. In doing so, it interprets the blocks that come at the head of the schedule, as in FIG. 3, and produces a DAG structure without self-loops. Preferably, the converted task graph data is expanded into an area allocated in the main memory 406. From the model data 504 described as a block diagram, the DAG conversion module 508 constructs a DAG (directed acyclic graph) whose overall sources are the function blocks that have no input or that come first in the execution schedule, and whose directed edges are the connections between nodes. The overall sinks are, for example, function blocks that have no output or that are scheduled last for execution.
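A minimal sketch of the DAG construction in step 602 follows, assuming (hypothetically) that the block diagram is given as a list of block names plus (source, destination) connections in which feedback loops have already been cut at the blocks heading the schedule. Sources are taken to be blocks with no input and sinks blocks with no output; Kahn's topological sort is used here only to confirm that the result is acyclic. `build_task_graph` is an illustrative name, not part of any real tool's API.

```python
from collections import deque


def build_task_graph(blocks, connections):
    """Build a DAG from block names and (src, dst) connections.

    Assumes loops were already cut at the blocks that head the execution
    schedule. Returns (children, sources, sinks, topological_order).
    """
    children = {b: [] for b in blocks}
    indegree = {b: 0 for b in blocks}
    for src, dst in connections:
        children[src].append(dst)            # directed edge src -> dst
        indegree[dst] += 1
    sources = [b for b in blocks if indegree[b] == 0]
    sinks = [b for b in blocks if not children[b]]
    # Kahn's algorithm: every block gets ordered iff there is no cycle.
    remaining = dict(indegree)
    queue = deque(sources)
    order = []
    while queue:
        b = queue.popleft()
        order.append(b)
        for c in children[b]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    if len(order) != len(blocks):
        raise ValueError("connections contain a cycle; not a DAG")
    return children, sources, sinks, order
```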

In step 604, the task graph transformation module 510 transforms the task graph so as to merge the non-coding target blocks as far as possible. To do this, it performs list scheduling with the following property on the task graph, and merges non-coding target nodes (x) that are adjacent in the resulting schedule into one node.

Specifically, if the most recently scheduled node is an x node, an x node is scheduled whenever possible; if it is not an x node, a non-x node is scheduled whenever possible.

The aim is to gather the x nodes into as few clusters as possible without violating the precedence relations between task nodes. This avoids, as far as possible, two situations: x nodes being scattered over the SPT extracted in the next step, which increases the final number of code partitions; and parallelism that could otherwise be used going unextracted because parallelism involving x, which ultimately cannot be exploited, was extracted instead.

FIG. 7 is a flowchart of the processing of the task graph transformation module 510, corresponding to step 604 of FIG. 6. This processing takes as input a DAG containing x nodes and non-x nodes and outputs a serial schedule S, whose adjacent x nodes are then treated as single merged nodes in the task graph.

In step 702 of FIG. 7, the task graph transformation module 510 prepares S as an empty list and C as the set of all nodes of the DAG (the set of unscheduled nodes).

In step 704, the task graph transformation module 510 prepares Ro, the set of non-x nodes in C that have no parent in C, and Rx, the set of x nodes in C that have no parent in C.

In step 706, the task graph transformation module 510 determines whether Ro = φ ∧ Rx = φ, that is, whether Ro and Rx are both empty sets.

If not, in step 708 the task graph transformation module 510 determines whether the node most recently added to S is an x node, and if so, in step 710 determines whether Rx = φ.

If it determines in step 710 that Rx is not empty, then in step 712 the task graph transformation module 510 takes one node out of Rx, removes it from Rx, and appends it to the end of S. It also removes that node from C. The process then returns to step 704.

If instead it determines in step 710 that Rx = φ, the task graph transformation module 510 proceeds to step 714 and determines whether Ro = φ.

If it determines in step 714 that Ro is not empty, then in step 716 the task graph transformation module 510 takes one node out of Ro, removes it from Ro, and appends it to the end of S. It also removes that node from C. The process then returns to step 704.

Returning to step 708, if the node most recently added to S is not an x node, the process proceeds to step 714, where the task graph transformation module 510 determines whether Ro = φ; if so, it proceeds to step 710. The processing in step 710 is as described above, as is the processing when step 714 determines that Ro is not empty.

When the process returns to step 706 via step 704 and the task graph transformation module 510 determines that Ro and Rx are both empty, then in step 718 it groups each run of adjacent x nodes in S into one set, merges the x nodes in each set into a single node on the DAG, and ends the processing.
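The list scheduling of FIG. 7 and the merging of step 718 can be sketched in Python as below. Assumptions not stated in the source: ties among ready nodes are broken by input order, and while S is still empty the first pick is treated as following a non-x node. The merged-node ids ("merged", i) and the function name are hypothetical.

```python
def merge_x_nodes(nodes, edges, is_x):
    """Sketch of FIG. 7: x-affine list scheduling, then merging of adjacent x nodes.

    nodes: DAG node ids in a fixed order; edges: (parent, child) pairs;
    is_x: predicate marking non-coding target (x) nodes.
    Returns (schedule S, node -> merged-node id, edges of the merged DAG).
    """
    parents = {n: set() for n in nodes}
    for u, v in edges:
        parents[v].add(u)
    unscheduled = list(nodes)                       # C
    schedule = []                                   # S
    while unscheduled:       # in a DAG, Ro and Rx are both empty only once C is
        pending = set(unscheduled)
        ready = [n for n in unscheduled if not (parents[n] & pending)]
        rx = [n for n in ready if is_x(n)]          # Rx
        ro = [n for n in ready if not is_x(n)]      # Ro
        last_was_x = bool(schedule) and is_x(schedule[-1])
        # Steps 708-716: prefer the same kind as the last scheduled node.
        first, second = (rx, ro) if last_was_x else (ro, rx)
        pick = (first or second)[0]
        schedule.append(pick)
        unscheduled.remove(pick)
    # Step 718: each run of adjacent x nodes in S becomes one merged node.
    group, run = {}, None
    for i, n in enumerate(schedule):
        if is_x(n):
            run = run or ("merged", i)
            group[n] = run
        else:
            run = None
            group[n] = n
    merged_edges = {(group[u], group[v]) for u, v in edges if group[u] != group[v]}
    return schedule, group, merged_edges
```

For example, with x nodes x1, x2, non-x nodes o1, o2 and edges x1→o2, o1→x2, the schedule is o1, x1, x2, o2, so x1 and x2 collapse into one merged node. Merging a run of consecutive nodes of a topological order cannot create a cycle, since no other node lies between them.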

FIG. 8 schematically shows the processing of FIG. 7. When a task graph such as that in FIG. 8(a) is given, the list scheduling shown in FIG. 8(b) is performed by steps 702 to 716 of the flowchart of FIG. 7. Then, in step 718, adjacent x nodes are grouped into sets and the x nodes in each set are merged into a single node on the DAG, yielding the modified task graph shown in FIG. 8(c).

FIG. 9 is a flowchart of the processing of the SPT conversion module 512, corresponding to step 606 of FIG. 6. This processing takes a DAG as input and outputs an SPT.

In step 902, the SPT conversion module 512 converts the DAG into a series-parallel graph, denoted SPG. For this conversion, the technique described in, for example, A. Gonzalez Escribano, V. Cardenoso and A.J.C. van Gemund, "Conversion from NSP to SP graphs", Tech. Rep. TR-DINFO-01-97, Universidad de Valladolid, Valladolid (Spain), 1997 can be used. The structure of the SPG is then analyzed to obtain an SPT (series-parallel parsed tree).
Here, the leaf nodes on the SPT correspond to nodes of the SPG. The non-terminal (non-leaf) nodes do not exist on the SPG, but they indirectly represent the connection order between nodes.

Next, in step 904, the SPT conversion module 512 determines whether the SPG has at most one node; if not, it proceeds to step 906.

In step 906, the SPT conversion module 512 finds the paths of serially connected nodes on the SPG. A path of serially connected nodes is a path running from a node that has multiple parents or no parent to a node that has multiple children or no children, made up of nodes that each have exactly one parent and one child.

For each such path L, the SPT conversion module 512 then performs the following processing.
All nodes belonging to L on the SPG are replaced with a single node S (the connections with nodes outside L are preserved).
At the same time, S is made a node on the SPT, and the SPT nodes corresponding to the nodes in L are added as child nodes of S on the SPT, in their order along L. Because every node built on the SPG has a counterpart built on the SPT at the same time, the two always remain in correspondence.

Next, in step 908, the SPT conversion module 512 finds the sets of nodes on the SPG that are in a parallel relationship, that is, sets of nodes whose parent nodes and child nodes coincide (including the case where they have none).

For each such set C, the SPT conversion module 512 performs the following processing.
All nodes belonging to C on the SPG are replaced with a single node P (the connections with nodes outside C are preserved).
At the same time, P is made a node on the SPT, and the SPT nodes corresponding to the nodes in C are added as child nodes of P on the SPT. Again, because every node built on the SPG has a counterpart built on the SPT, the two always remain in correspondence.

After step 908, the SPT conversion module 512 returns to step 904 and again determines whether the SPG has at most one node; if so, the processing ends.
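The reduction loop of steps 904 to 908 can be sketched as follows, assuming the DAG is already series-parallel (for example after the NSP-to-SP conversion cited above) and that node ids are strings. As a simplification, this sketch contracts one serial path or one parallel set per iteration, preferring serial paths, rather than all of them at once; same-kind children are spliced in as the SPT supplement requires. The tuple representation and function name are illustrative assumptions.

```python
from itertools import count


def to_spt(nodes, edges):
    """Reduce a series-parallel DAG (string node ids) to an SPT.

    Leaves are node ids; internal nodes are ("S" | "P", [children]).
    Raises ValueError if the graph is not series-parallel.
    """
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    tree = {n: n for n in nodes}              # SPG node -> its SPT subtree
    fresh = count()

    def contract(group, kind):
        """Replace `group` on the SPG by one node; build the matching SPT node."""
        members = set(group)
        new = ("merged", next(fresh))
        ps = set().union(*(parents[g] for g in group)) - members
        cs = set().union(*(children[g] for g in group)) - members
        for g in group:                       # outside connections are kept
            for p in parents.pop(g) - members:
                children[p].discard(g)
            for c in children.pop(g) - members:
                parents[c].discard(g)
        parents[new], children[new] = ps, cs
        for p in ps:
            children[p].add(new)
        for c in cs:
            parents[c].add(new)
        kids = []
        for g in group:                       # splice S-under-S / P-under-P
            sub = tree.pop(g)
            same = isinstance(sub, tuple) and sub[0] == kind
            kids.extend(sub[1] if same else [sub])
        tree[new] = (kind, kids)

    def find_chain():
        for n in parents:
            if len(parents[n]) == 1:
                (p,) = parents[n]
                if len(children[p]) == 1:
                    continue                  # n continues a chain started earlier
            chain = [n]
            while len(children[chain[-1]]) == 1:
                (c,) = children[chain[-1]]
                if len(parents[c]) != 1:
                    break
                chain.append(c)
            if len(chain) > 1:
                return chain
        return None

    def find_parallel():
        groups = {}
        for n in parents:
            key = (frozenset(parents[n]), frozenset(children[n]))
            groups.setdefault(key, []).append(n)
        return next((g for g in groups.values() if len(g) > 1), None)

    while len(parents) > 1:                   # step 904
        group, kind = find_chain(), "S"       # step 906
        if group is None:
            group, kind = find_parallel(), "P"  # step 908
        if group is None:
            raise ValueError("graph is not series-parallel")
        contract(group, kind)
    return next(iter(tree.values()))
```

For the diamond a→b, a→c, b→d, c→d, this returns ("S", ["a", ("P", ["b", "c"]), "d"]); for the N-shaped graph a→c, a→d, b→d, which is not series-parallel, it raises ValueError.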

FIG. 10 shows an example of converting a DAG into an SPT according to the flowchart of FIG. 9: the DAG of FIG. 10(a) is converted into the SPT of FIG. 10(b). The SPT contains S nodes, internal nodes representing serial execution of their children, and P nodes, internal nodes representing parallel execution of their children.

For reference, FIG. 11 shows an example of converting into an SPT a DAG that has not undergone the transformation by the task graph transformation module 510. Compared with the SPT of FIG. 10(b), the x nodes in the SPT of FIG. 11(b) are scattered as leaf nodes.

FIG. 12 is a flowchart of the processing of the SPT transformation module 514, corresponding to step 608 of FIG. 6. This processing takes an SPT as input and outputs an SPT.

In step 1202, if the root node of the SPT is a P node, the SPT transformation module 514 creates an S node as the new root and makes the original root P node a child of that S node. It also sets X to the set of all x nodes on the SPT that have a P node above them.

In step 1204, the SPT transformation module 514 determines whether X is empty. If so, the processing ends.

If X is not empty, the process proceeds to step 1206, where the SPT transformation module 514 takes one element x out of X (removing x from X).

In step 1208, the SPT transformation module 514 determines whether the parent of x is an S node. If not, in step 1210 it moves x to become a child of its grandfather node, placed immediately before its former parent node. Whenever a node on the SPT is moved, all nodes below it move with it.

In step 1212, the SPT transformation module 514 determines whether any ancestor of x is a P node. If so, the process returns to step 1208; otherwise it returns to step 1204.

Returning to step 1208, if the parent of x is an S node, the process proceeds to step 1214, where the SPT transformation module 514 determines whether the sum of the weights of the elder brother nodes of x is less than the sum of the weights of the younger brother nodes of x. The weight of a node on the SPT is the sum of the weights of the leaf nodes below it. A leaf node is a node of the original DAG, and each DAG node is associated with a specific process in the simulation model; the computation time of that process is used as the weight of the DAG node.

If the sum for the elder brother nodes is less, then in step 1216 the SPT transformation module 514 moves the elder brother nodes of x together with x, in their existing order, to become children of the great-grandfather node of x, placed immediately before the grandfather node of x, and returns to step 1212. Otherwise, in step 1218, it moves x and its younger brother nodes, in their existing order, to become children of the great-grandfather node of x, placed immediately after the grandfather node of x, and returns to step 1212.

From step 1212, the process returns to step 1208 or step 1204 according to the determination made there.
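A Python sketch of the FIG. 12 procedure follows. It assumes a normalized SPT built from a hypothetical mutable `Node` class, with leaf weights standing in for process computation times. The final `tidy` pass, which drops empty internal nodes and splices out single-child ones, is an assumption added here: the source does not say how such degenerate nodes left behind by the moves are handled.

```python
class Node:
    """SPT node. kind is "S", "P" or "leaf"; leaves carry a weight and an x flag."""

    def __init__(self, kind, children=(), name=None, weight=0.0, is_x=False):
        self.kind, self.name, self.weight, self.is_x = kind, name, weight, is_x
        self.parent = None
        self.children = list(children)
        for c in self.children:
            c.parent = self


def weight(n):
    """Weight of an SPT node: the sum of the weights of the leaves below it."""
    return n.weight if n.kind == "leaf" else sum(weight(c) for c in n.children)


def leaves(n):
    """Leaves in left-first depth-first order."""
    return [n] if n.kind == "leaf" else [leaf for c in n.children for leaf in leaves(c)]


def has_p_ancestor(n):
    while n.parent is not None:
        n = n.parent
        if n.kind == "P":
            return True
    return False


def move(nodes, new_parent, index):
    """Detach nodes (subtrees travel with them) and insert them, in order, at index."""
    for n in nodes:
        n.parent.children.remove(n)
    for k, n in enumerate(nodes):
        new_parent.children.insert(index + k, n)
        n.parent = new_parent


def tidy(n):
    """Assumed clean-up: drop empty internal nodes, splice out single-child
    ones, and re-apply the S-under-S / P-under-P flattening rule."""
    if n.kind == "leaf":
        return n
    kids = []
    for child in n.children:
        c = tidy(child)
        if c is not None:
            kids.extend(c.children if c.kind == n.kind else [c])
    if len(kids) <= 1:
        return kids[0] if kids else None
    n.children = kids
    for c in kids:
        c.parent = n
    return n


def hoist_x_nodes(root):
    """Sketch of FIG. 12: move x leaves until no P node lies above any of them."""
    if root.kind == "P":                                    # step 1202
        root = Node("S", [root])
    for x in [n for n in leaves(root) if n.is_x and has_p_ancestor(n)]:
        while has_p_ancestor(x):                            # steps 1208/1212
            parent, grand = x.parent, x.parent.parent
            if parent.kind == "P":                          # step 1210
                move([x], grand, grand.children.index(parent))
                continue
            great = grand.parent                            # parent is S, grand is P
            i = parent.children.index(x)
            elder, younger = parent.children[:i], parent.children[i + 1:]
            if sum(map(weight, elder)) < sum(map(weight, younger)):    # step 1214
                move(elder + [x], great, great.children.index(grand))      # 1216
            else:
                move([x] + younger, great, great.children.index(grand) + 1)  # 1218
    root = tidy(root)
    return root if root is None or root.kind == "S" else Node("S", [root])


def shape(n):
    """Nested (kind, children) view of a tree, for inspection."""
    return n.name if n.kind == "leaf" else (n.kind, [shape(c) for c in n.children])
```

For the tree S[a, P[S[b, x, c], d], e] with weights b = 1 and c = 5, the elder side (b) is lighter, so b and x move in front of the P node, giving S[a, b, x, P[c, d], e].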

FIG. 13 shows an example of transforming an SPT according to the flowchart of FIG. 12. In the transformed SPT of FIG. 13(b), no P node remains above the non-coding target block x.

For reference, FIG. 14 shows an example in which an SPT (FIG. 14(a)) converted from a DAG that has not undergone the transformation by the task graph transformation module 510 is transformed according to the flowchart of FIG. 12. In the transformed SPT of FIG. 14(b), no P node remains above any non-coding target block x either, but the x blocks are scattered under the top-level S node.

FIGS. 15 and 16 show intermediate states during the transformation of the SPT. First, as shown in FIG. 15(a), if the node x of a non-coding target block lies directly under a P node, the transformation moves x, as a child of its grandfather node S, to a position before or after its parent node P, as shown in FIG. 15(b) or FIG. 15(c). FIG. 15(b) corresponds to moving x after the parent node P, and FIG. 15(c) to moving x before it. Neither move violates any dependency.

This leads to a state such as that of FIG. 16, in which node x lies directly under an S node, as shown in FIG. 16(a). In this case, as shown in FIG. 16(b) or FIG. 16(c), the transformation moves x together with either its succeeding nodes or its preceding nodes under the same parent S, as children of the great-grandfather node S, to a position after or before the grandfather node P. FIG. 16(b) corresponds to moving the succeeding nodes after the grandfather node P, and FIG. 16(c) to moving the preceding nodes before it. Neither move violates any dependency. Because the workload moved out from under P differs with the direction of the move, the direction that moves the smaller workload is chosen.

FIG. 17 is a flowchart of the processing in which the CPU allocation module 516 generates, from the transformed SPT, an SPT corresponding to the code to be allocated individually to multiple CPUs. This processing corresponds to step 610 of FIG. 6; it takes an SPT as input and outputs an SPT. In the following, max is the number of available processors or cores.

In step 1702, the CPU allocation module 516 prepares T, the set of P nodes among the child nodes of the root node of the SPT.

In step 1704, the CPU allocation module 516 determines whether T is empty, and if so, ends the processing. Otherwise, it proceeds to step 1706 and takes one P node q out of T, removing q from T. It also sets Y to the set consisting of q and all P nodes below q.
It also initializes e_max := 0, U_max := φ, p_max := null.

In step 1708, the CPU allocation module 516 determines whether Y is empty; if not, in step 1710 it takes one P node p out of Y, removing p from Y. It then prepares n empty sets U₁, …, U_n, where n is the number of available processors, and distributes the child nodes of p among U₁, …, U_n so that the total weights of the sets are as equal as possible (using a bin packing algorithm or the like).
It also computes the following variables:
m := the largest of the total node weights of the sets U₁, …, U_n
e := (the sum of the weights of all child nodes of p) - m
If e_max < e, then e_max := e, U_max := {U₁, …, U_n}, and p_max := p. The process then returns to step 1708.
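Steps 1708 and 1710 can be sketched as below. The source only says that a bin packing algorithm "or the like" is used; this sketch assumes the greedy longest-processing-time heuristic, which is one common choice but does not guarantee an optimal split. P nodes are given as a hypothetical dict from a name to the list of child weights.

```python
import heapq


def balance(weights, n):
    """Greedy longest-processing-time packing of child weights into n bins.

    Returns (bins, m, e): bins holds child indices, m is the heaviest bin
    load, and e = sum(weights) - m is the time saved over serial execution.
    """
    heap = [(0.0, i) for i in range(n)]        # (load, bin index)
    bins = [[] for _ in range(n)]
    for child in sorted(range(len(weights)), key=lambda k: -weights[k]):
        load, b = heapq.heappop(heap)          # currently least-loaded bin
        bins[b].append(child)
        heapq.heappush(heap, (load + weights[child], b))
    m = max(load for load, _ in heap)
    return bins, m, sum(weights) - m


def best_p_node(p_nodes, n):
    """Pick p_max, the P node whose balanced split saves the most (e_max)."""
    e_max, u_max, p_max = 0.0, None, None
    for name, weights in p_nodes.items():
        bins, _, e = balance(weights, n)
        if e_max < e:
            e_max, u_max, p_max = e, bins, name
    return p_max, u_max, e_max
```

With child weights 4, 3, 3, 2 and n = 2, the bins are {4, 2} and {3, 3}, so m = 6 and e = 12 - 6 = 6.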

When the process returns to step 1708 and Y is determined to be empty, then in step 1712 the CPU allocation module 516 sets l := the list of leaf nodes below q that are not below p_max and come before p_max.
It likewise sets r := the list of leaf nodes below q that are not below p_max and come after p_max.
The CPU allocation module 516 then moves the leaf nodes in l, as children of the root node, to positions immediately before q, and the leaf nodes in r, as children of the root node, to positions immediately after q, preserving their order within each list. It then replaces q with p_max.
The CPU allocation module 516 further performs the following processing.
It detaches the child nodes of p_max (so that p_max temporarily has no children).
For each non-empty set U_i among the elements of U_max, it creates a new S node as a child of p_max.
It makes the leaf nodes below all the nodes contained in each U_i children of the corresponding S node, preserving their order.
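Step 1712 can be sketched on a tuple-form SPT (leaves as unique strings, internal nodes as ("S" | "P", children)), an illustrative representation assumed here. Because a subtree's leaves are contiguous in left-first depth-first order, l and r are simply the leaves of q before and after those of p_max; running them serially in that order respects the dependencies encoded in the tree.

```python
def leaves(t):
    """Leaves of a tuple-form SPT in left-first depth-first order."""
    return [t] if isinstance(t, str) else [leaf for c in t[1] for leaf in leaves(c)]


def restructure(root, q_index, p_max, u_max):
    """Sketch of step 1712 on a tuple-form SPT.

    root: ("S", children); q_index: position of q among the root's children;
    p_max: a P node inside q; u_max: bins of child indices of p_max.
    """
    q = root[1][q_index]
    order, inner = leaves(q), leaves(p_max)
    start = order.index(inner[0])            # p_max's leaves are contiguous
    before = order[:start]                   # list l
    after = order[start + len(inner):]       # list r
    # One S node per non-empty U_i, holding the leaves of its children in order.
    new_p = ("P", [("S", [leaf for i in b for leaf in leaves(p_max[1][i])])
                   for b in u_max if b])
    kids = root[1][:q_index] + before + [new_p] + after + root[1][q_index + 1:]
    return ("S", kids)
```

For example, if q = P[S[b, P[c, d, e], f], g], p_max = P[c, d, e] and the bins are {c} and {d, e}, the root's q is replaced by the sequence b, P[S[c], S[d, e]], f, g.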

When step 1712 finishes, the process returns to step 1704. If T is not empty, it proceeds to step 1706; once T is determined to be empty, the processing ends. As a result, an SPT is obtained for each of U_max = {U₁, …, U_n}.

When this processing ends, the output SPT has the following structure:
The root node is an S node.
Each child of the root node is a leaf node (that is, a node of the original DAG) or a P node.
Each such P node has at most n child nodes, where n is the number of processors.
Every child of such a P node is an S node whose children are all leaf nodes.
Because no P node in the SPT has more children than there are processors, the structure specifies how the leaf nodes (nodes of the original DAG) below each child are to be allocated to the processors.
Among the children of the root node, each run of consecutive non-x leaf nodes, and each P node, forms one unit of code generation.
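The output structure listed above can be checked, and the root's children split into coding units, with a sketch like the following. It reuses an illustrative tuple representation (leaves as strings, internal nodes as ("S" | "P", children)); labelling each x leaf as a unit for the interpreter is an assumption based on the description of steps 612 and 614 rather than on this list.

```python
def coding_units(root, is_x, n):
    """Split an output SPT into coding units, validating its shape.

    Units are ("code", [non-x leaves]), ("interpret", x_leaf) or
    ("parallel", P node); n is the number of processors.
    """
    kind, children = root
    if kind != "S":
        raise ValueError("root must be an S node")
    units, run = [], []
    for child in children:
        if isinstance(child, str) and not is_x(child):
            run.append(child)                  # extend the run of non-x leaves
            continue
        if run:                                # close the run: one coding unit
            units.append(("code", run))
            run = []
        if isinstance(child, str):             # x leaf: left to the interpreter
            units.append(("interpret", child))
            continue
        p_kind, branches = child
        if p_kind != "P" or len(branches) > n:
            raise ValueError("root children must be leaves or P nodes with <= n children")
        if not all(isinstance(s, tuple) and s[0] == "S"
                   and all(isinstance(leaf, str) for leaf in s[1]) for s in branches):
            raise ValueError("each child of a P node must be an S node of leaves")
        units.append(("parallel", child))
    if run:
        units.append(("code", run))
    return units
```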

Then, in step 612 of FIG. 6, the SPT for each of U_max = {U₁, …, U_n} is expanded into code and compiled into an executable file, and in step 614 each executable is allocated to an individual processor or core and executed. Elements of U₁, …, U_n that contain non-coding target code are not compiled into an executable file; they are executed by the MATLAB(R)/Simulink(R) interpreter.

In the embodiment described above, the task graph is transformed so as to merge the non-coding target code before it is converted into an SPT. This processing is desirable but not essential: even if the task graph transformation is omitted, the execution speed of the compiled result is still improved.

Although the present invention has been described on the basis of a specific embodiment, it is not limited to that embodiment, and it should be understood that various configurations and techniques, such as modifications and substitutions obvious to those skilled in the art, are applicable. For example, the invention is not limited to any specific processor architecture or operating system.

Further, although the above embodiment has been described using MATLAB(R)/Simulink(R) as an example, those skilled in the art will understand that the invention is not limited to it and can be applied to any modeling tool.

302 Non-coding target block
406 Main memory
416 Hard disk drive
502 Simulation modeling tool
504 Model data
508 DAG conversion module
510 Task graph transformation module
512 SPT conversion module
514 SPT transformation module
516 CPU allocation module
518 Compiler
520 Execution environment

Claims

A code parallelization method for parallelizing, by computer processing, code configured by connecting a plurality of blocks including a non-coding target block and a coding target block, the method comprising:
preparing the code as a DAG task graph;
converting the task graph into a series-parallel tree including an S node from which serially executed nodes branch and a P node from which parallel executed nodes branch; and
transforming the series-parallel tree into another series-parallel tree until the P node no longer exists above the non-coding target block.

The code parallelization method according to claim 1, further comprising merging the non-coding target blocks in the task graph before the conversion into the series-parallel tree.

The code parallelization method according to claim 2, wherein the merging of the non-coding target blocks is performed by list scheduling.

A code parallel execution method for executing, in a multiprocessor environment, code generated by the code parallelization method according to claim 1, the method comprising:
coding individual nodes below the P node;
individually compiling the coded code into executable code; and
assigning the executable code to individual processors and executing it.

A code parallel execution method for executing, in a multiprocessor environment having an interpreter that executes the code of the non-coding target block, code generated by the code parallelization method according to claim 1, the method comprising:
coding individual nodes below the P node;
if the coded code does not include the non-coding target block, individually compiling it into executable code;
assigning the executable code to individual processors and executing it; and
if the coded code includes the non-coding target block, executing the coded code by the interpreter.

A program for parallelizing, by computer processing, code configured by connecting a plurality of blocks including a non-coding target block and a coding target block, the program causing a computer to execute:
preparing the code as a DAG task graph;
converting the task graph into a series-parallel tree including an S node from which serially executed nodes branch and a P node from which parallel executed nodes branch; and
transforming the series-parallel tree into another series-parallel tree until the P node no longer exists above the non-coding target block.

The program according to claim 6, further causing the computer to execute merging the non-coding target blocks in the task graph before the conversion into the series-parallel tree.

The program according to claim 7, wherein the merging of the non-coding target blocks is performed by list scheduling.

A code parallelization system for parallelizing, by computer processing, code configured by connecting a plurality of blocks including a non-coding target block and a coding target block, the system comprising:
storage means;
a DAG task graph stored in the storage means;
means for converting the task graph into a series-parallel tree including an S node from which serially executed nodes branch and a P node from which parallel executed nodes branch; and
means for transforming the series-parallel tree into another series-parallel tree until the P node no longer exists above the non-coding target block.

The code parallelization system according to claim 9, further comprising means for merging the non-coding target blocks in the task graph.

The code parallelization system according to claim 10, wherein the merging of the non-coding target blocks is performed by list scheduling.

The code parallelization system according to claim 9, further comprising:
a multiprocessor environment;
means for coding individual nodes below the P node;
means for individually compiling the coded code into executable code; and
means for assigning the executable code to individual processors and executing it.

The code parallelization system according to claim 9, further comprising:
a multiprocessor environment having an interpreter that executes the code of the non-coding target block;
means for coding individual nodes below the P node;
means for individually compiling the coded code into executable code if it does not include the non-coding target block;
means for assigning the executable code to individual processors and executing it; and
means for executing the coded code by the interpreter if the coded code includes the non-coding target block.