JPH04211858A - Device and method for dividing data flow graph

Info

Publication number: JPH04211858A
Application number: JP3063752A
Authority: JP
Inventors: Koichi Munakata; 浩一宗像; Mie Inaoka; 稲岡　美恵; Kenji Shima; 憲司嶋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1990-04-02
Filing date: 1991-03-06
Publication date: 1992-08-03

Abstract

PURPOSE: To reduce the influence of inter-processor communication on execution time by allocating each node of a data flow graph so that the number of packets flowing between processors becomes small. CONSTITUTION: For a node to be allocated, selected by next allocation node selection means 14, preceding node searching means 15 searches for the preceding nodes that output the arcs input to that node. Preceding node allocated processor searching means 16 then searches for the processors allocated to those preceding nodes. According to this allocation state, allocation processor determining means 4 allocates a processor to the node so that the number of packets flowing between processors becomes small, and the data flow graph is thereby divided.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a data flow graph dividing device that divides a data flow graph described by an object program among a plurality of processors constituting a parallel processing apparatus that executes parallel processing.

[0002]

[Prior Art] FIG. 9 is a block diagram showing the configuration of a data-driven processor having a cyclic pipeline mechanism, as disclosed in, for example, Japanese Patent Application Laid-Open No. 63-220328. This processor executes data flow graphs divided either by the prior art or by the present invention.

[0003] This data-driven processor comprises: an input unit 7 that receives, from outside, packets each consisting of data and a tag; a program storage unit 8 that stores the data flow graph, reads out an instruction according to the node number that forms part of the tag of a packet output from the input unit 7, and generates an instruction packet that carries the read-out instruction as a new tag together with the data of the packet; a firing processing unit 9 that receives the instruction packet output from the program storage unit 8, searches for an instruction packet having the same tag as that packet, outputs an operation packet if one is found, and otherwise stores the instruction packet received from the program storage unit 8; an arithmetic processing unit 10 that receives the operation packet output from the firing processing unit 9, executes the operation specified by part of its tag, and outputs a result packet; and an output unit 11 that receives the result packet output from the arithmetic processing unit 10 and, as directed by the external flag that forms part of its tag, outputs it either to the outside or to the input unit 7.
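The matching behaviour of the firing processing unit 9 can be illustrated with a minimal Python sketch. The class name `FiringUnit`, the method `receive`, and the representation of a tag as a hashable tuple are assumptions made for illustration only; the sketch ignores the L/R flag and handles only two-operand instructions.

```python
class FiringUnit:
    """Hypothetical model of tag matching in the firing processing unit 9."""

    def __init__(self):
        # Maps a tag to the data of the instruction packet that arrived first.
        self.waiting = {}

    def receive(self, tag, data):
        """Return an operation packet if a partner is waiting, else store the packet."""
        if tag in self.waiting:
            partner = self.waiting.pop(tag)
            # Operation packet: the tag plus the two operands, in arrival order.
            return (tag, partner, data)
        self.waiting[tag] = data
        return None


unit = FiringUnit()
first = unit.receive(("gen0", 5), 10)    # no partner yet, so the packet is stored
second = unit.receive(("gen0", 5), 20)   # partner found, so an operation packet is emitted
```

In this sketch a packet that finds no partner simply waits, which corresponds to the case in which the unit stores the instruction packet received from the program storage unit 8.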

[0004] As shown in FIG. 10, a packet circulating through this data-driven processor consists of a tag (a data identifier), first data, and second data. The tag consists of a through-packet flag, an external flag, a generation number or color number, a destination node number, an instruction, and an L/R flag.

[0005] Next, a conventional data flow graph dividing device (first conventional example) will be described. FIG. 11 is a block diagram showing the configuration of the conventional data flow graph dividing device. In the figure, 1 is data flow graph storage means for storing a data flow graph; 2 is rank analysis means for reading out the data flow graph stored in the data flow graph storage means 1 and performing rank analysis for each node; 3 is rank storage means for storing the results of the rank analysis by the rank analysis means 2; 4 is allocation processor determining means for determining the processor to be allocated to each node on the basis of the analysis results stored in the rank storage means 3; 5 is allocated processor storage means for storing, for each node, information on the processor determined by the allocation processor determining means 4; and 6 is process management means for managing and controlling the processes performed by the above means.

[0006] Next, the operation of the first conventional example will be described with reference to the flowchart of FIG. 12. First, the number of processors to which the data flow graph stored in the data flow graph storage means 1 is to be allocated is determined, and this number is set in a variable P (step ST1).

[0007] The rank analysis means 2 then performs rank analysis, node by node, of the data flow graph read from the data flow graph storage means 1 and stores the results in the rank storage means 3 (step ST2). When rank analysis has been completed for all nodes, the total number of ranks found is set in a variable R (step ST3), and the variable r is initialized to 1, which denotes the first rank (step ST4).

[0008] Next, the allocation processor determining means 4 divides the number of nodes I(r) belonging to rank r by the variable P and sets the integer part of the quotient in a variable S (step ST5). It then computes I(r) - S × P and sets the result in a variable t (step ST6). The nodes of rank r are allocated in order, S + 1 at a time, to t of the P processors set initially (step ST7), and the remaining nodes are allocated in order, S at a time, to the remaining P - t processors (step ST8). The result is stored in the allocated processor storage means 5.
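The arithmetic of steps ST5 to ST8 can be sketched in Python as follows. This is a minimal illustration, not the device itself: the function name `split_rank_evenly` is hypothetical, and the choice that the first t processors receive the extra node is an assumption, since the text only says that t of the P processors receive S + 1 nodes.

```python
def split_rank_evenly(node_ids, num_processors):
    """Allocate the nodes of one rank as in the conventional steps ST5 to ST8."""
    count = len(node_ids)              # I(r)
    s = count // num_processors        # S: integer part of I(r) / P (step ST5)
    t = count - s * num_processors     # t = I(r) - S x P (step ST6)
    allocation = {}
    pos = 0
    for proc in range(num_processors):
        # Assumption: the first t processors take S + 1 nodes, the rest take S.
        share = s + 1 if proc < t else s
        for node in node_ids[pos:pos + share]:
            allocation[node] = proc
        pos += share
    return allocation


example = split_rank_evenly(["n1", "n2", "n3", "n4", "n5"], 2)
```

With five nodes and two processors, S is 2 and t is 1, so one processor receives three nodes and the other two. Note that the split looks only at node counts and never at the arcs between nodes.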

[0009] Then r + 1 is set in the variable r in order to allocate the nodes of the next rank (step ST9), and processor allocation is carried out for all nodes, rank by rank, until the variable r exceeds the total number of ranks R (step ST10).

[0010] As a concrete result of the above processing, when a data flow graph with a total of five ranks, consisting of nodes 12 (the units of operation) and arcs 13 (which indicate data transfer between nodes) as shown in FIG. 13, is allocated to two processors, it can be allocated to the first and second processors as shown in FIG. 14.

[0011] Next, the operation of the second conventional example will be described with reference to the flowchart of FIG. 15. The data flow graph to be divided is the one shown in FIG. 16. In this example it consists of 35 nodes; a node for a control-instruction operation is written "NOP", a node for the arithmetic instruction increment is written "INC", and a node for the arithmetic instruction decrement is written "DEC". Two processors execute the divided data flow graph. This data flow graph dividing method is also disclosed in Japanese Patent Application No. Hei 1-176446.

[0012] First, the total number of nodes in the data flow graph, 35, is obtained (step ST11). This total is then divided by the number of processors performing the parallel processing, 2, to determine the number of nodes per processor (step ST12). Since the total number of nodes is 35, the node counts are 17 and 18.

[0013] The nodes divided in this way are then allocated as shown in FIG. 17: the 17 hatched nodes to the first processor and the other 18 nodes to the second processor.

[0014] When a data flow graph is divided among a plurality of processors for parallel processing, it is effective to divide the program vertically, as shown in that figure. However, with an allocation such as that of FIG. 9, data flows that travel back and forth between processors, such as those marked (ア), (イ), and (ウ) in FIG. 18, become more likely, and inter-processor communication becomes necessary to carry them.

[0015]

[Problem to be Solved by the Invention] Because the conventional data flow graph dividing device and dividing method are configured as described above, each node of the data flow graph is allocated to the processors without considering the number of packets flowing between processors. Consequently, when the processors execute the divided data flow graph, the number of packets flowing between processors is large and redundant inter-processor communication is likely to occur, which lengthens the execution time accordingly.

[0016] The present invention was made to solve the above problem. Its object is to provide a data flow graph dividing device and dividing method that reduce the influence of inter-processor communication on execution time by allocating each node of the data flow graph so that, when the divided data flow graph is executed by a plurality of processors, the number of packets flowing between processors becomes small.

[0017]

[Means for Solving the Problems] In the data flow graph dividing device according to the invention of claim (1), for a node to be allocated that has been selected by the next allocation node selection means in order to allocate the processor that will execute it, the preceding node searching means searches, using the data stored in the data flow graph storage means, for the preceding nodes that output the arcs input to that node. The preceding node allocated processor searching means then searches, using the data stored in the allocated processor storage means, for the processors to which the preceding nodes found are allocated. The processor to which the node is allocated is then determined so that the number of arcs connecting nodes allocated to different processors becomes small (that is, so that the number of inter-processor communications becomes small).

[0018] The data flow graph dividing method according to the invention of claim (2) divides the data flow graph to be executed by a parallel processing apparatus at locations where the execution rank of an instruction differs from the arrival rank of its data, and allocates the parts to the processors constituting the parallel processing apparatus.

[0019]

[Operation] The allocation processor determining means in the invention of claim (1) allocates a processor to the node to be allocated, according to the processors to which its preceding nodes are allocated, so that the number of packets flowing between processors becomes small.

[0020] The data flow graph dividing method in the invention of claim (2) divides the data flow graph at locations where the arrival rank of data differs from the execution rank of the instruction, allocates the parts to the respective processors, and causes inter-processor communication to take place at those locations.

[0021]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a data flow graph dividing device according to an embodiment of the invention of claim (1). Parts identical or equivalent to those of the conventional data flow graph dividing device of the first conventional example (FIG. 11) are given the same reference numerals, and their description is omitted.

[0022] In the figure, 14 is next allocation node selection means for selecting the node to be allocated next; 15 is preceding node searching means for searching, using the data stored in the data flow graph storage means 1, for the preceding nodes that output the arcs input to the node selected by the next allocation node selection means 14; and 16 is preceding node allocated processor searching means for searching, using the data stored in the allocated processor storage means 5, for the processors to which the preceding nodes found by the preceding node searching means 15 are allocated.

[0023] As stated above, the data flow graph divided by the data flow graph dividing device of the present invention is executed by data-driven processors such as the one shown in FIG. 9.

[0024] Next, the operation will be described with reference to the flowchart of FIG. 2. First, the number of processors to which the data flow graph stored in the data flow graph storage means 1 is to be allocated is determined (step ST14). The rank analysis means 2 then performs rank analysis of this data flow graph node by node. When all nodes have been analyzed, the result (the total number of ranks) is set in a variable R, and the nodes belonging to each rank are numbered from 1 to I(r) and stored in the rank storage means 3 (step ST15). Here r is the rank value and I(r) is the total number of nodes belonging to rank r; hereinafter the i-th node belonging to rank r is written n(r, i) (r = 1, 2, ..., R; i = 1, 2, ..., I(r)).

[0025] When the rank analysis by the rank analysis means 2 is finished, the allocation processor determining means 4 first allocates the nodes belonging to rank 1 to the processors as evenly as possible, for example by allocating them in turn, as the initial setting (step ST16), and stores the result in the allocated processor storage means 5. Thereafter, starting from rank 2 (step ST17), the allocation processor determining means 4 allocates all the nodes of each rank to the processors (step ST18).

[0026] Next, the operation of step ST18 will be described with reference to the flowchart of FIG. 3, taking the allocation of the nodes belonging to rank r as an example. First, the next allocation node selection means 14 sets 1, as the initial value, in a variable i indicating the node number (step ST18-1) and selects, in order of this number, the node whose processor allocation is to be determined. For the selected node n(r, i), the preceding node searching means 15 finds its preceding nodes using the data stored in the data flow graph storage means 1, and the preceding node allocated processor searching means 16 checks, using the data stored in the allocated processor storage means 5, whether all of these preceding nodes are allocated to the same processor (call it processor K) (step ST18-2).

[0027] If all the preceding nodes are allocated to processor K, the allocation processor determining means 4 allocates node n(r, i) to processor K (step ST18-3). This is repeated for all nodes belonging to rank r (steps ST18-4, ST18-5), and the results are stored in the allocated processor storage means 5.

[0028] In steps ST18-1 to ST18-5, allocation was performed for the nodes whose preceding nodes are all allocated to the same processor. Next, the nodes whose preceding nodes are allocated to different processors are processed.

[0029] That is, the next allocation node selection means 14 again sets the initial value 1 in the variable i (step ST18-6) and selects, in order of this number, the node whose processor allocation is to be determined. For each selected node n(r, i) that has not yet been allocated a processor (step ST18-7), the preceding node allocated processor searching means 16 first finds the processors allocated to the preceding nodes found by the preceding node searching means 15. The allocation processor determining means 4 then allocates node n(r, i) to whichever of these processors has the fewest nodes of rank r allocated to it (step ST18-8). This is repeated for all nodes belonging to rank r (steps ST18-9 to ST18-10), and the results are stored in the allocated processor storage means 5.

[0030] The allocation processing of step ST18 is then performed for all ranks (steps ST19, ST20).
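The two-pass allocation of steps ST16 to ST20 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the claimed device: the names `allocate_by_predecessors` and `count_cut_arcs`, the input layout (a list of ranks and a dictionary of direct predecessors), the round-robin initial setting for rank 1, and the tie-break toward the lowest-numbered processor are all choices made for illustration. It also assumes every node of rank 2 or higher has at least one predecessor.

```python
from collections import defaultdict


def allocate_by_predecessors(ranks, preds, num_processors):
    """ranks[0] holds the rank-1 nodes; preds maps a node to its direct predecessors."""
    alloc = {}
    # Step ST16: spread the rank-1 nodes over the processors in turn.
    for i, node in enumerate(ranks[0]):
        alloc[node] = i % num_processors
    for rank_nodes in ranks[1:]:
        load = defaultdict(int)  # nodes of the current rank per processor
        # First pass (ST18-1 to ST18-5): all predecessors on one processor K.
        for node in rank_nodes:
            procs = {alloc[p] for p in preds[node]}
            if len(procs) == 1:
                k = procs.pop()
                alloc[node] = k
                load[k] += 1
        # Second pass (ST18-6 to ST18-10): predecessors on different processors.
        for node in rank_nodes:
            if node in alloc:
                continue
            procs = sorted({alloc[p] for p in preds[node]})
            k = min(procs, key=lambda q: load[q])  # fewest nodes of this rank
            alloc[node] = k
            load[k] += 1
    return alloc


def count_cut_arcs(preds, alloc):
    """Count arcs whose two end nodes sit on different processors."""
    return sum(1 for node, ps in preds.items() for p in ps if alloc[p] != alloc[node])


ranks = [["a", "b"], ["c", "d"], ["e"]]
preds = {"c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
alloc = allocate_by_predecessors(ranks, preds, 2)
cut = count_cut_arcs(preds, alloc)
```

Each cut arc corresponds to a packet that must cross between processors, so `count_cut_arcs` gives a rough measure of the quantity the allocation tries to keep small.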

[0031] As a concrete result of the above processing, when the data flow graph shown in FIG. 13 (five ranks) is allocated to two processors, it can be allocated to the first and second processors as shown in FIG. 4.

[0032] In the above embodiment, the nodes were allocated to the processors after rank analysis of the entire data flow graph had been completed; alternatively, rank analysis and allocation to a processor may be performed consecutively for each node.

[0033] Also, in the above embodiment, when determining the processor to which a node is allocated, only the processors allocated to the preceding nodes directly connected to that node were referred to. The decision may additionally take into account the processors allocated to nodes that are indirectly connected to the node being allocated, that is, nodes from which it can be reached by following output arcs in succession.

[0034] The above embodiment was described on the assumption that the data flow graph is divided in the direction from upstream to downstream (from rank 1 to rank R), but the same effect is obtained when the division proceeds in the reverse direction, from downstream to upstream.

[0035] In the above embodiment the divided data flow graph is executed by data-driven processors, but the same effect is obtained when it is executed by processors of another type.

[0036] Next, the operation of a data flow graph dividing method according to an embodiment of the invention of claim (2) will be described with reference to the flowchart of FIG. 5. The data flow graph to be divided is, as in the conventional example, the one shown in FIG. 16, and the case in which it is allocated to two processors of the kind shown in FIG. 9 is described.

[0037] First, the data flows (A1 to A11) on the data flow graph shown in FIG. 16 are traced according to the following rules (step ST21).
(1) The start point of a data flow is a data input, a data branch, a data copy, or a multiple-output instruction (hereinafter collectively called a data branch).
(2) The end point of a data flow is a data output, a data merge, or a multiple-input instruction (hereinafter collectively called a data merge).

[0038] Next, the data flows created in step ST21 are concatenated according to the following rules (step ST22).
(1) A data flow is concatenated with a data flow for which the execution rank and the arrival rank are the same.
(2) If several data flows satisfy condition (1), the longest one is chosen. (The length of a data flow is expressed as the number of instructions it contains.)

[0039] Here, a rank is a set of instructions at the same level that can be executed in parallel and simultaneously. The execution rank is the rank at which all the data needed for execution are available and the instruction can be executed. The arrival rank is a rank attached to each item of data needed for execution and indicates the rank at which that item arrives.

[0040] FIG. 8 is a data flow graph that illustrates this. In the illustrated example, the execution rank of node 17 is 4. The arrival rank of the data input to node 17 from node 18, whose rank is 3, is 4, and the arrival rank of the data input to node 17 from node 19, whose rank is 2, is 3.
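The relation between execution rank and arrival rank in the FIG. 8 example can be reproduced with a short Python sketch. The formulas used here are inferred from that example and are assumptions: a node without predecessors is given rank 1, the execution rank of a node is one more than the highest rank among its predecessors, and the arrival rank of data from a predecessor is that predecessor's rank plus one. The function names and the upstream nodes `in_a`, `in_b`, and `mid` are hypothetical.

```python
def execution_ranks(preds):
    """preds maps each node to its direct predecessors; returns node -> execution rank."""
    ranks = {}

    def rank_of(node):
        if node not in ranks:
            # Assumption: rank is 1 + the highest predecessor rank (1 for inputs).
            ranks[node] = 1 + max((rank_of(p) for p in preds.get(node, [])), default=0)
        return ranks[node]

    for node in preds:
        rank_of(node)
    return ranks


def arrival_ranks(node, preds, ranks):
    """Arrival rank of the data on each arc entering the given node."""
    return {p: ranks[p] + 1 for p in preds.get(node, [])}


# Hypothetical upstream nodes chosen so that n18 has rank 3 and n19 has rank 2.
preds = {
    "in_a": [], "in_b": [],
    "mid": ["in_a"],
    "n18": ["mid"],
    "n19": ["in_b"],
    "n17": ["n18", "n19"],
}
ranks = execution_ranks(preds)
arrivals = arrival_ranks("n17", preds, ranks)
```

In this sketch the data from `n19` arrives one rank before `n17` can execute, so that arc has slack in which a transfer between processors could take place.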

[0041] Next, among the data flows B1 to B6 concatenated in step ST22, the longest ones are allocated, one each in descending order of length, to the first processor and the second processor. Here, data flow B1 is allocated to the first processor and B2 to the second processor (step ST23).

[0042] Finally, the data flows B1 and B2 allocated in step ST23 are taken as the base data flows of the first and second processors, respectively, and the remaining data flows B3 to B6 are joined to the base data flows B1 and B2 and allocated to the processors according to the following rules (step ST24).
(1) If the data flows joined at the start point and at the end point are allocated to the same processor, the data flow is allocated to that processor.
(2) If the data flows joined at the start point and at the end point are allocated to different processors:
(a) if the execution rank and the arrival rank at the end point differ, the data flow is allocated to the same processor as the data flow joined at the start point;
(b) if the execution rank and the arrival rank at the end point are the same, the data flow is allocated to whichever of the start-point and end-point processors has fewer instructions allocated to it.
(3) If the start point is an input from outside, the data flow is allocated to the processor that has the fewest instructions allocated among all the processors.
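The rules of step ST24 amount to a small decision function, sketched below in Python. The function name `choose_processor`, its parameters, and the `load` dictionary (instructions allocated so far per processor) are assumptions for illustration. The sketch assumes the end-point flow already has a processor and breaks ties toward the start-point processor under rule (2)(b) and toward the lowest-numbered processor under rule (3); the text does not specify these cases.

```python
def choose_processor(start_proc, end_proc, exec_rank, arrival_rank, load):
    """Pick a processor for one remaining data flow.

    start_proc: processor of the flow joined at the start point,
                or None when the start point is an external input.
    end_proc:   processor of the flow joined at the end point.
    exec_rank, arrival_rank: ranks observed at the end point.
    load:       dict mapping processor -> instructions allocated so far.
    """
    if start_proc is None:
        # Rule (3): external input goes to the least-loaded processor overall.
        return min(sorted(load), key=lambda p: load[p])
    if start_proc == end_proc:
        # Rule (1): both neighbours are on the same processor.
        return start_proc
    if exec_rank != arrival_rank:
        # Rule (2)(a): the data must wait at the end point anyway,
        # so the transfer is placed there and the flow stays with its start.
        return start_proc
    # Rule (2)(b): no slack, so balance the load between the two candidates.
    return min((start_proc, end_proc), key=lambda p: load[p])


load = {0: 9, 1: 1}
same = choose_processor(0, 0, 4, 3, load)
slack = choose_processor(0, 1, 4, 3, load)
no_slack = choose_processor(0, 1, 4, 4, load)
external = choose_processor(None, 1, 3, 3, {0: 5, 1: 4})
```

Only rule (2) produces an arc between processors, and under (2)(a) that arc falls where the data would otherwise be idle.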

[0043] In the case of rule (2), inter-processor communication occurs. Under rule (a), because the execution rank and the arrival rank differ, the data would normally have to wait until the execution rank is reached (that is, until the data of the other operand arrives). Performing the inter-processor communication during this waiting time reduces its effect on the execution time.

[0044] Under rule (b) as well, inter-processor communication always takes place at branch and merge points. Because there is none of the inter-processor communication that cuts through a simple flow of data, as seen in conventional data flow graph division, the overall number of inter-processor communications is expected to decrease.

[0045] FIG. 6 shows the result of actually dividing the data flow graph of FIG. 16, and FIG. 7 redraws FIG. 6 so that the locations of inter-processor communication can be seen. As illustrated, the redundant inter-processor communication seen in conventional data flow graph division does not exist.

[0046] In the above embodiment, when the execution rank and the arrival rank are the same in step ST23, the data flow is allocated to the processor with the smaller total number of instructions allocated over all ranks. Alternatively, the number of instructions may be averaged over the ranks that contain the data flow and the data flow allocated to the processor with the smaller average, which is more effective for load balancing.

[0047]

[Effects of the Invention] As described above, according to the present invention, for a node to be allocated selected by the next allocation node selection means, the preceding node searching means searches for the preceding nodes that output the arcs input to that node, the preceding node allocated processor searching means searches for the processors allocated to those preceding nodes, and, according to this allocation state, the allocation processor determining means allocates a processor to the node so that the number of packets flowing between processors becomes small, whereby the data flow graph is divided. The dead time required for inter-processor communication when the program describing the data flow graph is executed is therefore shortened, and the program can be executed at high speed.

[0048] Further, according to the invention of claim (2), the data flow graph is divided at points in the data flow where the execution rank of an instruction differs from the arrival rank of its data. Inter-processor communication therefore takes place at points where the execution rank and the arrival rank differ, and even when it must unavoidably take place elsewhere, it occurs at a data branch or merge point. Consequently, there is none of the redundant inter-processor communication that cuts across a data flow, as seen in conventional data flow graph division. The degradation in execution time caused by inter-processor communication is reduced, and faster processing can be achieved.
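The idea of cutting where the arrival rank and the execution rank differ can be illustrated with a small sketch. The rank definitions used here are assumptions made for the example, since this section does not define them: a node with no inputs has execution rank 0, data on an arc arrives one rank after its producer executes, and a node executes at the latest arrival rank among its inputs.

```python
def compute_ranks(order, preds):
    """Return (execution rank per node, arrival rank per arc).

    Assumed definitions (not from the patent text): source nodes execute at
    rank 0; data on arc (u, v) arrives at exec_rank[u] + 1; node v executes
    at the maximum arrival rank of its input arcs.
    """
    exec_rank = {}
    arrival = {}
    for v in order:  # every preceding node must appear before v
        inputs = preds.get(v, ())
        for u in inputs:
            arrival[(u, v)] = exec_rank[u] + 1
        exec_rank[v] = max((arrival[(u, v)] for u in inputs), default=0)
    return exec_rank, arrival


def cut_candidates(order, preds):
    """Arcs whose data arrives before the consuming instruction executes."""
    exec_rank, arrival = compute_ranks(order, preds)
    return [arc for arc, rank in arrival.items() if rank != exec_rank[arc[1]]]


# d needs data from b (arrives at rank 1) and from c (arrives at rank 2).
preds = {"c": ["a"], "d": ["b", "c"]}
cuts = cut_candidates(["a", "b", "c", "d"], preds)
```

Under these assumptions the data on arc (b, d) waits one rank before d can execute, so a transfer between processors on that arc can overlap with the wait instead of delaying the instruction.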

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a data flow graph dividing device according to an embodiment of the invention of claim (1).

FIG. 2 is a flowchart illustrating the operation of the data flow graph dividing device according to the invention of claim (1).

FIG. 3 is a flowchart illustrating details of the processor allocation operation (step ST18) in FIG. 2.

FIG. 4 is a diagram showing the result of dividing a data flow graph with the data flow graph dividing device according to the invention of claim (1).

FIG. 5 is a flowchart showing a data flow graph dividing method according to an embodiment of the invention of claim (2).

FIG. 6 is a diagram showing a data flow graph divided by the data flow graph dividing method according to an embodiment of the invention of claim (2).

FIG. 7 is an explanatory diagram in which FIG. 6 is rewritten to show explicitly where inter-processor communication is performed.

FIG. 8 is a diagram showing a data flow graph used to explain the execution rank and the arrival rank.

FIG. 9 is a block diagram showing the configuration of a data-driven processor that executes a data flow graph.

FIG. 10 is a diagram showing the structure of a packet that circulates through the data-driven processor of FIG. 9.

FIG. 11 is a block diagram showing the configuration of the data flow graph dividing device of the first conventional example.

FIG. 12 is a flowchart showing the operation of the data flow graph dividing device of FIG. 11.

FIG. 13 is a diagram showing the data flow graph to be divided by the data flow graph dividing devices of the first conventional example and of the invention of claim (1).

FIG. 14 is a diagram showing a data flow graph divided by the data flow graph dividing device of the first conventional example.

FIG. 15 is a flowchart illustrating the data flow graph dividing method of the second conventional example.

FIG. 16 is a diagram showing the data flow graph to be divided by the data flow graph dividing methods of the second conventional example and of the invention of claim (2).

FIG. 17 is a diagram showing a data flow graph divided by the data flow graph dividing method of the second conventional example.

FIG. 18 is an explanatory diagram in which FIG. 17 is rewritten to show explicitly where inter-processor communication is performed.

[Explanation of symbols]

1 Data flow graph storage means
2 Rank analysis means
3 Rank storage means
4 Allocated processor determining means
5 Allocated processor storage means
6 Process control means
14 Next allocation node selection means
15 Preceding node search means

Claims

Claim 1: A data flow graph dividing device that rank-analyzes a data flow graph stored in a data flow graph storage means and divides the data flow graph by allocating the nodes constituting it to a plurality of processors constituting a parallel processing device, the device comprising: a next allocation node selection means for selecting the node to be allocated next; a preceding node search means for searching for a preceding node that is connected via an arc to the allocation-target node selected by the next allocation node selection means and that has already been allocated to one of the plurality of processors; and a preceding node allocated processor search means for searching for the processor to which the preceding node found by the preceding node search means has been allocated.

Claim 2: A data flow graph dividing method for allocating a data flow graph described in an object program to each of a plurality of processors constituting a parallel processing device that executes parallel processing, the method comprising: dividing the data flow graph, which has nodes requiring a plurality of data, at locations where the arrival rank of each data differs from the execution rank of the instruction; and allocating the divided portions to the respective processors.