JPH02201695A

JPH02201695A - System for developing parallel processor

Info

Publication number: JPH02201695A
Application number: JP1021700A
Authority: JP
Inventors: Soichi Miyata; 宗一宮田; Satoshi Matsumoto; 敏松本; Kenji Shima; 憲司嶋; Takeshi Fukuhara; 福原　毅; Nobufumi Komori; 伸史小守
Original assignee: Mitsubishi Electric Corp; Sharp Corp
Current assignee: Mitsubishi Electric Corp; Sharp Corp
Priority date: 1989-01-31
Filing date: 1989-01-31
Publication date: 1990-08-09
Anticipated expiration: 2010-05-01
Also published as: JPH0740259B2

Abstract

PURPOSE: To execute debugging efficiently and at high speed by inputting processing input data from a control computer to a processor at a high speed corresponding to the speed of the actual application, and by tracing the transfer status with a tracer part. CONSTITUTION: A data packet for processing execution is input from a control computer 31 through an interface part 40 to a processing element 10 of the parallel processor. Plural tracer parts 60, whose trace ports are connected to plural terminals of the processing element 10, trace the data packets arriving at the trace ports in synchronization with the internal clock of the system, together with common time information. The trace results stored in the trace memories of the plural tracer parts 60 are then collected and saved as a file, and the results are rendered as a data flow graph and displayed. Thus, debugging of hardware or software in the development of the parallel processor is made easy and fast.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a parallel processing device development system, and in particular to a parallel processing device development system having a development support environment (development support tool) that makes debugging of hardware and software easier and faster in the development of data-driven (data flow) processors.

[Conventional technology]

FIG. 11 is a diagram showing the configuration of a conventional development system for a data-driven processor. In the figure, 90 is a host personal computer, 91 is a timer, 92 is a memory address generator, 93 is a data-driven processor, 94 is a graphic display controller, 95 is an image memory, 96 is a CRT, 97 is a system bus, and 98 is an image memory bus.

In this configuration, operations such as initialization, breakpoint setting, dump display, loading, setting, and moving of each memory section, input, output, and object program loading are performed.

Next, the operation will be described.

Initialization, object loading, and setting and loading of each memory section are performed by writing data from the host personal computer 90 to each section in response to commands from the host personal computer 90. Conversely, dump display is performed by reading data from each section into the host personal computer 90, and moving is performed by a combination of these operations. Data for computation can also be input from the host personal computer 90.

[Problem to be solved by the invention]

The conventional parallel processing device development system is configured as described above and performs the above processing in response to commands from the host personal computer. However, it can only display memory values during execution or at the end of execution. It is not a development support environment that can trace the processing in the data-driven processor and use the results to make debugging more efficient in the development of application systems using data-driven processors. There was therefore the problem that application systems for data-driven processors could not be developed efficiently.

The present invention was made to solve the above problem, and its object is to obtain a parallel processing device development system having a development support environment in which hardware and software debugging in the development of application systems for data-driven processors can be carried out efficiently and at high speed.

[Means to solve the problem]

The parallel processing device development system according to the present invention comprises a data input unit having a dedicated input packet memory, which inputs processing input data from a control computer to a multiprocessor parallel processing device at a high speed corresponding to the processing speed of the actual application of the parallel processing device, and a trace unit having a trace memory, which traces the data transfer status at each functional unit of the parallel processing device, together with time information, while processing is being executed. The trace results obtained by the trace unit are converted into a file, and a data flow graph is created and displayed.

[Effect]

In the present invention, the data input unit having the dedicated input packet memory inputs processing input data at a high speed corresponding to the processing speed of the actual application, and the tracer having the trace memory traces the data transfer status at each functional unit of the processor while execution continues. Furthermore, the trace results are converted into a file and rendered as a data flow graph. As a result, hardware and software debugging in the development of a parallel processing device can be carried out efficiently and at high speed.

[Embodiment]

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a parallel processing device development system according to an embodiment of the present invention.

In the figure, 1 is a development support environment, 10 is a processing element (PE) that includes a data-driven processor, 20 is the data-driven processor main unit, 21 is an extended program storage unit (Extended Program Store: EPS), 22 is an extended data storage unit (Extended Data Store: EDS), 23 is an external color/stack processing unit (External Color/Stack Process: ECS), 31 is a control computer such as a personal computer, 40 is an interface section, and 60 is a tracer section.

FIG. 2 is a diagram showing the configuration of the processing element 10. In the figure, 11 is a cache program store (Cache Program Store: CPS), 12 is firing control (Firing Control: FC), 13 is a functional processor (Functional Process: FP), 14 is a junction and branch function (Junction & Branch: J&B), and 15 is a queuing buffer (Queuing Buffer: QB).

FIG. 3 is a diagram showing the packet format of the data-driven processor; each field has the function indicated in the figure.

FIG. 4 is a configuration diagram of the interface section 40. Here, 41 is a conventional (von Neumann) microprocessor unit that manages the interface section 40 and the tracer section 60; 42 is ROM and RAM that store the program and data of the microprocessor unit 41; 43 is a serial I/O controller that connects to the control computer 31; 44 is an output port for inputting data packets to the data-driven processor of the processing element 10; 45 is a write address counter for writing data into the input packet memory; 46 is a read address counter for reading data from the input packet memory; 47 is a stop address latch used to stop packet output when the output of data packets from the output port 44 is to end; 48a is an address multiplexer for the input packet memory; 48b is a data multiplexer for the input packet memory; 49 is an output driver for the output port; 50 is the 4k x 64-bit input packet memory; 51 is an address comparator that compares the address of the input packet memory 50 with the contents of the stop address latch 47 during data packet input; 52 is a flip-flop that indicates the data packet input state; 53 is a trigger generator that starts data packet input; 54 is an input interval latch that stores the interval between packets during data packet input; 55 is an input interval counter for measuring the data packet input interval; and 56 is an output control unit for data packet input.

FIG. 5 is a configuration diagram of the tracer section 60. In the figure, 61 is a trace port, 62 is a timer, 63 is an input latch, 64 is a synchronization circuit, 65 is a demultiplexer, 66 is a mode control unit, 67 is an address counter, 68 is a 4k x 96-bit trace memory, 69 is a read/write controller, 70 is a breakpoint latch, 71 is a comparator, and 72 is a breakpoint address latch.

Next, the operation of this embodiment will be described.

The data-driven processor 10 has CPS 11, FC 12, and FP 13 as its basic elements; together with J&B 14 and QB 15 they form a circular pipeline as shown in FIG. 2. EPS 21 is a functional unit that stores the external extension program of CPS 11, EDS 22 is a functional unit that stores array data and the like, and ECS 23 is a functional unit for color management and an external queue.

An input packet having the format shown in FIG. 3 passes through the junction part of J&B 14 and is input to CPS 11. CPS 11 is backed by EPS 21: triggered by the next destination of a packet that has passed through FC 12, the program data that will be needed next is fetched from EPS 21 and stored in CPS 11.

In the case of a unary operation the packet is output from FC 12 as is; in the case of a binary operation it is output after an operand pair has been formed. This operation packet is sent to FP 13 and processed according to its operation code (OPC: Operation Code). The branch function of J&B 14 then determines whether the packet is to be output; if it is not output, it returns to CPS 11 and the same processing is repeated.
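The firing rule described above (unary packets pass straight through, binary packets wait until an operand pair exists) can be illustrated with a minimal Python sketch. Nothing in it is taken from the patent: the `Packet` fields, the `FiringControl` class, and the use of a plain dictionary in place of the hashed matching memory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    node: int     # destination node number (hypothetical field name)
    color: int    # generation / color tag (hypothetical field name)
    value: int    # data word carried by the packet
    binary: bool  # True if the destination instruction needs two operands


class FiringControl:
    """Toy model of FC: pass unary packets, pair up binary operands."""

    def __init__(self) -> None:
        # Operands waiting for their partner, keyed by (node, color).
        # A dictionary is an assumption; the real unit uses hashed memory.
        self._waiting: Dict[Tuple[int, int], Packet] = {}

    def accept(self, pkt: Packet) -> Optional[Tuple[int, ...]]:
        """Return the operands to send on to FP, or None while waiting."""
        if not pkt.binary:
            return (pkt.value,)              # unary: output as is
        key = (pkt.node, pkt.color)
        partner = self._waiting.pop(key, None)
        if partner is None:
            self._waiting[key] = pkt         # first operand: wait
            return None
        # Second operand: fire. Operands are in arrival order in this sketch.
        return (partner.value, pkt.value)


fc = FiringControl()
unary = fc.accept(Packet(node=1, color=0, value=7, binary=False))
first = fc.accept(Packet(node=2, color=0, value=3, binary=True))
second = fc.accept(Packet(node=2, color=0, value=4, binary=True))
```

In this sketch the first operand of a binary instruction produces no output, and the second one releases the pair.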

Programs are assumed to be executed basically in ascending order of node number, except for loops, and the matching memory of FC 12 is hashed. When a hash collision occurs in the matching memory, the packet with the smaller generation and the smaller node number is retained, and the other is sent out to the circular pipeline or to ECS 23. Processing is thus always executed in order of priority, which allows execution to proceed without the circular pipeline in the chip overflowing.
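The collision policy described above can be sketched as follows. This is only an illustration under stated assumptions: the token layout `(generation, node, value)`, the hash function, the slot count, and the choice to compare generation first and node number second are hypothetical, since the text does not specify them. The `evicted` list stands in for the circular pipeline or ECS 23.

```python
from typing import List, Optional, Tuple

Token = Tuple[int, int, int]  # (generation, node, value): assumed layout


class MatchingMemory:
    """Toy hashed matching memory with priority-based eviction."""

    def __init__(self, slots: int = 8) -> None:
        self.slots: List[Optional[Token]] = [None] * slots
        self.evicted: List[Token] = []  # stands in for pipeline / ECS 23

    def _index(self, generation: int, node: int) -> int:
        # Assumed hash function; the patent does not specify one.
        return (generation * 31 + node) % len(self.slots)

    def store(self, token: Token) -> Optional[Tuple[Token, Token]]:
        """Store a waiting operand; return the matched pair when it fires."""
        gen, node, _ = token
        i = self._index(gen, node)
        resident = self.slots[i]
        if resident is None:
            self.slots[i] = token
            return None
        if resident[:2] == (gen, node):
            self.slots[i] = None         # partner found: fire
            return resident, token
        # Hash collision: keep the smaller (generation, node), evict the other.
        keep, evict = sorted([resident, token], key=lambda t: (t[0], t[1]))
        self.slots[i] = keep
        self.evicted.append(evict)
        return None


mm = MatchingMemory(slots=4)
r1 = mm.store((0, 5, 10))   # hashes to slot 1, waits
r2 = mm.store((1, 2, 20))   # also slot 1: lower priority, evicted
r3 = mm.store((0, 5, 11))   # partner of the first token: fires
```

Because the evicted token re-enters through the pipeline or the external queue, the higher-priority token is never displaced in this model.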

Between CPS 11 and FC 12 the data path is as wide as two of the data buses used in the other parts of the circular pipeline, so that the data transfer path has no bottleneck even during COPY processing in CPS 11.

The operation of the development support environment 1 is controlled by the control computer 31. The hardware includes the interface section 40 and the tracer section 60 and consists of a minimum of six boards.

Of these, the processing element 10 that includes the data-driven processor consists of four boards: the data-driven processor main unit 20, the extended program storage unit EPS 21, the extended data storage unit EDS 22, and the external color/stack processing unit ECS 23.

The development support environment 1 as a whole is controlled by the control computer 31. Data is input and collected by the development support environment control program running on the control computer 31.

The processing functions of this control program are listed below.

■ Input mode: loading of programs and data, loading of input data packets, setting of the number of input data packets, input of input data packets, and input of dump packets.

■ Collection mode: setting of the breakpoint comparison value, setting of the breakpoint mask value, starting of the tracer that is to begin tracing, presetting of the trace address counter, writing of the trace memory to a file, and reading of the address at which the breakpoint occurred.

A memory dump is performed by specifying, for each PE# of the processing element 10, a start address and an end address for EDS 22, FC 12, and EPS 21, and collecting in the tracer the dump packets output from each memory. In addition, initialization, waiting for a specified time, return from the control program, and so on can be performed.

FIGS. 4 and 5 are functional configuration diagrams of the interface section and the tracer section, respectively. Data transfer between the control computer 31 and the development support environment 1 including the data-driven processor takes place via the interface section 40.

The operation of both functional units is described in detail below. The interface section 40 has a serial port 43 and is connected to the control computer 31. The MPU 41 executes the input mode and collection mode functions described above in response to commands received through the serial port 43. When power is supplied, the interface section 40 initializes the entire development support environment 1 and then waits for a command from the control computer. Programs and data are loaded by combining, for each packet, the tag field shown in FIG. 3 with 32 bits of data, writing them into two words of the input packet memory 50, and outputting them packet by packet from the output port 44. Because input data packets must be input at high speed, the input packet memory 50 is used. The input packet memory 50 is loaded with up to 4k words using the write address counter 45, and information corresponding to the number of input data packets is set in the stop address latch 47. Then, on an input command that specifies the input interval, packets are input in one burst until the read address counter 46 matches the value set in the stop address latch 47.
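The burst input sequence described above (load the input packet memory, set the stop address, then read out at a fixed interval until the read address reaches the stop address) can be modelled with a short Python sketch. It is a behavioural illustration only, not the circuit of FIG. 4: the `PacketInjector` class and its method names are hypothetical, time is counted in abstract clock ticks, and each memory word is treated as one opaque unit, so the mapping of a packet onto words is not modelled.

```python
from typing import List, Tuple

MEMORY_WORDS = 4096            # 4k words, as stated for input packet memory 50
WORD_MASK = (1 << 64) - 1      # 64-bit words


class PacketInjector:
    """Behavioural model of the packet input path of the interface section."""

    def __init__(self) -> None:
        self.memory: List[int] = [0] * MEMORY_WORDS
        self.write_addr = 0    # write address counter 45
        self.read_addr = 0     # read address counter 46
        self.stop_addr = 0     # stop address latch 47

    def load(self, words: List[int]) -> None:
        """Write words into the memory and latch the stop address."""
        if self.write_addr + len(words) > MEMORY_WORDS:
            raise ValueError("input packet memory overflow")
        for w in words:
            self.memory[self.write_addr] = w & WORD_MASK
            self.write_addr += 1
        self.stop_addr = self.write_addr

    def run(self, interval: int) -> List[Tuple[int, int]]:
        """Emit (tick, word) pairs, one every `interval` ticks."""
        if interval < 1:
            raise ValueError("interval must be at least one tick")
        sent: List[Tuple[int, int]] = []
        tick = 0
        self.read_addr = 0
        while self.read_addr != self.stop_addr:   # address comparator 51
            sent.append((tick, self.memory[self.read_addr]))
            self.read_addr += 1
            tick += interval                      # interval latch 54 / counter 55
        return sent


injector = PacketInjector()
injector.load([0xA, 0xB, 0xC])
sent = injector.run(interval=4)
```

Keeping the packets in a dedicated memory means the emission rate depends only on the interval setting, not on the speed of the serial link to the control computer.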

Furthermore, the address and data bus of the interface section 40 is connected to the tracer section 60, so that the tracer section 60 is controlled as well.

The tracer section 60 has a 4k x 96-bit trace memory 68 and can be connected to any terminal of the processing element 10 to trace it. The tracer section 60 is connected to the address and data bus of the interface section 40 and is controlled directly from the interface section 40. Breakpoint comparison values and mask data can be set as required. After the tracer number and trace mode have been set from the interface section 40, the start of tracing is instructed. The tracer section 60 then latches the data packets arriving at the trace port 61 in synchronization with the internal clock and stores them in the trace memory 68 together with time information. When a breakpoint has been set in the breakpoint latch 70, tracing stops after the output of the comparator 71 detects a matching packet. Three trace modes can be selected for stopping: stop immediately, stop after tracing half the memory capacity, and stop after tracing the full memory capacity. This allows the trace history to be stored effectively.

After tracing ends, the address at which the breakpoint occurred is read from the breakpoint address latch 72, and the valid data can be determined from the trace mode and the value of the address counter 67.
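The tracer behaviour described above can be sketched in Python as a ring buffer with a masked breakpoint compare and three stop modes. This is an assumption-laden model rather than the circuit of FIG. 5: the class and method names are hypothetical, a `wrapped` flag is kept in software to recover the valid window (the text says only that the trace mode and address counter value are used), and "stop after tracing the full memory capacity" is read as capturing one full buffer of entries after the match.

```python
from enum import Enum
from typing import List, Optional, Tuple


class TraceMode(Enum):
    STOP_IMMEDIATELY = 0
    STOP_AFTER_HALF = 1
    STOP_AFTER_FULL = 2


class Tracer:
    """Ring-buffer trace memory with a masked breakpoint comparator."""

    def __init__(self, depth: int = 4096) -> None:
        self.depth = depth
        self.memory: List[Optional[Tuple[int, int]]] = [None] * depth
        self.addr = 0                       # address counter 67
        self.wrapped = False                # software-only helper (assumption)
        self.running = True
        self.bp_enabled = False
        self.bp_value = 0
        self.bp_mask = 0
        self.mode = TraceMode.STOP_IMMEDIATELY
        self.bp_addr: Optional[int] = None  # breakpoint address latch 72
        self.remaining: Optional[int] = None

    def set_breakpoint(self, value: int, mask: int, mode: TraceMode) -> None:
        self.bp_value, self.bp_mask, self.mode = value, mask, mode
        self.bp_enabled = True

    def capture(self, time: int, packet: int) -> None:
        """Store one (time, packet) entry unless tracing has stopped."""
        if not self.running:
            return
        hit_addr = self.addr
        self.memory[hit_addr] = (time, packet)
        self.addr = (self.addr + 1) % self.depth
        if self.addr == 0:
            self.wrapped = True
        if self.remaining is not None:
            self.remaining -= 1             # counting down after the match
        elif self.bp_enabled and (packet ^ self.bp_value) & self.bp_mask == 0:
            self.bp_addr = hit_addr
            self.remaining = {
                TraceMode.STOP_IMMEDIATELY: 0,
                TraceMode.STOP_AFTER_HALF: self.depth // 2,
                TraceMode.STOP_AFTER_FULL: self.depth,
            }[self.mode]
        if self.remaining == 0:
            self.running = False

    def history(self) -> List[Tuple[int, int]]:
        """Return the valid entries in chronological order."""
        if self.wrapped:
            ordered = self.memory[self.addr:] + self.memory[:self.addr]
        else:
            ordered = self.memory[:self.addr]
        return [entry for entry in ordered if entry is not None]


tracer = Tracer(depth=4)
tracer.set_breakpoint(value=0x50, mask=0xF0, mode=TraceMode.STOP_AFTER_HALF)
for t, pkt in enumerate([0x11, 0x22, 0x33, 0x5A, 0x44, 0x66, 0x77, 0x88]):
    tracer.capture(t, pkt)
```

With a four-entry buffer and the half-capacity mode, the retained window holds the entry before the match, the match itself, and the two entries that follow it; later packets are ignored.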

The trace results in the trace memory 68 can be dumped and saved to a file by the interface section 40, and this file can be displayed as a list.

A plurality of tracer sections 60 can be connected; with 15 tracer sections 60 per interface section 40, 15 points can be measured simultaneously. By tracing at the connection points with the functional units, for example operation packets and result packets, the behavior of the entire system can be captured together with the trace time information that is stored at the same time.
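Because every tracer stamps its entries with common time information, the per-tracer dumps can be combined into one system-wide timeline. The following Python sketch shows one way to do this with the standard library's `heapq.merge`; the record layout `(time, packet)` and the function name `merge_traces` are assumptions for illustration, and each tracer's records are assumed to be already in time order.

```python
import heapq
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (time, tracer_number, packet): assumed layout


def merge_traces(traces: Dict[int, List[Tuple[int, int]]]) -> List[Record]:
    """Merge per-tracer (time, packet) lists into one time-ordered list."""
    streams = (
        [(time, number, packet) for time, packet in records]
        for number, records in traces.items()
    )
    # heapq.merge performs a k-way merge of already sorted inputs.
    return list(heapq.merge(*streams))


traces = {
    1: [(0, 0xA), (5, 0xB)],   # e.g. a tracer on one output
    2: [(2, 0xC), (5, 0xD)],   # e.g. a tracer on another output
}
timeline = merge_traces(traces)
```

Entries with equal timestamps are ordered by tracer number in this sketch, which is a tie-breaking choice made here and not something the text prescribes.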

In this embodiment, the data of each part of the processing element 10 collected by the tracer section 60 can be displayed. FIG. 6 shows an example of how trace results are displayed: FIG. 6(a) shows the FP 13 output display and FIG. 6(b) shows the ECS 23 output display.

The data-driven processor is being developed together with its language processing system, and tools are provided for displaying in graphical form, and editing, the programs represented by the compiler output format file of the language processing system, the object code output by the mapper, and the execution trace results. Even in a multiprocessor execution environment, this makes it possible to display the object code and execution trace results of each processor graphically. By comparing them, unprocessed nodes, data packets that were never input, memory that was never accessed, and the like can be found very easily, which makes multiprocessor debugging easy. The maximum degree of parallelism, the average degree of parallelism, the number of execution ranks, the execution time, and so on can be displayed together with time information, and the utilization rate can be evaluated as easily as with simulator results.
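The comparison of object code against trace results can be illustrated with a small Python sketch that lists nodes which never appear in the trace and derives simple parallelism figures. The definitions here are assumptions: a trace record is taken to be `(time, node)`, and the degree of parallelism at a given time is taken to be the number of records sharing that timestamp. The patent does not state how its tool computes these values.

```python
from collections import Counter
from typing import Iterable, List, Tuple

TraceRecord = Tuple[int, int]  # (time, node): assumed record layout


def unfired_nodes(program_nodes: Iterable[int],
                  trace: Iterable[TraceRecord]) -> List[int]:
    """Nodes present in the object code but absent from the trace."""
    fired = {node for _, node in trace}
    return sorted(set(program_nodes) - fired)


def parallelism(trace: Iterable[TraceRecord]) -> Tuple[int, float]:
    """Return (maximum, average) number of records per timestamp."""
    per_tick = Counter(time for time, _ in trace)
    if not per_tick:
        return 0, 0.0
    counts = per_tick.values()
    return max(counts), sum(counts) / len(counts)


program_nodes = [1, 2, 3, 4, 5]
trace = [(0, 1), (0, 2), (1, 3), (2, 5)]
missing = unfired_nodes(program_nodes, trace)
max_par, avg_par = parallelism(trace)
```

In this example node 4 never fired, which is the kind of discrepancy the graphical comparison is meant to expose.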

For this display, the results of the rank analysis required for node numbering in the data replacement method described in, for example, Japanese Patent Application No. Sho 62-54406, "Associative memory device and data-driven computer," are used, and an algorithm that reduces overlapping of arcs is introduced to improve legibility.
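A rank-based layout of a data flow graph can be sketched as follows. This is a generic longest-path layering over an acyclic graph, offered only as an illustration of what a rank analysis might produce; it is not the method of the cited application, which is not reproduced here, and the function name `rank_nodes` and the arc representation are assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def rank_nodes(arcs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Assign each node the length of the longest path from any source."""
    successors: Dict[int, List[int]] = defaultdict(list)
    indegree: Dict[int, int] = defaultdict(int)
    nodes = set()
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    rank = {node: 0 for node in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in successors[node]:
            rank[nxt] = max(rank[nxt], rank[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(nodes):
        # Loops in the program would need separate handling.
        raise ValueError("graph contains a cycle")
    return rank


# Two inputs (1, 2) feed nodes 3 and 4, which both feed node 5.
ranks = rank_nodes([(1, 3), (2, 3), (1, 4), (3, 5), (4, 5)])
```

Drawing nodes of equal rank on the same row gives a layered picture in which every arc points downward, which is a common starting point before arc overlap is reduced.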

The compiler output format file can be displayed graphically function by function, and is generally easier to read than the graphical display of the mapper output (object code), in which multiple functions have been expanded.

FIG. 7 shows an example of a graphic display of the mapper output, that is, the data flow graph, of a cubic polynomial calculation program, and FIG. 8 shows an example of a trace result display.

As described above, in this embodiment, data packets for processing execution are input at high speed from the control computer 31 to the processing element 10 of the parallel processing device via the interface section 40 having the input packet memory 50. A plurality of tracer sections 60, each having a trace memory 68 and a trace port connected to one of a plurality of terminals of the processing element 10, trace the data packets arriving at the trace ports in synchronization with the internal clock of the system, together with common time information. The trace results stored in the trace memories of the tracer sections 60 are then collected, saved to a file, and displayed as a data flow graph. Hardware and software debugging in the development of a parallel processing device is therefore made much easier and faster.

Although the above embodiment has been described as a development support environment for a data-driven processor, the same approach can also be applied to other parallel processing computers or processors.

In addition, although the embodiment of FIG. 1 does not specify how the processing elements 10 are connected, various connections are possible, such as the shuffle-net connection shown in FIG. 9 or the daisy-chain connection shown in FIG. 10.

[Effect of the invention]

As described above, according to the present invention, processing input data is input from the control computer to the processor via a data input unit having a dedicated input packet memory, at a high speed corresponding to the processing speed of the actual application; a tracer section having a trace memory traces the data transfer status at each functional unit of the processor, together with time information, while execution continues; and the trace results are converted into a file and displayed as a data flow graph. This has the effect that hardware and software debugging in the development of a parallel processing device can be carried out efficiently and at high speed.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the configuration of a parallel processing device development system according to an embodiment of the present invention; FIG. 2 is a diagram showing the configuration of the processing element; FIG. 3 is a diagram showing a packet of the data-driven processor; FIG. 4 is a configuration diagram of the interface section; FIG. 5 is a configuration diagram of the tracer section; FIG. 6 is a diagram showing an example of trace results; FIG. 7 is a diagram showing an example of a graphic display of the mapper output, which is a data flow graph; FIG. 8 is a diagram showing an example of a trace result display; FIG. 9 is a diagram showing a shuffle-net connection of data-driven processors; FIG. 10 is a diagram showing a daisy-chain connection of data-driven processors; and FIG. 11 is a diagram showing the configuration of a conventional parallel processing device development system.

1 is a development support environment, 10 is a processing element consisting of a data-driven processor, 20 is the data-driven processor main unit, 40 is an interface section, and 60 is a tracer section. The same reference numerals in the figures denote the same or corresponding parts.

Claims

[Claims]

(1) A parallel processing device development system characterized by comprising: a parallel processing device consisting of one or more multiprocessors; a control computer for driving the parallel processing device; a data packet input section comprising memory means for storing a plurality of packets for processing execution from the control computer, and measuring means for inputting the packets stored in the memory means to the multiprocessor of the parallel processing device at settable input intervals; a tracer unit connected to input/output ports and data transfer ports provided at desired locations of the plurality of functional sections of the multiprocessor, and equipped with an internal trace memory in which data packets from the ports are captured and stored in synchronization with an internal clock along with common time information; a function of displaying, in the order of processing, the data packets that are processing execution results captured in the trace memory of the tracer unit; and a function of comparing these with an execution program.