JPH03218539A

JPH03218539A - Debug method in parallel computer system

Info

Publication number: JPH03218539A
Application number: JP2316124A
Authority: JP
Inventors: Kyoko Iwazawa; 岩澤　京子; Giichi Tanaka; 義一田中
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-11-27
Filing date: 1990-11-22
Publication date: 1991-09-26

Abstract

PURPOSE: To detect the cause of a deadlock state, when there is a bug in a subprogram, by linking a debug processing control routine to the compiled source program. CONSTITUTION: All processor elements 24-0 to 24-n are placed in a startable state by the debug processing control routine of a host computer 10, and the processor elements are actually operated one at a time. When a processor element that enters a data waiting state stops and passes its resume address and state to the host computer 10, the host computer starts another processor element, and the processor element in the data waiting state is registered in a starting table or queue. In selecting the one element to be started, the initialization of the starting table or queue is varied so that the starting order of the processor elements changes for processing that would originally be executed in parallel. Thus, the bug can easily be removed using the trace data of the respective conditions.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to parallel computer systems, and in particular to a debug processing control method, and a system therefor, for removing bugs inherent in parallel processing from user programs written for each processor element.

[Prior art]

To improve computer speed, parallel computer systems have been devised that run multiple processor elements simultaneously.

Conventional parallel computer systems include a type in which all processor elements are connected to a shared memory and exchange data through that memory, and a type in which each processor element has local memory and transfers data directly to other processor elements over a network. In the former, synchronization control is performed to control the order of shared memory accesses during data exchange. In the latter, data transmission and reception control is performed to control the order in which data sent from other processor elements is received.

As the user interface to a parallel computer system, a conventional language such as FORTRAN is adapted, or a dedicated language is provided.

When programs are coded in these languages, bugs specific to parallel processing may creep in. Such bugs include definitions and uses of the same address by multiple processors, caused by incorrect (including missing) synchronization control that is needed to order accesses to shared memory, and incorrect (including missing) data transmission and reception or algorithm errors introduced when a program is rewritten for distributed memory.

In particular, when parallel processing is performed without synchronization control, the execution order of the processes is not guaranteed. The order differs from run to run, so reproducibility is not guaranteed.

Although no established technique exists for debugging such parallel computers, several debugging methods have been considered for removing these bugs. One commercially available product is the "Pdbx Parallel Debugger" from Sequent Computer Systems, described in the product brochure "Pdbx Parallel Debugger for Balance Computer Systems". This product provides a means of recognizing the parallel operation status by outputting a trace of Post/Wait synchronization control during parallel execution. It also has a function for interrupting processor execution partway through.

To debug a program containing a bug caused by incorrect synchronization control in a stable, reproducible state, it is desirable not to process in parallel but to execute sequentially and to detect the faulty synchronization control from the state of the data at that time.

A method of executing processing sequentially during debugging is disclosed in JP-1-106234. In that method, however, all processing is first executed in parallel, and the trace data obtained is used to determine the order of the processing to be executed sequentially.

[Problem to be solved by the invention]

This prior art requires parallel execution before sequential execution. In addition, in programs for parallel computer systems, the statements that perform data transmission/reception and synchronization control are generally written in the user program. Consequently, when several such programs are executed sequentially, one program may enter a data waiting state in which it keeps waiting for data that will never be sent while it is running, and its computation cannot proceed. In other words, execution stalls even though the program is correct.

An object of the present invention is to solve the above problems and to provide a method by which programs written per processor for parallel execution can be executed sequentially in a simpler manner.

Another object of the present invention is to provide a method for executing such programs sequentially without their falling into a data waiting state when data transmission/reception and synchronization control are correct.

A further object of the present invention is to provide a debug processing control method, and a system therefor, in which a library for normal execution and a library for debug execution are prepared so that debug processing can be performed without modifying the main program of the user program.

[Means to solve the problem]

To achieve the above objects, the present invention executes a parallel processing program sequentially by the following method.

First, an overview of the operation of this embodiment is given. A debug processing control routine called from the user program for the host computer performs the following processing:

(1) As initialization, all processor elements, having been placed in a startable state, are registered in a starting table or queue referenced by the host computer.

(2) Any one of the startable processor elements is started. That processor element executes its program until it reaches either a "data waiting state" or a "processing finished state", passes that state and its resume address to the host computer, and stops.

(3) Step (2) is repeated until no executable processor element remains, so that processor elements that should run simultaneously are operated one at a time.

(4) In selecting which of the startable processor elements to start in (2), the initial setting of the starting table or queue is varied so that the starting order of the processor elements changes for processing that would originally be executed in parallel.
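Steps (1) to (4) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patented routine: each processor element is modeled as a Python generator, the names `subprogram` and `debug_control` and the labels `"stmt54"` and `"finished"` are hypothetical, and the loop stops with a simplified check (every queued element has been restarted without advancing) rather than the two-queue comparison described later in the text.

```python
from collections import deque


def subprogram(pe, n_pe, mailbox):
    """Hypothetical per-PE subprogram: send to PE pe+1, then receive own data."""
    if pe + 1 < n_pe:                      # assumed guard on the last PE
        mailbox[pe + 1] = f"data from PE {pe}"
    while pe not in mailbox:               # receive: data has not arrived
        yield "stmt54"                     # report "data waiting" and stop
    # falling off the end corresponds to "processing finished"


def debug_control(order, n_pe):
    """Run the PEs one at a time in the given initial order."""
    mailbox = {}
    # (1) register every startable PE with its entry address
    queue = deque((pe, "entry", subprogram(pe, n_pe, mailbox)) for pe in order)
    trace = []
    idle = 0  # consecutive restarts that did not advance
    while queue and idle < len(queue):
        pe, addr, task = queue.popleft()       # (2) start one PE
        new_addr = next(task, "finished")      # runs until it waits or ends
        trace.append((pe, new_addr))
        if new_addr == "finished":
            idle = 0
        else:
            queue.append((pe, new_addr, task))  # re-register at the tail
            idle = idle + 1 if new_addr == addr else 0
    # (3) the loop ends when nothing executable remains
    return trace, [pe for pe, _, _ in queue]


# (4) vary the initial order and compare the traces
print(debug_control([0, 1, 2, 3], 4))
print(debug_control([3, 2, 1, 0], 4))
```

In this toy model, both orders leave PE 0 in the queue still waiting for data that no element ever sends, which is the kind of symptom the routine is meant to expose.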

[Effect]

Process (1) causes the debug processing control routine of the host computer to place all processor elements in a state in which they can be treated as startable.

Processes (2) and (3) allow the processor elements actually to be operated one at a time. A processor element that has entered the "data waiting state" stops and reports that state to the host computer together with its resume address, and the host computer, on receiving the report, starts another processor. In this way, the per-processor-element subprograms, which contain the data transmission/reception required for distributed memory or the synchronization control required to order shared memory accesses, can be executed on one processor element at a time while preserving the order specified by that data transmission/reception or synchronization control. In addition, a processor element that has passed its resume address in the "data waiting state" is registered in the starting table or queue. When data arrives as a result of running other processor elements and that processor element becomes executable, it can be started again.

Process (4) allows the order of processes that would originally run in parallel to be varied in many ways, within the limits imposed by the data transmission/reception and synchronization control written in the user program.

As a result, the various execution orders that can occur during parallel execution can be reproduced with only one processor element running at a time. Techniques similar to conventional debugging on a single processor element, such as interactively suspending processing or outputting data, can also be used.

With these functions, computations whose results become incorrect under parallel execution and synchronization control errors (missing Send/Receive calls or mismatched pairings) can be detected from the trace data of each situation. This makes it easier to remove the user program bugs that cause abnormal termination or incorrect results in parallel processing.

The embodiment is described in detail below.

〔Example〕

A debug processing control system according to one embodiment of the present invention is described below with reference to the accompanying drawings, taking a distributed parallel computer with local storage as the target.

FIG. 2 shows an overview of a parallel computer system including the debug processing control system.

The host computer 10 has a main memory 12, an I/O processor 14, and an instruction processor 16. The main memory stores an execution processor queue 12-1, a termination determination queue 12-2, a main program 13-1, and two libraries 13-2 and 13-3, one for normal processing and one for debug processing. An I/O device 17 is connected to the I/O processor 14.

An array controller 22 is connected to the instruction processor 16. The host computer 10 controls n+1 processor elements 24-0 to 24-n through the array controller 22. Each processor element has local storage and is assigned a processor number.

The processor elements exchange data with one another via a network 28. A processor element cannot access the local storage 26-0 to 26-n of other processor elements, but it can freely access the main memory 12 in the host computer 10 through the array controller 22, which connects the processor elements to the host computer.

FIG. 3 shows an example of the main program of a user program for such a parallel computer system. When the main program 13-1 is executed by the host computer, statement 31 in the main program calls a load routine 33, which loads the subprogram from the main memory 12 into each processor element, with the load module name "SUB1" as its argument.

Next, statement 32 calls the normal processing control routine 34, which starts the processor elements, with the entry name "SUB1" of the processor element subprogram as its argument.

As shown in FIG. 8, the present invention provides a library 13-3 containing a normal processing control routine, which starts all processor elements simultaneously during normal processing, and a library 13-2 containing a debug processing control routine, which starts the processor elements one after another during debug processing.

The two libraries 13-2 and 13-3 are identical except for these two control routines. Both routines are given the same entry name, and after the main source program 15-1 has been compiled, library 13-2 or 13-3 is linked to the compiled main program 15-2 according to the purpose of the processing. It is therefore unnecessary to modify the main source program 15-1 for debugging, recompile it, and link it with library 13-2. Nor is it necessary, after debugging is finished, to restore the main source program 15-1, recompile it, and link it with library 13-3. The user simply selects a library according to the purpose and links it with the compiled main program. This allows debugging and similar work to be carried out easily and quickly.
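The idea of two libraries exporting a routine under one entry name can be mimicked as follows. This is a loose analogy rather than a real link step: the dictionaries stand in for libraries 13-3 and 13-2, and the entry name `"START"` and all function names are hypothetical.

```python
def start_all_at_once(entry, pes):
    """Stand-in for the normal routine: one start covering every PE."""
    return [("parallel", entry, tuple(pes))]


def start_one_by_one(entry, pes):
    """Stand-in for the debug routine: one start per PE, in sequence."""
    return [("serial", entry, (pe,)) for pe in pes]


# Both "libraries" export their routine under the same (assumed) entry name.
NORMAL_LIBRARY = {"START": start_all_at_once}
DEBUG_LIBRARY = {"START": start_one_by_one}


def main_program(library):
    """The unchanged main program refers to the routine only by entry name."""
    return library["START"]("SUB1", [0, 1, 2])


print(main_program(NORMAL_LIBRARY))  # normal processing
print(main_program(DEBUG_LIBRARY))   # debug processing, same main program
```

Because `main_program` never names a specific routine, switching between the two behaviours requires no change to it, which mirrors the point made about the main source program 15-1.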

FIG. 1 shows an overview of the processing of the debug processing control routine 34. The structure of the execution processor queue 12-1 used here is shown in FIG. 4. Each element of the execution processor queue 12-1 contains a processor number 41 identifying the processor to be run, an execution start address 42 of the subprogram to be executed on that processor, and a pointer 43 to the next queue element.
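A queue element with these three fields might be modeled as below. The field and function names are illustrative assumptions, and the start address is represented by a plain label rather than a machine address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueElement:
    processor_number: int                    # field 41
    start_address: str                       # field 42 (a label in this sketch)
    next: Optional["QueueElement"] = None    # field 43


def build_queue(order, entry_address):
    """Chain one element per processor number, preserving the given order."""
    head = None
    for number in reversed(list(order)):
        head = QueueElement(number, entry_address, head)
    return head


def processor_numbers(head):
    """Walk the chain from the head and collect the processor numbers."""
    numbers = []
    while head is not None:
        numbers.append(head.processor_number)
        head = head.next
    return numbers


print(processor_numbers(build_queue(range(4), "SUB1")))
```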

The debug processing control routine 34 of FIG. 1 is outlined below, taking as an example the case in which each processor element executes the subprogram of FIG. 5.

The main program of FIG. 3 is executed. At statement 31 the load routine 33 runs, and the subprogram is loaded from the main memory into the local storage 26-i (i = 0, ..., n) of each processor element.

Statement 53 represents a call to the transmission routine that sends data. In statement 53, the first argument "N+1" is the processor number of the destination processor element, the second argument "N+1" is an identifier distinguishing the transmitted data, and the third argument "R" is the data to be sent. Statement 54 represents a call to the reception routine that receives data. In statement 54, the first argument "N" is an identifier distinguishing the received data, and the second argument "P" is the received data.

If the data "P" has not arrived when this reception routine is called, the processor element does nothing and enters a data waiting state in which it waits for "P" to arrive. Statement 56 represents a call to a dump routine that outputs the computation result: the first argument "N" is the processor number, the second argument "A" is the output array, and the third argument "N-1" and fourth argument "3" give the elements and the count of the output array.
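The send and receive semantics described here, including the "data waiting" outcome, can be sketched with a simple mailbox. The class and method names are hypothetical, and unlike statement 54 this sketch passes the receiving processor number explicitly.

```python
_NO_DATA = object()  # sentinel meaning "nothing has arrived yet"


class Mailbox:
    """Toy stand-in for the network: one slot per (destination, identifier)."""

    def __init__(self):
        self._slots = {}

    def send(self, dest, tag, data):
        """Deliver data for processor `dest` under identifier `tag`."""
        self._slots[(dest, tag)] = data

    def receive(self, me, tag):
        """Return (True, data) if the data has arrived, else (False, None).

        A (False, None) result corresponds to entering the data waiting state.
        """
        data = self._slots.pop((me, tag), _NO_DATA)
        if data is _NO_DATA:
            return False, None
        return True, data


box = Mailbox()
print(box.receive(1, 1))   # nothing sent yet: data waiting
box.send(1, 1, "R")        # processor 0 sends to processor 1, identifier 1
print(box.receive(1, 1))   # now the data is available
```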

The declaration 51 shows that the formal arguments "A" and "N" of subroutine 50 in FIG. 5 reside in the main memory of the host computer. "N" represents the processor number. The case in which this subprogram 50 is executed by each processor element, with priorities assigned in ascending or descending order of processor number, is explained with reference to FIGS. 6A and 6B.

Statement 32 in the main program 13-1 is executed, and the debug processing control routine 34 is called.

In the debug processing control routine 34, first, in step 1, the host computer 10 registers the status of all processor elements in the execution processor queue 12-1, using the entry address of subprogram 50 as the execution start address. The queue looks like 61 in FIG. 6A for ascending processor-number order and like 65 in FIG. 6B for descending order. To detect a deadlock state in data transmission and reception, the state of the execution processor queue 12-1 at a suitable time, for example at the start, is saved in the termination determination queue 12-2. Queue 12-2 is used to determine whether each processor element that has not finished is deadlocked waiting for data or is executing normally. The termination determination queue 12-2 therefore has the same format as the execution processor queue 12-1, as shown in FIG. 4.

In step 2, any one of the processor elements registered in the execution processor queue 12-1 is selected, the corresponding queue element is deleted from the queue, and that processor element is started.

In ascending order, the processor element with processor number 0 is selected first. When started, it waits for data at statement 54. In descending order, the processor element with processor number N is selected and started first, and the queue element for processor N is deleted from the execution processor queue.

The execution order among the processor elements can be changed through the initialization in step 1 and the selection in step 2. The elements of the execution processor queue may simply be arranged in ascending or descending processor-number order as in this example, arranged randomly, or arranged with a particular processor at the head.
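The orderings mentioned above could be generated along these lines. The function name, the policy strings, and the `seed` parameter are assumptions made for this sketch.

```python
import random


def initial_order(n_pe, policy="ascending", first=None, seed=None):
    """Return the processor numbers in the order they are first queued."""
    order = list(range(n_pe))
    if policy == "ascending":
        return order
    if policy == "descending":
        return order[::-1]
    if policy == "random":
        random.Random(seed).shuffle(order)   # seed makes a run repeatable
        return order
    if policy == "first":
        order.remove(first)                  # put one chosen processor first
        return [first] + order
    raise ValueError(f"unknown policy: {policy}")


print(initial_order(4))
print(initial_order(4, "descending"))
print(initial_order(4, "first", first=2))
```

Fixing the seed for the random policy keeps a given debugging run repeatable, which fits the text's emphasis on reproducibility.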

A started processor element runs until it either waits for data or finishes, outputting as trace data the information about transmitted and received data needed to understand the execution status, along with the values of selected variables and arrays in local memory.

After starting a processor element, the routine waits in step 3 until that processor element reaches "execution finished" or "data waiting" and returns its resume address.

If "execution finished" is returned, control proceeds via step 4 to step 5. In step 5, if the execution processor queue 12-1 is empty, debug processing ends. If processor elements are still registered in the execution processor queue 12-1, the termination determination queue 12-2 is updated in step 9 and control returns to step 2.

If "data waiting" is returned, control proceeds via step 4 to step 6. In step 6, the processor number of that processor element is registered at the tail of the execution processor queue 12-1 together with the address at which it is to resume. In the ascending case, therefore, the address of statement 54 is registered at the tail of the execution processor queue in step 6, and the queue becomes as shown at 62 in FIG. 6A.

In the descending case, processor N executes first but waits for data at statement 54, and in step 6 the execution processor queue becomes as shown at 66 in FIG. 6B.

Step 7 is then executed.

Steps 7 and 8 determine whether debug processing should end. If the processor elements registered in the execution processor queue 12-1 are the same as those registered in the termination determination queue 12-2, and the start addresses in the two queues are equal for every processor element, then every processor element is "waiting for data" and none can run: a deadlock state has been reached. In this case, data indicating that these processor elements are deadlocked is output to the I/O device 17. If the processor elements in the execution processor queue 12-1 and the termination determination queue 12-2 differ, the possibility that some processor element can run has not been ruled out, so control returns to step 2 via step 9 and the next processor element is started. Step 9 is also executed when the start addresses of a processor element differ, or when the execution processor queue is not empty in step 5. In step 9, the contents of the execution processor queue are registered in the termination determination queue.

In the ascending example, the termination determination queue 12-2 is updated to state 62, control returns to step 2, and processor 1 is started next. When processor 1 runs, processor 0 has already sent, in statement 53, the data that processor 1 is to receive, so processor 1 does not wait at statement 54 but proceeds to the end and finishes. Processor 1 is therefore not re-registered, and the execution processor queue 12-1 becomes as shown at 63 in FIG. 6A. As debug processing continues in this way, it ends with processor 0 left in queue 12-1.
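The comparison made in steps 7 and 8 can be sketched as a check that both queues hold the same processor elements at the same addresses. The function name and the representation of a queue as a list of `(processor_number, address)` pairs are assumptions; the check says nothing about when the termination determination queue should be refreshed, which the text leaves to step 9.

```python
def is_deadlocked(execution_queue, termination_queue):
    """True when both queues list the same PEs with the same start addresses.

    Each queue is a list of (processor_number, address) pairs. Queue order is
    ignored: only membership and the per-PE addresses are compared.
    """
    return dict(execution_queue) == dict(termination_queue)


# Processor 0 was restarted and is waiting at statement 54 again.
print(is_deadlocked([(0, "stmt54")], [(0, "stmt54")]))

# Processor 0 moved from its entry address to statement 54, so the
# addresses differ and execution continues via step 9.
print(is_deadlocked([(1, "entry"), (0, "stmt54")],
                    [(0, "entry"), (1, "entry")]))
```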

In the descending case, processor N-1 is started in step 2 after processor N has stopped. Here too, trace data as described above is output so that the parallel execution status can be understood. In this case, the data to be received at statement 54 has not yet been sent, so all processor elements are visited once and the execution processor queue 12-1 becomes as shown at 67 in FIG. 6B. In the second pass, when processor N resumes from statement 54, the data it receives has already been sent by statement 53 of processor N-1 during the first pass, so processor N runs to the end and finishes.

The execution processor queue 12-1 therefore becomes as shown at 68 in FIG. 6B. In the second pass the processor elements receive their data one after another in this way, and in the end only processor 0 remains in the execution processor queue 12-1.

Whether processing proceeds in ascending or descending processor-number order, when it ends the execution processor queue 12-1 is not empty and processor 0 is still waiting for data. This clearly indicates a bug in the source program: the error must be corrected either by having another processor element send the data or by making processor 0 skip the reception.

If trace data for array A, which resides in the main memory 12 of the host computer 10 and is defined and referenced by each processor element, is collected when statement 56 is executed, it appears as 70 in FIG. 7A for the ascending case and as 71 in FIG. 7B for the descending case.

In the ascending case, as shown at 70 in FIG. 7A, array element A(1) referenced by processor 2, for example, holds the value defined by processor 1. In this way, each processor uses the value defined by the processor started immediately before it. When execution is in descending order, however, as shown at 71 in FIG. 7B, each processor element references the values that array A held immediately before the subprogram "SUB1" was executed.

After all processor elements have finished referencing array A, each processor element defines its value of array A. As a result, even though the order imposed by data transmission and reception is respected, the computed results differ between the ascending and descending cases. This shows that the results differ from run to run when the processor elements operate in parallel because the order of definitions and references of array A in the main memory is indeterminate. It can thus be detected that the source program must be corrected by inserting synchronization control that fixes the definition/reference order of array A in the main memory.
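A toy computation shows how the result can depend on the order in which a shared array is defined and referenced. The update `A[k] = A[k-1] + 1` is a hypothetical stand-in, not the computation of FIG. 5, and the function name is an assumption.

```python
def run_in_order(a, order):
    """Each PE k references A[k-1] and then defines A[k], one PE at a time."""
    a = list(a)                  # work on a copy of the shared array
    for k in order:
        a[k] = a[k - 1] + 1      # reference, then definition
    return a


initial = [0, 0, 0, 0]
ascending = run_in_order(initial, [1, 2, 3])
descending = run_in_order(initial, [3, 2, 1])
print(ascending)    # each PE sees the value defined by the previous PE
print(descending)   # each PE sees the value from before any definition
```

In the ascending run each element picks up its predecessor's new value, while in the descending run every element reads the initial contents, so the two results differ even though nothing else changed.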

〔Effect of the invention〕

The above embodiment described a parallel computer composed of processor elements with local memory. According to the present invention, in a parallel computer composed of processor elements connected through a shared memory, faults in processing that involves "synchronization control", in place of the "data transmission and reception" of the above embodiment, can be detected in the same way.

When a subprogram that is supplied per processor element to multiple processor elements running in parallel, and that includes data transmission/reception or synchronization control, contains a bug, the processor elements can be run one at a time in various orders, within the order imposed by the data transmission/reception and synchronization control, simply by linking the debug processing control routine to the compiled source program, without recompiling the source program. By collecting and analyzing trace data for the various execution orders, faults such as missing data transmission/reception or missing synchronization, which make the results depend on the execution order, and the causes of deadlock states can be detected.

As noted above, various methods are conceivable for the order in which processor elements are registered in the execution processor queue. The order is basically left to the user, but the following algorithms may be considered depending on the purpose of debugging and the program being debugged.

When the process execution order is to be random, one method is to generate random numbers for the execution start times and determine the execution order from them. When the debug processing time must be shortened, on the other hand, it is desirable to determine the execution order so that processes containing many data transmissions are executed first.
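The two ordering policies above can be sketched as follows. This is an illustration under stated assumptions, not the routine of the embodiment: the function names `random_order` and `send_priority_order`, the `seed` parameter, and the `send_counts` mapping (processor number to number of data transmissions in its subprogram) are all hypothetical.

```python
import random


def random_order(processor_ids, seed=None):
    """Draw a random start time per processor and order processors by it."""
    rng = random.Random(seed)
    start_times = {pid: rng.random() for pid in processor_ids}
    return sorted(processor_ids, key=lambda pid: start_times[pid])


def send_priority_order(send_counts):
    """Order processors so that those with more data transmissions come first.

    Ties are broken by processor number (an assumption made for determinism).
    """
    return sorted(send_counts, key=lambda pid: (-send_counts[pid], pid))


print(random_order([1, 2, 3, 4], seed=0))
print(send_priority_order({1: 0, 2: 5, 3: 2}))  # [2, 3, 1]
```

Running senders first tends to satisfy pending receives sooner, so fewer processor elements stop in the data waiting state and have to be re-registered.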

Although a termination determination queue is used in this embodiment, the queue is not essential: it suffices to detect that each processor element has executed its corresponding subprogram to completion. For example, the processor numbers of the processor elements that have finished executing their subprograms may simply be stored. In this case, if a processor element remains as in the example above, that processor element is activated again, and if the result is the same as the previous time, processing may be judged to have ended.
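A minimal sketch of this termination check follows, assuming a toy model in which each subprogram is a list of ("send", destination, value) and ("recv", source) operations. The operation format and the names `run_until_blocked` and `schedule` are hypothetical and only stand in for the real processor elements.

```python
from collections import defaultdict, deque


def run_until_blocked(pid, program, pc, mailboxes):
    """Run one processor element from pc until it ends or must wait for data."""
    while pc < len(program):
        op = program[pc]
        if op[0] == "send":
            _, dst, value = op
            mailboxes[(pid, dst)].append(value)
        else:  # ("recv", src)
            _, src = op
            if not mailboxes[(src, pid)]:
                return pc, False  # data waiting state: stop at this address
            mailboxes[(src, pid)].popleft()
        pc += 1
    return pc, True


def schedule(programs, order):
    """Execute processors one at a time; return (finished set, remaining queue)."""
    mailboxes = defaultdict(deque)
    resume = {pid: 0 for pid in programs}
    finished = set()          # processor numbers that ran to completion
    queue = deque(order)
    unchanged = 0             # consecutive re-activations with the same result
    while queue and unchanged < len(queue):
        pid = queue.popleft()
        pc, done = run_until_blocked(pid, programs[pid], resume[pid], mailboxes)
        if done:
            finished.add(pid)
            unchanged = 0
        else:
            unchanged = unchanged + 1 if pc == resume[pid] else 0
            resume[pid] = pc
            queue.append(pid)  # re-register as the last element
    return finished, list(queue)


ok = {1: [("recv", 2)], 2: [("send", 1, 42)]}
stuck = {1: [("recv", 2), ("send", 2, 1)], 2: [("recv", 1), ("send", 1, 1)]}
print(schedule(ok, [1, 2]))     # ({1, 2}, [])
print(schedule(stuck, [1, 2]))  # (set(), [1, 2])
```

In the second case every remaining processor stops at the same address as before when re-activated, so the loop ends and the processors left in the queue point to the deadlocked subprograms.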

Furthermore, although an execution processor queue is used in this embodiment, a simple table can obviously be used instead. In this case, as shown in FIG. 9, an execution processor table 12-5, a termination determination table 12-6, and an address generator 12-7 are provided in the main memory unit 12. Each element of table 12-5 corresponds to one processor element and stores a flag indicating whether the corresponding processor element is waiting for execution, together with a restart address. The address generator 12-7 need only generate addresses so that the elements of each table are designated as described above and the flag and restart address are updated according to the execution result.
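The table-based variant might look like the sketch below. The `Entry` class, the `run_with_table` function, the `run_pe` callback, and the toy processor behavior in `make_toy_pes` are assumptions for illustration; the loop over `order` plays the role attributed to the address generator 12-7.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    waiting: bool  # flag: the processor element is waiting for execution
    restart: int   # restart address


def run_with_table(n, order, run_pe):
    """Visit table elements in the given order until no element changes."""
    table = [Entry(True, 0) for _ in range(n)]  # execution processor table
    finished = [False] * n                      # termination determination table
    changed = True
    while changed:
        changed = False
        for pid in order:
            entry = table[pid]
            if not entry.waiting:
                continue
            addr, done = run_pe(pid, entry.restart)
            if done:
                entry.waiting = False
                finished[pid] = True
                changed = True
            elif addr != entry.restart:
                entry.restart = addr
                changed = True
    return table, finished


def make_toy_pes():
    """Hypothetical processors: processor 0 waits for data produced by processor 1."""
    produced = set()

    def run_pe(pid, addr):
        if pid == 0:
            return (1, True) if 1 in produced else (0, False)
        produced.add(pid)
        return (1, True)

    return run_pe


table, finished = run_with_table(2, [0, 1], make_toy_pes())
print(finished)  # [True, True]
```

Elements whose flag is still set when the loop ends correspond to processor elements that could not proceed, which is the same information the queue version leaves in the queue.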

[Brief explanation of drawings]

FIG. 1 is a flowchart for explaining a debug processing control method according to an embodiment of the present invention. FIG. 2 is a configuration diagram of a parallel computer system to which the present invention is applied. FIG. 3 shows a user program for the host computer. FIG. 4 is a configuration diagram of the execution processor queue. FIG. 5 shows the subprograms loaded into each processor element. FIGS. 6A and 6B show the state of the execution processor queue when the user program of FIG. 3 is executed. FIGS. 7A and 7B show the execution order and main memory accesses when the user program of FIG. 3 is executed. FIG. 8 is a diagram for explaining how the main program of the user program is compiled and linked. FIG. 9 shows the configuration of an embodiment in which tables are used instead of queues.

Claims

[Scope of Claims]

1. A program execution method for sequentially executing a plurality of programs in a parallel computer system, comprising the following steps:
(a) registering the identifiers of a plurality of processors in an execution waiting queue in a predetermined order;
(c) when, as a result of a processor executing its program, the program reaches the execution completion state, removing the identifier of that processor from the execution waiting queue, and when the program enters a data waiting state during execution, waiting for data to be supplied from another processor, stopping execution of the program and registering the identifier of that processor as the last element of the execution waiting queue; and
(d) repeating steps (b) and (c).

2. The program execution method according to claim 1, wherein step (d) comprises repeating steps (b) and (c) until the identifiers in the execution waiting queue no longer change.

3. The program execution method according to claim 2, further comprising the step of outputting the identifier remaining in the execution waiting queue as the identifier of the processor that executed the program containing the bug.

4. The program execution method according to claim 1, wherein the order is determined arbitrarily.

5. The program execution method according to claim 4, wherein the order is determined according to the number of data transmission processes included in each program.

6. The program execution method according to claim 4, wherein the order is determined by generating random numbers based on the execution time of each program.

7. The program execution method according to claim 1, wherein each processor selected in sequence from the execution waiting queue executes its corresponding program until the processor enters either the data waiting state or the processing end state.

8. The program execution method according to claim 1, wherein the executing step includes the following steps:
when the executing processor enters the data waiting state, registering the identifier and restart address of that processor in the execution waiting queue as the last element; and
determining that there is no executable processor when the contents of the elements of the execution waiting queue corresponding to each processor do not change before and after execution of the corresponding program by that processor.

9. The program execution method according to claim 1, wherein the step of determining the program containing the bug includes determining the statement in the program that causes the bug.

10. A debugging method for a parallel computer system, comprising the following steps:
(a) registering the identifiers of a plurality of processors in an execution waiting queue in a predetermined first order;
(c) when, as a result of a processor executing its program, the program reaches the execution completion state, removing the identifier of that processor from the execution waiting queue, and when the program enters a data waiting state during execution, waiting for data to be supplied from another processor, stopping execution of the program and registering the identifier of that processor at the end of the execution waiting queue;
(d) repeating steps (b) and (c), wherein step (d) repeats steps (b) and (c) until the identifiers in the execution waiting queue no longer change, and outputs data transmission/reception data, synchronization control data, and the values of arrays and variables as trace data;
(e) after the execution step is completed, registering the identifiers of the plurality of processors in the execution waiting queue in a predetermined second order;
(e) 1 according to the order of the identifiers registered in the queue;
(g) when, as a result of a processor executing its program, the program reaches the execution completion state, removing the identifier of that processor from the execution waiting queue, and when the program enters a data waiting state during execution, waiting for data to be supplied from another processor, stopping execution of the program and registering the identifier of that processor at the end of the execution waiting queue; and
(h) repeating steps (e) and (f), wherein step (h) repeats steps (e) and (f) until the identifiers in the execution waiting queue no longer change, and outputs data transmission/reception data, synchronization control data, and the values of arrays and variables as trace data.

11. The program execution method according to claim 10, wherein the first and second orders are determined arbitrarily.

12. The program execution method according to claim 11, wherein the first and second orders are determined according to the number of data transmission processes included in each program.

13. The program execution method according to claim 11, wherein the first and second orders are determined by generating random numbers based on the execution time of each program.

14. The program execution method according to claim 10, wherein the processor executes the corresponding program until the processor enters either the data waiting state or the processing end state.

15. The program execution method according to claim 10, wherein the executing step includes the following steps:
when the program enters the data waiting state, registering the identifier and restart address of the processor in the execution waiting queue as the last element; and
determining that there is no executable processor when there is no change in the contents of each element of the execution waiting queue.

16. The method according to claim 9, wherein the step of determining the program containing the bug includes the step of determining the statement that causes the bug.

17. A method for controlling debug processing in a parallel computer system, comprising the following steps:
setting a flag and an execution start address in each element of an execution waiting table, wherein each element of the execution waiting table corresponds to one of a plurality of processors and stores the flag and the execution start address of the corresponding processor, and the flag indicates whether the corresponding processor is in the execution waiting state;
referencing the elements of the execution waiting table in a predetermined order and causing the processors to execute the corresponding programs one by one from their execution start addresses, while updating the execution start address of each processor in the data waiting state, until no flag corresponding to an executable processor remains set in the execution waiting table; and
determining a program having a bug from the elements of the execution waiting table.

18. The program execution method according to claim 17, wherein the order is determined arbitrarily.

19. The program execution method according to claim 18, wherein the order is determined according to the number of data transmission processes included in each program.

20. The program execution method according to claim 18, wherein the order is determined by generating random numbers based on the execution time of each program.

21. The program execution method according to claim 17, wherein the executing step includes the step in which the processor executing the program executes it until the processor enters either a data waiting state or a processing end state.

22. The program execution method according to claim 17, wherein the executing step includes the step of determining that there is no executable processor when the contents of the table elements corresponding to each processor do not change before and after execution of the corresponding program by a given processor.

23. The method according to claim 17, wherein the executing step comprises the following steps:
setting flags and execution start addresses in each element of the execution waiting table;
referencing the elements of the execution waiting table in a different predetermined order and causing the processors to execute the corresponding programs one by one from their execution start addresses, while updating the execution start addresses of the processors in the data waiting state, until no flag corresponding to an executable processor remains set in the execution waiting table;
outputting first and second trace data during the step of executing the programs in the predetermined order and the step of executing the programs in the different predetermined order, respectively; and
determining the program having a bug from the first and second trace data.

24. The method according to claim 17, wherein the step of determining the program containing the bug includes the step of determining the statement that causes the bug.

25. A method for improving the workability of debug processing, comprising the following steps:
providing first and second libraries containing a normal processing control routine and a debug processing control routine, respectively, wherein both routines have the same routine name and the first and second libraries are identical except for the normal processing control routine and the debug processing control routine;
compiling a main program;
linking the first library to the compiled main program for normal processing, and linking the second library for debugging; and
executing the linked main program.