JP2001005688A - Debugging support device for parallel program

Info

Publication number: JP2001005688A
Application number: JP11177771A
Authority: JP
Inventors: Shiyunsuke Mizumi; 俊介水見; Takashi Takahashi; 俊高橋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-06-24
Filing date: 1999-06-24
Publication date: 2001-01-12

Abstract

PROBLEM TO BE SOLVED: To semi-automate the series of debugging WRITE operations and make it easy to identify the location that caused a bug, by automatically performing the calculation and result comparison for checksum processing at all specified positions. SOLUTION: A syntax analysis part 4 analyzes the input parallel source program and records the positions where breakpoints are inserted and the names of the arrays to be checksummed. A source code comparison part 6 analyzes the correspondence between the sequential program and the parallel version of the program, records the breakpoint and the positions immediately after updates of the target arrays, and embeds a STOP statement at the breakpoint and processing code that performs the series of checksum operations immediately after each update of the array data. A result comparison function insertion part 8 inserts statements implementing three prescribed functions at prescribed positions in the parallel program. Of the output source codes 9 and 10, the sequential program is executed first to obtain target data 13, and the parallel object 11 is then executed with that data as part of its input data to obtain result comparison output data 14.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a debugging method for programs written for a parallel computer composed of a plurality of processors, and in particular to a device that effectively supports the creation of parallelized versions of a wide range of numerical analysis programs in scientific and technical fields.

[0002]

[Prior art] Large-scale technical computing is currently shifting to parallel computers, and the huge body of technical computing programs accumulated so far is being parallelized accordingly. However, the hardware configuration of a parallel computer differs greatly from that of an ordinary single-processor computer, so its run-time behavior naturally differs as well. This makes it very difficult to locate bugs in a parallel program and has been a major reason why creating parallel programs takes so much time.

[0003] Conventional systems also provide symbolic debuggers, such as ndb, that work at the source code level. Such tools can identify where a program terminated abnormally or where it ran out of control, but it is difficult to use them to identify the location that caused the bug. Moreover, a common kind of bug in parallel programs causes no abnormal termination and merely makes the final result differ slightly from that of the original program.

[0004]

[Problems to be solved by the invention] Identifying the cause of such bugs is very difficult, and tools such as symbolic debuggers have not been very effective. In such cases, the program developer usually inserts debugging WRITE statements at positions chosen by guesswork, outputs the values of array variables, and compares them with the results of the original program to locate the bug. Because this work is done by hand, one statement at a time, it takes a great deal of time and effort and makes the parallelization work itself very inefficient.

[0005] An object of the present invention is to provide a program debugging method and support device that semi-automate the series of debugging WRITE operations described above and make it easy to identify the location in the program whose processing caused a bug in a parallel program.

[0006]

[Means for solving the problems] To achieve the above object, the present invention performs the following.

[0007] (1) Breakpoints in the parallel program and the arrays whose results are to be checked are specified either by inserting directives into the program or through mouse operations on a window screen.

[0008] Here, a breakpoint is a position in the program that the user specifies arbitrarily; it means that the program is to be executed up to that position.

[0009] (2) The debugging support device parses the entire program to create a TREE (tree structure) diagram of the program, and identifies the position of the breakpoint specified in (1) relative to the whole program and every location where the value of a specified array variable is updated.

[0010] (3) Immediately after every location identified in (2) where a specified variable is updated, the debugging support device inserts commands that compute the sum of the target array, usually called a checksum, and output the result. It also inserts a STOP statement at the breakpoint.

[0011] (4) The debugging support device examines the correspondence between the original sequential program and the parallel program, and applies the same processing as in (3) to the original source code.

[0012] (5) The original and parallel versions of the program are compiled and executed, the checksum results of both versions are recorded up to the breakpoint specified in (1), and information on the locations where the results differ is also recorded or displayed on the screen. Alternatively, the checksums of the sequential and parallel versions are compared one by one, and program execution is suspended at the location where the results differ. In that case, of course, the corresponding IF statements are added in steps (3) and (4).
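The checksum comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the function name `first_divergence`, the `(location, checksum)` record layout, and the sample values are all invented for the example.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, float]  # (location label, checksum value); this layout is assumed


def first_divergence(sequential: List[Record],
                     parallel: List[Record],
                     tol: float = 0.0) -> Optional[str]:
    """Return the label of the first checkpoint whose checksums differ by more than tol."""
    for (loc_s, sum_s), (loc_p, sum_p) in zip(sequential, parallel):
        if loc_s != loc_p:
            raise ValueError(f"checkpoints out of step: {loc_s} vs {loc_p}")
        if abs(sum_s - sum_p) > tol:
            return loc_s
    return None


# Hypothetical traces recorded by the two instrumented programs.
seq_trace = [("main:12", 10.0), ("sub1:30", 42.5), ("sub1:47", 7.25)]
par_trace = [("main:12", 10.0), ("sub1:30", 40.5), ("sub1:47", 9.0)]
print(first_divergence(seq_trace, par_trace))  # sub1:30
```

Reporting only the first mismatch reflects the idea that later differences are usually consequences of the earliest one.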

[0013] According to the present invention, the insertion of debugging WRITE statements, which previously had to be done by hand one at a time, is automated and performed exhaustively by the computer. This greatly reduces the burden on the user and can substantially shorten the development time of parallel programs.

[0014]

[Embodiments of the invention] The debugging method according to the present invention is described below with reference to the accompanying drawings. FIG. 1 shows an example of the overall structure of the debugging method. It consists of sequential and parallel source programs 1 and 2 as input data, a debugger 3 having five functional units, and object codes 11 and 12 that are executed and output results.

[0015] First, the user inserts, at the desired positions in the parallel program 2, directive lines that specify the breakpoint and the arrays to be checksummed.

[0016] Next, the debugger 3 is started with the original sequential program 1 and the parallel program 2 to be debugged, into which the directive lines have been inserted, as input files. The syntax analysis unit 4 parses the text of the input parallel source program and records, in a storage device, the positions where breakpoints were inserted and the names of the arrays to be checksummed.

[0017] The TREE structure analysis unit 5 then analyzes the TREE (tree) structure of the program, which starts at the main program and runs through subroutine calls to the end of the program, and creates a system diagram. This TREE structure can be obtained by following the subroutine calls in order from the start of processing, and commercial software with this function is widely available. For each array to be checksummed, the unit then follows the system diagram just created, identifies every location where the array's data is updated between the start of processing and the breakpoint, and records in the storage device the position immediately after each update. These arrays are passed between subroutines either as data in a COMMON block or as arguments of subroutine calls, so by tracing them along the system diagram it can be established that two arrays are the same object even when different variable names are used in different subroutines.
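Tracing one array through subroutine call arguments can be sketched as a small graph search. The following Python sketch is only an illustration under assumptions: the call-site representation, the routine names, and the function `aliases` are hypothetical, and COMMON blocks are not modeled.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# One call site: (caller, callee, {actual argument name in caller: dummy name in callee}).
# This representation and all names below are illustrative assumptions.
Call = Tuple[str, str, Dict[str, str]]


def aliases(calls: List[Call], routine: str, name: str) -> Set[Tuple[str, str]]:
    """Collect every (routine, name) pair that refers to the same array."""
    seen = {(routine, name)}
    queue = deque(seen)
    while queue:
        r, n = queue.popleft()
        linked = []
        for caller, callee, binding in calls:
            if caller == r and n in binding:
                # Follow the array down into the callee.
                linked.append((callee, binding[n]))
            if callee == r:
                # Follow the array back up to the caller.
                linked.extend((caller, actual)
                              for actual, dummy in binding.items() if dummy == n)
        for pair in linked:
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return seen


calls = [("main", "sub1", {"a": "u"}), ("sub1", "sub2", {"u": "w"})]
print(sorted(aliases(calls, "main", "a")))
# [('main', 'a'), ('sub1', 'u'), ('sub2', 'w')]
```

Every statement that assigns to one of the returned names in the corresponding routine would then be a candidate position for a checksum.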

[0018] The source code comparison unit 6 analyzes the correspondence between the original sequential program and the parallel version of the program, and records the breakpoint and the positions immediately after updates of the target arrays. In general, when a parallel program is created from a sequential program, tuning and algorithm changes are made beforehand at the sequential stage. The main differences between the two programs are therefore usually limited to the maximum values of loop counters, array sizes, and the subroutines and subroutine calls added for communication, so unless the programs were deliberately written to differ, finding the correspondence between them is not difficult. The technique used here for finding differences and corresponding lines between two similar programs (character strings) is existing technology and is built into many editors and operating systems.
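As an illustration of this kind of line matching, the sketch below uses Python's standard `difflib` module. The patent does not name a specific tool, so this is a stand-in technique; the FORTRAN fragments are invented for the example because FIGS. 2 and 3 are not reproduced here.

```python
import difflib
from typing import Dict, List


def line_correspondence(sequential: List[str], parallel: List[str]) -> Dict[int, int]:
    """Map 0-based line indices of the parallel source to identical sequential lines."""
    matcher = difflib.SequenceMatcher(a=sequential, b=parallel, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.b + k] = block.a + k
    return mapping


# Hypothetical fragments: the parallel version changes the loop bounds and adds a call.
seq_src = [
    "      do j = 1, n",
    "        u(i,j) = u(i,j) + dt*f(i,j)",
    "      end do",
]
par_src = [
    "      call exchange(u)",
    "      do j = jstart, jend",
    "        u(i,j) = u(i,j) + dt*f(i,j)",
    "      end do",
]
print(line_correspondence(seq_src, par_src))  # {2: 1, 3: 2}
```

Lines that appear in only one program, such as the changed loop header and the added communication call, are simply absent from the mapping.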

[0019] In both the sequential program and the parallel program, at the locations recorded in the steps above, a STOP statement is embedded at the breakpoint, and processing code that performs the series of checksum operations is embedded immediately after each update of the array data.
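A rough sketch of this embedding step, written in Python, is shown below. It is an assumption-laden illustration: `instrument` and the emitted `call csum_check(...)` line are hypothetical placeholders for the checksum pattern, whose actual text the patent gives only in FIG. 6.

```python
from typing import Iterable, List


def instrument(source: List[str], update_lines: Iterable[int],
               break_line: int, array: str) -> List[str]:
    """Return a copy of source with a checksum call after each update line that
    precedes the breakpoint and a STOP statement at the breakpoint.
    Line indices are 0-based."""
    updates = set(update_lines)
    out = []
    for i, line in enumerate(source):
        if i == break_line:
            out.append("      stop")
        out.append(line)
        if i in updates and i < break_line:
            # csum_check is a hypothetical routine; its second argument tags
            # the call with the 1-based line number of the update.
            out.append(f"      call csum_check({array}, {i + 1})")
    return out


source = [
    "      u(1) = 0.0",
    "      call sub1(u)",
    "      u(2) = 1.0",
    "      end",
]
for line in instrument(source, update_lines=[0, 2], break_line=3, array="u"):
    print(line)
```

Because the output is ordinary source text, it can be compiled with the usual toolchain, which matches the flow described in the text.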

[0020] The result comparison function insertion unit inserts, at predetermined positions in the parallel program, statements that implement the following three functions.

[0021] (1) A subroutine is created that stores, together as local data, the array checksum output obtained by executing the sequential program, and it is called at the start of the executable statements of the main routine.

[0022] (2) A statement that retrieves the required data from the sequential-program result data stored in (1) is inserted immediately after each checksum processing section.

(3) A statement that compares the checksum results of the sequential program and the parallel program and, if they differ, stops execution at that point is inserted immediately after each checksum processing section.

[0023] What the processing so far outputs is ordinary source code 9 and 10, which is handled with the usual procedure for compiling and executing a sequential or parallel program. In this example, however, the sequential program is first run up to the predetermined position to obtain the target data 13; the parallel object 11 is then run with that data as part of its input data, yielding the final result comparison output data 14. This embodiment has been described assuming a distributed-memory parallel computer, but the operation does not depend on the computer architecture at all, so a shared-memory parallel computer may be used, and the operation can even be simulated on a single-processor computer.

[0024] The processing flow described above is now explained in more detail using a FORTRAN program as an example. FIG. 2 shows part of the core of a sequential program. FIG. 3 shows the same code parallelized by partitioning the data and loops along the J dimension; differences from the sequential program are shown in bold. A line beginning with the characters !csum is a directive line that requests checksumming. It means that the program is to be executed up to this position, and that an instruction pattern for checksum processing is to be inserted immediately after every location before it where the value of array u, or of the corresponding array that has a different name in another subroutine, is updated.
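Locating such directive lines can be sketched in a few lines of Python. Only the `!csum` prefix comes from the text; the rest of the directive syntax (array names after the marker, separated by commas) and the function name `scan_directives` are assumptions made for this example.

```python
from typing import List, Tuple


def scan_directives(source: List[str],
                    marker: str = "!csum") -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, array names) for every directive line."""
    found = []
    for number, line in enumerate(source, start=1):
        text = line.strip()
        if text.lower().startswith(marker):
            # Assumed syntax: "!csum name1, name2, ..."
            names = [n.strip() for n in text[len(marker):].split(",") if n.strip()]
            found.append((number, names))
    return found


# Hypothetical parallel source with one directive line.
par_src = [
    "      call sub1(a)",
    "!csum a",
    "      end",
]
print(scan_directives(par_src))  # [(2, ['a'])]
```

The recorded line number serves as the breakpoint position, and the names are the arrays to be checksummed.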

[0025] FIG. 4 explains this using the skeleton structure of the whole program. The array named u in subroutine sub1 is declared as an array named a in the main routine that calls it. As described above, the fact that arrays u and a are the same object can be determined by analyzing the correspondence between the arguments of the subroutine call, so the instruction pattern for checksum processing is embedded at every location where either array is updated.

[0026] The subroutine at b2 in FIG. 4 reads the result data of the sequential program into a local array, saves it, and makes it available whenever it is needed. In this example, data input and output are controlled by the value of a character variable used as a keyword.

[0027] FIG. 5 shows an example of an algorithm for the series of checksum operations, and FIG. 6 expresses it more concretely as a FORTRAN program. First, for the array to be checksummed, a partial sum is computed on each processor (step a1). Next, the partial sums are summed across the processors (step a2). In FIG. 6 this is implemented with a function called MPI_ALLGATHER. The functions whose names begin with MPI are built-in functions provided by MPI, the communication function library for parallel programs that is the de facto world standard. In FIG. 6, the individual u_sum values computed on each processor are summed across the processors, and the result (total_u) is made available on every processor.
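The two steps (per-processor partial sums, then a total held by every processor) can be simulated without MPI. The Python sketch below is an illustration only: the block distribution over columns, the function names, and the sample array are assumptions, and in a real MPI program the second step would be a collective operation (for example a sum reduction such as MPI_Allreduce) rather than a local loop.

```python
from typing import List


def partial_sums(u: List[List[float]], nprocs: int) -> List[float]:
    """Step a1: split the columns (the J dimension) of u across nprocs simulated
    processors and return each processor's partial sum."""
    ncols = len(u[0])
    sums = []
    for rank in range(nprocs):
        # Block distribution of column indices; block sizes differ by at most one.
        start = rank * ncols // nprocs
        end = (rank + 1) * ncols // nprocs
        sums.append(sum(row[j] for row in u for j in range(start, end)))
    return sums


def global_sum(u_sums: List[float]) -> List[float]:
    """Step a2: every simulated processor ends up holding the same total."""
    total_u = sum(u_sums)
    return [total_u] * len(u_sums)


u = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
u_sum = partial_sums(u, nprocs=2)
print(u_sum)              # [14.0, 22.0]
print(global_sum(u_sum))  # [36.0, 36.0]
```

Giving every processor the total lets each one perform the later comparison locally, without further communication.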

[0028] Next, the tolerance to be used when comparing the results of the sequential program and the parallel program is set (step a3). The sequential and parallel versions normally differ in the order of calculation, within the range that is mathematically equivalent. For example, the order of additions may differ, so the results do not match exactly because of rounding error. How much difference to allow is set here.
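A small worked example shows why a tolerance is needed. The Python sketch below adds the same three numbers in two mathematically equivalent orders; the helper `within_tolerance` and the choice of a relative tolerance are assumptions for illustration, since the text does not say how the tolerance is expressed.

```python
def within_tolerance(reference: float, value: float, rel_tol: float) -> bool:
    """True when value agrees with reference to within a relative tolerance."""
    scale = max(abs(reference), abs(value))
    return abs(reference - value) <= rel_tol * scale


data = [0.1, 0.2, 0.3]

# Left-to-right addition, as a single sequential loop would perform it.
sequential = (data[0] + data[1]) + data[2]
# Two partial sums combined afterwards, as two processors would produce.
parallel = data[0] + (data[1] + data[2])

print(sequential == parallel)                         # False
print(within_tolerance(sequential, parallel, 1e-12))  # True
```

An exact equality test would flag this harmless difference as a bug, whereas a tolerance that is too loose could hide a real one.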

[0029] Next, the checksum result of the corresponding array at the corresponding position in the sequential program is obtained (step a4). In practice, as shown at b1 and b2 in FIG. 4, a dedicated subroutine for holding the data locally is called at the start of the main routine; there the results of the sequential calculation are read from a file all at once and stored in a dedicated array, and the same subroutine is called whenever data needs to be retrieved. At b2 in FIG. 4, it is assumed that the result data can be stored in or retrieved from the local array by specifying a keyword such as put or get as an argument of the subroutine call.
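The keyword-controlled subroutine can be pictured as a single entry point over a local table. The following Python sketch is only an analogy under assumptions: the class name `ReferenceStore`, the use of checkpoint labels as keys, and the in-memory sample data are hypothetical, and reading the results from a file is omitted because the file format is not described.

```python
from typing import Dict, Optional


class ReferenceStore:
    """Holds sequential-run checksums locally, keyed by a checkpoint label."""

    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    def access(self, keyword: str, label: str,
               value: Optional[float] = None) -> Optional[float]:
        # One entry point steered by a keyword, mirroring the put/get argument.
        if keyword == "put":
            if value is None:
                raise ValueError("put requires a value")
            self._data[label] = value
            return None
        if keyword == "get":
            return self._data[label]
        raise ValueError(f"unknown keyword: {keyword}")


store = ReferenceStore()
# Load the sequential results once, at the start of the run.
for label, checksum in [("main:12", 10.0), ("sub1:30", 42.5)]:
    store.access("put", label, checksum)

print(store.access("get", "sub1:30"))  # 42.5
```

Keeping both operations behind one routine keeps the stored data local to it, which is the role the text assigns to the dedicated subroutine.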

[0030] Next, the result of the sequential program is compared with that of the parallel program (step a5). If the difference is within the tolerance set in step a3 above, execution proceeds to the next calculation. If it is outside the tolerance, the processor obtains its own identification number (step a6). In FIG. 6 this, too, is implemented with a built-in MPI function.

[0031] Next, the processor determines whether it is the root processor (step a7); any processor may serve as the root, but it is usually processor 0. If it is the root, it outputs information such as the subroutine name and line number and then stops execution; otherwise it stops execution immediately.
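The compare-and-stop logic of steps a5 to a7 can be simulated for all processors at once. In the Python sketch below, the function names, the absolute tolerance, and the message format are assumptions; a real MPI program would obtain the rank with a call such as MPI_Comm_rank and would actually halt rather than return a string.

```python
from typing import List, Optional


def check_step(rank: int, total_u: float, reference: float, tol: float,
               where: str, root: int = 0) -> Optional[str]:
    """Return None to continue, or a description of the stop action for this rank."""
    if abs(total_u - reference) <= tol:
        return None  # a5: within tolerance, continue with the next calculation
    if rank == root:
        # a6/a7: only the root processor reports where the mismatch occurred.
        return f"rank {rank}: checksum mismatch at {where}; stop"
    return "stop"


def run_all_ranks(nprocs: int, total_u: float, reference: float,
                  tol: float, where: str) -> List[Optional[str]]:
    """Apply the same check on every simulated processor."""
    return [check_step(r, total_u, reference, tol, where) for r in range(nprocs)]


print(run_all_ranks(3, 40.5, 42.5, 1e-9, "sub1 line 30"))
# ['rank 0: checksum mismatch at sub1 line 30; stop', 'stop', 'stop']
```

Restricting the report to one processor avoids N copies of the same message, since every processor holds the same total and reaches the same verdict.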

[0032] FIG. 7 shows another embodiment of the debugger according to the present invention. It is an extension of the first embodiment, and the basic flow is the same. The major difference is that the sequential program and the parallel program are both incorporated into a single, newly created task-parallel program, so that a calculation previously performed on N processors is performed on N+1. Task-parallel processing means having separate processors handle multiple processes that have no direct dependence on one another and can be computed independently. Here the sequential program and the parallel program are basically independent programs, so there is no problem in running them simultaneously on separate processors.

[0033] As the programming method, the approach generally known as the master-slave method can be used. In this method, one arbitrarily chosen master processor manages the whole program, keeps the other processors under its control as slaves, and has the slaves perform processing as needed.

[0034] Here, the master processor executes the sequential program, and the multiple slave processors execute the parallel program.
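The N+1 arrangement can be imitated on one machine with threads standing in for processors. This Python sketch is an analogy under assumptions, not the patent's MPI program: one task plays the master running the sequential calculation, `n_workers` tasks play the slaves computing partial sums, and all function names and the tolerance are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sequential_checksum(u: List[float]) -> float:
    """The master's task: the whole calculation in sequential order."""
    return sum(u)


def worker_partial_sum(chunk: List[float]) -> float:
    """A slave's task: the checksum of its own share of the data."""
    return sum(chunk)


def run_task_parallel(u: List[float], n_workers: int, tol: float = 1e-9) -> bool:
    """Run one sequential task and n_workers parallel tasks side by side
    (N + 1 tasks in total) and report whether their checksums agree."""
    chunks = [u[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers + 1) as pool:
        master = pool.submit(sequential_checksum, u)
        partials = list(pool.map(worker_partial_sum, chunks))
    return abs(master.result() - sum(partials)) <= tol


print(run_task_parallel([float(i) for i in range(10)], n_workers=3))  # True
```

Because the two computations share no intermediate state, they can run at the same time and need to meet only at the points where checksums are compared.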

[0035] FIG. 8 shows yet another embodiment, a further extension of the second embodiment. The major difference from the second embodiment is that the execution of the task-parallel program is also incorporated into the debugger, so that execution can be paused or resumed partway through in response to interactive instructions from the user.

[0036] While the execution positions of the sequential program and the parallel program are continuously tracked on the monitor, execution is stopped at the position where the results differ, and correction, recompilation, and execution are repeated, always with the program position confirmed on the monitor. Interactive debugging is also possible on a single processor by simulating the parallel environment.

[0037] FIG. 9 shows a debugging support system that applies the debugger of the present invention to debug a parallel program. A plurality of central processing units are connected by a network 50 and configured as a system capable of parallel computation. One of the central processing units is a computer 40 that also serves as the debugger; it has an automatic parallelizing compiler stored in an internal storage medium 42 and is equipped with a display 41, a data input device 43, and a data output device 44. The sequential program 10 and the parallel program 11, to which the debugger on that central processing unit has added the statements needed for debugging, are loaded into its own internal storage medium 42 and into the internal storage media of the central processing units 45 and 46, and are executed in parallel.

[0038] FIG. 10 shows the screen of the display 41 of the computer 40. The sequential and parallel programs illustrated in FIGS. 2 and 3 are displayed on the screen. By specifying with the mouse, as illustrated, the breakpoint position in the parallel program and the arrays to be checksummed, the directives for checksum processing can be inserted automatically.

[0039]

[Effects of the invention] According to the present invention, the debugging work that was previously done by hand and took up most of the time in parallel program development can be semi-automated, and the location that caused a bug can be identified reliably in a short time, so the efficiency of creating parallel programs can be greatly improved.

[Brief description of the drawings]

FIG. 1 is a block diagram of a debugger according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the core portion of a sequential program.

FIG. 3 is a diagram showing an example in which FIG. 2 has been parallelized.

FIG. 4 is an explanatory diagram of the checksum insertion locations.

FIG. 5 is a flowchart of the checksum processing.

FIG. 6 is a diagram showing a specific example of the checksum processing.

FIG. 7 is a block diagram showing a second embodiment of the present invention.

FIG. 8 is a block diagram showing a third embodiment of the present invention.

FIG. 9 is a configuration diagram of an application system showing a debugging support device embodying the present invention and other parallel computers connected to it via a network.

FIG. 10 is a diagram showing an interactive debugging support device embodying the present invention.

[Explanation of symbols]

3: debugger, 7: checksum insertion unit, 10: sequential program, 11: parallel program, 40: computer also serving as debugger, 41: display, 42: internal storage medium, 43: data input device, 44: data output device, 50: network.

Claims

[Claims]

1. A debugger for debugging a parallel program that has been parallelized, on the basis of a source program written for sequential processing, so that it can be distributed to and executed by a plurality of processors, the debugger comprising: a TREE structure analysis unit that analyzes the tree structure of the program and creates a system diagram of the entire program; a source code comparison unit that analyzes the correspondence between the sequential program and the parallel program; a checksum insertion unit that embeds, in both the sequential program and the parallel program, a predetermined pattern for calculating the sum of array data; and a result comparison instruction insertion unit that embeds a predetermined pattern for comparing the two values obtained and recording the result, wherein the calculation and result comparison for checksum processing are performed automatically at all predetermined positions.

2. The debugger according to claim 1, wherein both the sequential program and the parallel program are incorporated as parts of a single task-parallel program, the two programs are executed in parallel, and the results of the checksum calculations at corresponding locations in the two programs can be compared and recorded one by one.

3. A debugging support device for parallel programs, in which a debugger having the functions according to claim 1 or 2 runs on a computer having a processor, a storage device, and an input/output device, and the statements to be embedded can be specified on a screen.

4. The debugging support device for parallel programs according to claim 3, wherein a series of debugging operations, such as start-up, execution, display of results, and correction, can be performed interactively and in real time on the screen after starting the debugger only once.