JP2011203994A

JP2011203994A - Analysis device, analysis method and analysis program

Info

Publication number: JP2011203994A
Application number: JP2010070504A
Authority: JP
Inventors: Naoki Sueyasu; 直樹末安; Yasuyuki Ono; 康行大野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-03-25
Filing date: 2010-03-25
Publication date: 2011-10-13
Anticipated expiration: 2030-03-25
Also published as: JP5526914B2

Abstract

PROBLEM TO BE SOLVED: To provide an improved sampling method. SOLUTION: An analysis device 30 acquires, from processors 11 to 14, sampling data obtained by sampling, for each of a plurality of processes, the waiting time of the own process during communication with other processes. Having acquired the sampling data, the analysis device 30 determines, for each of the plurality of processes, the total waiting time of the own process in communication with the other processes and the total waiting time of the other processes in communication with the own process, analyzes the results, and evaluates how processing is distributed among the plurality of processes.

Description

The present invention relates to an analysis device, an analysis method, and an analysis program.

In recent years, distributed memory multiprocessors, which are systems in which a plurality of processors are connected by an interconnect device, have been developed. A distributed memory multiprocessor assigns a process to each processor, and the processes communicate with one another using a communication library such as MPI (the Message Passing Interface standard) to carry out distributed parallel processing as a whole.

In such distributed parallel processing, performance is improved by distributing the load appropriately among the processes. For this reason, uniformity of load balance is one of the performance characteristic indices of an application program (distributed parallel processing program) running on a distributed memory multiprocessor parallel computer. It is desirable to observe this uniformity of load balance simply and precisely, and to analyze the performance characteristics of the program accurately.

Each process of a distributed parallel processing program performs its own calculation processing and then communicates with the other processes. In the communication processing, a communication wait occurs when the calculation processing of the communication partner has not yet finished. Therefore, to measure the uniformity of load balance, it is effective to grasp how communication waits occur.

The trace log method and the sampling method are known as program performance analysis techniques. In the trace log method, each time an event occurs during program execution, the occurrence of the event is written to a log together with accompanying information such as the time. After the program has run, the log is analyzed to carry out various performance analyses. The advantage of the trace log method is that the time and number of occurrences of each event can be recorded accurately; however, the volume of log output is enormous, so program execution over a long period cannot be analyzed. In addition, the log output itself may affect the performance characteristics.

In the sampling method, the running state of the program is checked and recorded at fixed time intervals during program execution. After the program has run, the records are analyzed statistically to carry out various performance analyses. Compared with the trace log method, the sampling method keeps the volume of records small and can analyze program execution over a long period. Because it causes little disturbance, its effect on the performance characteristics is also small.

JP-A-5-250339; JP-A-6-59944; JP 2004-341750 A; JP-A-6-83608; JP 2007-207173 A

However, the sampling method cannot determine accurately the time and number of occurrences of each event. Consequently, in program performance analysis by the conventional sampling method, it is also difficult to identify what is worsening the communication wait situation.

Accordingly, in one aspect, an object of the present invention is to provide an improved sampling method.

In one proposal, the analysis device, the analysis method, and the analysis program acquire sampling data obtained by sampling, for each of a plurality of processes subjected to distributed parallel processing, the waiting time of the own process during communication with other processes. Based on the sampling data, they obtain, for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process, and analyze the result to evaluate how processing is distributed among the plurality of processes.

According to the present invention, an improved sampling method can be provided.

FIG. 1 is a configuration diagram of a distributed memory multiprocessor parallel computer according to an embodiment.
FIG. 2 is an explanatory diagram of communication between processes.
FIG. 3 is an explanatory diagram of the processing of distributed processes.
FIG. 4 is an explanatory diagram of a comparative example of sampling data.
FIG. 5 is an explanatory diagram of the processing state corresponding to sampling data D0.
FIG. 6 is a flowchart explaining the processing operation of the processors 11 to 14.
FIG. 7 is a flowchart explaining the processing operation of the analysis device 30.
FIG. 8 is an explanatory diagram of a specific example of aggregated data D2.
FIG. 9 is an explanatory diagram of the analysis by the analysis unit 33.
FIG. 10 is a flowchart explaining the processing operation of the processors 11 to 14 when evaluation is performed for each function.
FIG. 11 is a flowchart explaining the details of sampling (S600).
FIG. 12 is a flowchart explaining the processing operation of the analysis device 30 when functions are evaluated.
FIG. 13 is an explanatory diagram of specific examples of the wait status matrix, the per-function matrix, and the aggregation matrix.
FIG. 14 is an explanatory diagram of evaluation from the aggregation matrix and creation of a correction guideline.

The analysis device, analysis method, and analysis program disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited to the specific embodiments below.

[System configuration]
FIG. 1 is a configuration diagram of a distributed memory multiprocessor parallel computer according to an embodiment. In the example shown in FIG. 1, processors 11 to 14 are connected to an interconnect device 21 and can communicate with one another via the interconnect device 21.

Each of the processors 11 to 14 executes one or more processes. In the example shown in FIG. 1, processor 11 executes application process Pa1 and processor 12 executes application process Pa2. Similarly, processor 13 executes application process Pa3 and processor 14 executes application process Pa4.

Application processes Pa1 to Pa4 are processes among which the processing of an application program has been distributed, and they are executed in parallel by the processors 11 to 14. After each of the processors 11 to 14 performs the calculation processing of the process allocated to it, the processors communicate via the interconnect device 21 to synchronize their processing.

FIG. 2 is an explanatory diagram of communication between processes. In the example shown in FIG. 2, process Pa1 issues the send operation MPI_Send to process Pa2, and process Pa2 receives the communication from process Pa1 with the receive operation MPI_Recv. Likewise, process Pa2 then issues MPI_Send to process Pa1, and process Pa1 receives it with MPI_Recv. Finally, process Pa1 and process Pa2 synchronize with MPI_Barrier.

FIG. 3 is an explanatory diagram of the processing of distributed processes. In the example shown in FIG. 3, process Pa1 spends 80 ms calculating the processing assigned to it and then synchronizes in 20 ms of communication processing. Process Pa2, by contrast, finishes its assigned calculation processing in 40 ms and then spends 40 ms in a communication wait state, waiting for process Pa1 to finish its calculation processing.

Similarly, process Pa3 finishes its assigned calculation processing in 20 ms and then spends 60 ms in a communication wait state, waiting for process Pa1 to finish its calculation processing. Process Pa4 finishes its assigned calculation processing in 60 ms and then spends 20 ms in a communication wait state, waiting for process Pa1.

In the example shown in FIG. 3, processes Pa2 to Pa4 are kept waiting because the calculation processing of process Pa1 is heavy. The load balance can therefore be improved by reassigning some of the processing assigned to process Pa1 to the other processes.
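The waits in the FIG. 3 example follow directly from the calculation times: each process waits until the slowest process has finished. A minimal Python sketch, assuming a single synchronization point after the calculation phase (the function name `barrier_waits` is illustrative and does not appear in the embodiment):

```python
def barrier_waits(calc_ms):
    """Return how long each process waits for the slowest one to finish."""
    slowest = max(calc_ms.values())
    return {name: slowest - t for name, t in calc_ms.items()}

# Calculation times from the FIG. 3 example, in milliseconds.
calc_ms = {"Pa1": 80, "Pa2": 40, "Pa3": 20, "Pa4": 60}
waits = barrier_waits(calc_ms)
print(waits)  # {'Pa1': 0, 'Pa2': 40, 'Pa3': 60, 'Pa4': 20}
```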

FIG. 4 is an explanatory diagram of a comparative example of sampling data. As shown in FIG. 4, sampling data D0, acquired by sampling the distributed processes Pa1 to Pa4, holds the calculation processing cost, communication wait cost, and communication processing cost of each process.

From sampling data D0, it is possible to know how much time each process spent on calculation, communication waits, and communication processing. In the example shown in FIG. 4, the calculation processing cost of process Pa3 is higher than that of the other processes, so process Pa3 can be considered to be disturbing the communication wait situation of the other processes. However, process Pa3 also has a communication wait cost of its own and is waiting for other processes to finish their calculations, so its processing may in turn be delayed by another process.

FIG. 5 is an explanatory diagram of the processing state corresponding to sampling data D0. In the example shown in FIG. 5, one square corresponds to a cost of 10 in sampling data D0. As shown in FIG. 5, up to synchronization timing t1, process Pa4 executes calculation processing of cost 10 and waits for communication at cost 50. Similarly, up to synchronization timing t1, process Pa1 executes calculation processing of cost 50 and waits for communication at cost 10. Up to synchronization timing t1, process Pa2 executes calculation processing of cost 40 and waits for communication at cost 20, and process Pa3 executes calculation processing of cost 60.

Between synchronization timing t1 and synchronization timing t2, processes Pa1 to Pa3 each execute calculation processing of cost 10 and wait for communication at cost 30, while process Pa4 executes calculation processing of cost 40.

That is, in the example shown in FIG. 5, the calculation cost of process Pa3 is high and worsens the overall communication wait situation, but what affects the communication wait situation most adversely is process Pa4 between synchronization timing t1 and synchronization timing t2. Thus, when there is a process that keeps other processes waiting because its load varies, that is, because its amount of calculation processing fluctuates, the communication wait situation cannot be evaluated accurately from the calculation processing cost, communication wait cost, and communication processing cost held in sampling data D0.
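The point can be checked with the FIG. 5 numbers. The following Python sketch sums the costs per process, as per-process totals of the kind held in D0 would, and separately looks at each synchronization interval. It assumes that a process for which no wait is mentioned waited at cost 0; the names `totals` and `wait_caused` are illustrative only.

```python
# (calculation cost, communication wait cost) per process in each interval,
# taken from the FIG. 5 description. A wait of 0 is an assumption where the
# text mentions no wait.
intervals = {
    "until t1": {"Pa1": (50, 10), "Pa2": (40, 20), "Pa3": (60, 0), "Pa4": (10, 50)},
    "t1..t2": {"Pa1": (10, 30), "Pa2": (10, 30), "Pa3": (10, 30), "Pa4": (40, 0)},
}

def totals(intervals):
    """Sum calculation and wait cost per process over all intervals."""
    out = {}
    for costs in intervals.values():
        for name, (calc, wait) in costs.items():
            c, w = out.get(name, (0, 0))
            out[name] = (c + calc, w + wait)
    return out

def wait_caused(costs):
    """Return the slowest process of one interval and the total wait in it."""
    slowest = max(costs, key=lambda name: costs[name][0])
    return slowest, sum(wait for _, wait in costs.values())

d0 = totals(intervals)
per_interval = {label: wait_caused(costs) for label, costs in intervals.items()}
print(d0)
print(per_interval)
```

With these numbers the totals show process Pa3 with the largest calculation cost (70), while the per-interval view attributes a wait of 90 to process Pa4 in the second interval against 80 to process Pa3 in the first.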

Therefore, the analysis device 30 shown in FIG. 1 acquires sampling data obtained by sampling, for each of a plurality of processes subjected to distributed parallel processing, the waiting time of the own process during communication with other processes. Based on the acquired sampling data, the analysis device 30 aggregates, for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication with the own process, that is, the total imposed waiting time. It then analyzes the aggregation result to evaluate how processing is distributed among the plurality of processes.

For this purpose, as shown in FIG. 1, the analysis device 30 is connected to the interconnect device 21 and communicates with the processors 11 to 14 via the interconnect device 21. Although FIG. 1 shows a configuration in which the analysis device 30 is implemented as a single device, any of the processors 11 to 14 may, for example, execute a program that realizes the functions of the analysis device 30, or a separate processor for analysis may be connected.

Processor 11 executes sampling thread Ps1 in addition to application process Pa1. Similarly, processor 12 executes sampling thread Ps2 in addition to application process Pa2. Processor 13 executes sampling thread Ps3 in addition to application process Pa3, and processor 14 executes sampling thread Ps4 in addition to application process Pa4.

Sampling thread Ps1 checks and records the running state of application process Pa1 at fixed time intervals. In doing so, sampling thread Ps1 accumulates, for each communication partner, the time cost from when application process Pa1 enters the communication wait state until the wait state is released, and creates sampling data D1a.

In the example shown in FIG. 1, sampling data D1a indicates that the communication wait cost for destination process Pa2 was 0, that for destination process Pa3 was 10, and that for destination process Pa4 was 30.

Similarly, sampling thread Ps2 checks the running state of application process Pa2 at fixed time intervals, accumulates for each communication partner the time cost from entry into the communication wait state until its release, and creates sampling data D1b.

Sampling thread Ps3 checks the running state of application process Pa3 at fixed time intervals, accumulates for each communication partner the time cost from entry into the communication wait state until its release, and creates sampling data D1c.

Sampling thread Ps4 checks the running state of application process Pa4 at fixed time intervals, accumulates for each communication partner the time cost from entry into the communication wait state until its release, and creates sampling data D1d.
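The per-partner accumulation performed by a sampling thread can be sketched in Python as follows. This is a simplified simulation, not the embodiment's implementation: it assumes that each sampling tick observes which partner, if any, the process is currently waiting on, and that every such observation adds one sampling interval of cost. The function name, the tick sequence, and the interval cost of 10 are all hypothetical.

```python
def accumulate_wait_cost(observations, partners, interval_cost):
    """Accumulate communication wait cost per communication partner.

    observations: one entry per sampling tick, giving the partner the process
    was waiting on at that tick, or None if it was not waiting.
    """
    cost = {partner: 0 for partner in partners}
    for partner in observations:
        if partner is not None:
            cost[partner] += interval_cost
    return cost

# Hypothetical tick sequence for process Pa1, chosen to reproduce D1a.
ticks = [None, "Pa3", None, "Pa4", "Pa4", "Pa4"]
d1a = accumulate_wait_cost(ticks, partners=["Pa2", "Pa3", "Pa4"], interval_cost=10)
print(d1a)  # {'Pa2': 0, 'Pa3': 10, 'Pa4': 30}
```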

The analysis device 30 includes a data acquisition unit 31, a data aggregation unit 32, and an analysis unit 33. The data acquisition unit 31 acquires sampling data D1a, D1b, D1c, and D1d from sampling threads Ps1 to Ps4.

The data aggregation unit 32 creates aggregated data D2 by aggregating sampling data D1a, D1b, D1c, and D1d. From aggregated data D2, it is possible to obtain, for each of processes Pa1 to Pa4, the waiting time for which it waited for communication with other processes and the imposed waiting time for which it made other processes wait.

The analysis unit 33 is a processing unit that analyzes aggregated data D2, evaluates how processing is distributed among processes Pa1 to Pa4, and outputs the evaluation result.

[Processing operation]
FIG. 6 is a flowchart explaining the processing operation of the processors 11 to 14. The following description takes processor 11 as an example, but the same applies to processors 12 to 14.

First, processor 11 executes process Pa1 assigned from the application program. At the start of processing, process Pa1 creates sampling thread Ps1 (S101). Processor 11 executes calculation processing in process Pa1 (S102) and, in parallel, executes sampling by sampling thread Ps1 (S201).

After the calculation processing (S102) has finished, processor 11 waits for sampling thread Ps1 to be deleted (S103), outputs sampling data D1a (S104), and ends the processing.

FIG. 7 is a flowchart explaining the processing operation of the analysis device 30. As shown in FIG. 7, the data acquisition unit 31 first acquires sampling data D1a, D1b, D1c, and D1d (S301), and the data aggregation unit 32 aggregates them (S302). The analysis unit 33 then analyzes aggregated data D2, outputs the evaluation result (S303), and ends the processing.

[Specific examples of data and processing]
FIG. 8 is an explanatory diagram of a specific example of aggregated data D2. Aggregated data D2 shown in FIG. 8 gives the communication wait cost for each pair of receiving process and sending process. Specifically, for receiving process Pa1, that is, when the sampled process is process Pa1, the communication wait cost for destination process Pa2 is 0, that for destination process Pa3 is 10, and that for destination process Pa4 is 30.

For receiving process Pa2, the communication wait cost for destination process Pa1 is 10, that for destination process Pa3 is 20, and that for destination process Pa4 is 30.

Similarly, for receiving process Pa3, the communication wait cost for destination process Pa1 is 0, that for destination process Pa2 is 0, and that for destination process Pa4 is 30.

For receiving process Pa4, the communication wait cost for destination process Pa1 is 40, that for destination process Pa2 is 30, and that for destination process Pa3 is 50.

The sum of a row of aggregated data D2 is the total waiting time for which that process waited for communication with other processes, and the sum of a column of aggregated data D2 is the total imposed waiting time for which that process made other processes wait.

That is, in the example shown in FIG. 8, the total waiting time is 40 for process Pa1, 60 for process Pa2 (10 + 20 + 30), 30 for process Pa3, and 120 for process Pa4. The total imposed waiting time, that is, the time for which other processes waited for the process, is 50 for process Pa1, 30 for process Pa2, 80 for process Pa3, and 90 for process Pa4.
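The row and column sums can be reproduced from the entries listed for FIG. 8. A minimal Python sketch, in which the dictionary layout and the names `waiting` and `waited_on` are illustrative choices rather than anything defined in the embodiment:

```python
processes = ["Pa1", "Pa2", "Pa3", "Pa4"]

# d2[waiter][partner]: communication wait cost of `waiter` for `partner`,
# with the entries as listed for FIG. 8.
d2 = {
    "Pa1": {"Pa2": 0, "Pa3": 10, "Pa4": 30},
    "Pa2": {"Pa1": 10, "Pa3": 20, "Pa4": 30},
    "Pa3": {"Pa1": 0, "Pa2": 0, "Pa4": 30},
    "Pa4": {"Pa1": 40, "Pa2": 30, "Pa3": 50},
}

# Row sum: how long each process waited for the others.
waiting = {p: sum(d2[p].values()) for p in processes}
# Column sum: how long the others waited for each process.
waited_on = {p: sum(row.get(p, 0) for row in d2.values()) for p in processes}

print(waiting)
print(waited_on)
```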

The row total, that is, the total cost in the receiving-side (horizontal) direction, is the total time for which the own process waited for communication; a large value indicates that the process's calculation processing was light. In the example of FIG. 8, the waiting time of process Pa4 is 120, which indicates that the load on process Pa4 was lower than that on the other processes.

The column total, that is, the total cost in the sending-side (vertical) direction, is the total time for which the own process made other processes wait for communication; a large value indicates that the process's processing was heavy. In the example of FIG. 8, the total imposed waiting time of process Pa4 is 90, which indicates that the load on process Pa4 was higher than that on the other processes. The process with the next highest load is process Pa3.

Judging the waiting time and the imposed waiting time together, it can be seen that process Pa4 swings between low load and high load more than the other processes, leaving considerable room for improvement in the communication wait situation, and that process Pa3 has room for improvement in the communication wait situation because its load is high.

FIG. 9 is an explanatory diagram of the analysis by the analysis unit 33. As shown in evaluation table D3 of FIG. 9, when the cost of the waiting time and the cost of the imposed waiting time (the time for which other processes were made to wait) are both low (L_L), the amount of calculation processing of the process is appropriate: it waits little for communication and does not keep its partners waiting. No improvement is therefore needed.

When the cost of the waiting time is low and the cost of the imposed waiting time is high (L_H), the process has much calculation processing and waits little for communication, but it often keeps its partners waiting. Improvement is therefore needed, and it is desirable to reduce the work of the process.

When the cost of the waiting time is high and the cost of the imposed waiting time is low (H_L), the process has little calculation processing and waits often for communication, while rarely keeping its partners waiting. In other words, it spends much of its time not calculating, so improvement is needed, and it is desirable to increase the work of the process.

When the cost of the waiting time and the cost of the imposed waiting time are both high (H_H), the process often waits for communication and also often keeps its partners waiting. Improvement is therefore needed, and it is desirable to reduce the fluctuation in its workload.

In this way, the analysis device 30 can accurately evaluate the load balance, that is, the uniformity of the distribution of processing, on a per-process basis from the waiting time and imposed waiting time of each process, and can output a guideline for improvement.
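The four cases of the evaluation table can be expressed as a small lookup. The following Python sketch is only an illustration: the text does not state how "low" and "high" are decided, so the single `threshold` parameter and its value of 70 are assumptions, as is the function name `classify`.

```python
GUIDELINES = {
    "L_L": "no improvement needed",
    "L_H": "reduce work",
    "H_L": "increase work",
    "H_H": "reduce fluctuation in workload",
}

def classify(waiting, waited_on, threshold):
    """Map a process's two totals to an evaluation-table class and guideline.

    waiting: total time the process waited for others.
    waited_on: total time others waited for the process.
    threshold: assumed cut-off between low (L) and high (H) cost.
    """
    key = ("H" if waiting >= threshold else "L") + "_" + (
        "H" if waited_on >= threshold else "L"
    )
    return key, GUIDELINES[key]

# Totals for processes Pa4 and Pa3 from the FIG. 8 example; 70 is a hypothetical threshold.
print(classify(120, 90, threshold=70))  # ('H_H', 'reduce fluctuation in workload')
print(classify(30, 80, threshold=70))   # ('L_H', 'reduce work')
```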

[Function evaluation]
The analysis device 30 can also evaluate functions by having the waiting time accumulated for each user function at sampling time. FIG. 10 is a flowchart explaining the processing operation of the processors 11 to 14 when evaluation is performed for each function. The following description takes processor 11 as an example, but the same applies to processors 12 to 14.

When communication wait processing starts, processor 11 records the time Tst at which the wait began (S401) and performs reception processing (S402). When reception finishes and the communication wait is released, it records the time Tend (S403) and adds the waiting time, which is the difference between time Tend and time Tst, to the wait status matrix kept per sender (S404).
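Steps S401 to S404 amount to timing a blocking receive and crediting the elapsed time to the sender. A minimal Python sketch, assuming a wrapper around an arbitrary blocking receive callable; the names `timed_receive`, `receive`, and `clock` are hypothetical and are not part of any MPI binding:

```python
import time

def timed_receive(wait_matrix, source, receive, clock=time.monotonic):
    """Run a blocking receive and add the time spent in it to wait_matrix[source]."""
    t_st = clock()        # S401: time the communication wait began
    message = receive()   # S402: reception processing (blocks until data arrives)
    t_end = clock()       # S403: time the wait was released
    wait_matrix[source] = wait_matrix.get(source, 0) + (t_end - t_st)  # S404
    return message

# Usage with a fake clock and a stub receive, so the result is deterministic.
fake_times = iter([100, 140])
wait_matrix = {}
msg = timed_receive(
    wait_matrix, "Pa1", receive=lambda: "data", clock=lambda: next(fake_times)
)
print(msg, wait_matrix)  # data {'Pa1': 40}
```

Injecting the clock keeps the sketch testable; a real measurement would simply use the default monotonic clock.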

When executing process Pa1 assigned from the application program, processor 11 creates sampling thread Ps1 from process Pa1 (S501). Processor 11 executes calculation processing in process Pa1 (S502) and, in parallel, executes sampling by sampling thread Ps1 (S600).

After the calculation processing of process Pa1 (S502) has finished, processor 11 waits for sampling thread Ps1 to be deleted (S503), outputs sampling data D1a (S504), and ends the processing.

FIG. 11 is a flowchart explaining the details of the sampling (S600). As shown in FIG. 11, each time the processor 11 samples the process Pa1 (S601), it refers to the wait status matrix (S602). It then adds the values of the wait status matrix to the function-by-function matrix entry of the user function that the sample hit (S603) and clears the wait status matrix to zero (S604). Call-graph profiling, for example, may be used for the per-function sampling.
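
One sampling tick (S602 to S604) might look like the following sketch. The function name `on_sample` and the dictionary layout are assumptions made for illustration; how the currently executing user function is identified (for example by call-graph profiling) is outside the sketch and is passed in as `current_function`.

```python
from collections import defaultdict


def on_sample(current_function, wait_matrix, per_function):
    """One sampling tick: charge pending waits to the function that was hit."""
    row = per_function[current_function]
    for sender, cost in wait_matrix.items():      # S602: read the wait status matrix
        row[sender] = row.get(sender, 0) + cost   # S603: add to that function's entry
    for sender in wait_matrix:                    # S604: zero-clear the wait status matrix
        wait_matrix[sender] = 0


# Usage with the D4 values quoted for process Pa4 (costs 40, 30, 50).
wait = {"Pa1": 40, "Pa2": 30, "Pa3": 50}
per_function = defaultdict(dict)
on_sample("c1", wait, per_function)
```

Because the wait status matrix is cleared after every tick, each wait is attributed to exactly one function, namely the one hit by the first sample taken after the wait was recorded.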

FIG. 12 is a flowchart explaining the processing of the analysis device 30 when evaluating functions. As shown in FIG. 12, the data acquisition unit 31 first acquires the sampling data (S701), and the data totaling unit 32 totals it (S702). The analysis unit 33 analyzes the function-by-function matrix for each user function, obtains the total time each process spent waiting to receive and the total time for which it made other processes wait, creates an aggregation matrix (S703), and outputs an analysis report (S704).

[Specific examples of data and processing]
FIG. 13 is an explanatory diagram of specific examples of the wait status matrix, the function-by-function matrix, and the aggregation matrix. The wait status matrix D4 shown in FIG. 13 is data indicating the wait status of the process Pa4. The matrix D4 shows that the process Pa4 waited at a cost of 40 in communication with the sending process Pa1, at a cost of 30 in communication with the sending process Pa2, and at a cost of 50 in communication with the sending process Pa3.

The function-by-function matrix D5 shown in FIG. 13 is data indicating the wait status of the process Pa4 for each function. For function c1, the matrix D5 shows that the process Pa4 waited at a cost of 300 in communication with the sending process Pa1, at a cost of 10 in communication with the sending process Pa2, and at a cost of 100 in communication with the sending process Pa3.

Similarly, for function c2, the matrix D5 shows waits at a cost of 100 in communication with the sending process Pa1, 300 in communication with the sending process Pa2, and 200 in communication with the sending process Pa3.

For function c3, the matrix D5 shows waits at a cost of 10 in communication with the sending process Pa1, 30 in communication with the sending process Pa2, and 300 in communication with the sending process Pa3.

For function c4, the matrix D5 shows waits at a cost of 400 in communication with the sending process Pa1, 300 in communication with the sending process Pa2, and 500 in communication with the sending process Pa3.

The analysis device 30 acquires this function-by-function matrix from each process and totals it per function to create an aggregation matrix. The aggregation matrix D6 shown in FIG. 13 illustrates the aggregation result for function c3.
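
As a rough sketch of this per-function aggregation, the code below gathers one function's entries from every process into a single matrix. The nested-dictionary layout and the name `aggregate` are assumptions for illustration; the D5 values are those quoted in the text for process Pa4. The matrices of the other processes are not spelled out in the text, so only the Pa4 row is produced here.

```python
# D5 for process Pa4, as given in the text: function -> {sender: wait cost}.
d5_pa4 = {
    "c1": {"Pa1": 300, "Pa2": 10, "Pa3": 100},
    "c2": {"Pa1": 100, "Pa2": 300, "Pa3": 200},
    "c3": {"Pa1": 10, "Pa2": 30, "Pa3": 300},
    "c4": {"Pa1": 400, "Pa2": 300, "Pa3": 500},
}


def aggregate(per_process, function):
    """Collect one function's entries from every process into one matrix.

    per_process: {receiver: {function: {sender: cost}}}  (hypothetical layout)
    returns:     {receiver: {sender: cost}}
    """
    return {
        receiver: dict(matrices.get(function, {}))
        for receiver, matrices in per_process.items()
    }


# Only Pa4's matrix is available, so the result holds a single row.
d6_c3 = aggregate({"Pa4": d5_pa4}, "c3")
```

The resulting row for Pa4 (10, 30, 300) corresponds to the Pa4 row of the aggregation matrix D6 for function c3.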

In the aggregation matrix D6, for the receiving process Pa1, the communication wait cost is 0 with respect to the destination process Pa2, 100 with respect to the destination process Pa3, and 300 with respect to the destination process Pa4.

For the receiving process Pa2, the communication wait cost is 100 with respect to the destination process Pa1, 200 with respect to the destination process Pa3, and 300 with respect to the destination process Pa4.

Similarly, for the receiving process Pa3, the communication wait cost is 0 with respect to the destination process Pa1, 0 with respect to the destination process Pa2, and 300 with respect to the destination process Pa4.

For the receiving process Pa4, the communication wait cost is 10 with respect to the destination process Pa1, 30 with respect to the destination process Pa2, and 300 with respect to the destination process Pa3.

Therefore, for function c3, the total waiting time of the process Pa1, that is, the sum of its row, is 400; the total waiting time of the process Pa2 is 600, that of the process Pa3 is 300, and that of the process Pa4 is 340. The total waited-on time of the process Pa1 (the time other processes spent waiting for it), that is, the sum of its column, is 110; the total waited-on time of the process Pa2 is 30, that of the process Pa3 is 600, and that of the process Pa4 is 900.
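
The row and column totals above can be reproduced with a short sketch. The dictionary layout `d6[receiver][other]` is an assumption made for illustration; the values are those listed for the aggregation matrix D6.

```python
procs = ["Pa1", "Pa2", "Pa3", "Pa4"]

# D6 for function c3: d6[receiver][other process] = wait cost (diagonal omitted).
d6 = {
    "Pa1": {"Pa2": 0, "Pa3": 100, "Pa4": 300},
    "Pa2": {"Pa1": 100, "Pa3": 200, "Pa4": 300},
    "Pa3": {"Pa1": 0, "Pa2": 0, "Pa4": 300},
    "Pa4": {"Pa1": 10, "Pa2": 30, "Pa3": 300},
}

# Row sum: how long each process itself waited.
waiting = {p: sum(d6[p].values()) for p in procs}

# Column sum: how long the other processes waited on each process.
waited_on = {p: sum(d6[r].get(p, 0) for r in procs) for p in procs}

print(waiting)    # {'Pa1': 400, 'Pa2': 600, 'Pa3': 300, 'Pa4': 340}
print(waited_on)  # {'Pa1': 110, 'Pa2': 30, 'Pa3': 600, 'Pa4': 900}
```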

FIG. 14 is an explanatory diagram of the evaluation based on the aggregation matrix and the creation of correction guidelines. From the waiting times and waited-on times in the aggregation matrix D6, a process evaluation D7 can be obtained for each function. For each function, the process evaluation D7 assigns a process the value L_L when its total waiting time is below the average and its total waited-on time is below the average; L_H when its total waiting time is below the average and its total waited-on time is above the average; H_L when its total waiting time is above the average and its total waited-on time is below the average; and H_H when its total waiting time is above the average and its total waited-on time is above the average.

In the example of function c3, the average of the total waiting times is (400 + 600 + 300 + 340) / 4 = 410. The average of the total waited-on times, (110 + 30 + 600 + 900) / 4, is likewise 410. The process Pa1 has a total waiting time of 400 and a total waited-on time of 110, so it is L_L; the process Pa2 has a total waiting time of 600 and a total waited-on time of 30, so it is H_L. The process Pa3 has a total waiting time of 300 and a total waited-on time of 600, so it is L_H; the process Pa4 has a total waiting time of 340 and a total waited-on time of 900, so it is also L_H.
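
The classification for function c3 can be checked with the sketch below. The function name `classify` is hypothetical, and one point is an assumption: the text only speaks of totals lower or higher than the average, so a total exactly equal to the average is treated here as L.

```python
def classify(waiting, waited_on):
    """Label each process as <waiting>_<waited-on>, each part H or L vs. the mean."""
    mean_waiting = sum(waiting.values()) / len(waiting)
    mean_waited_on = sum(waited_on.values()) / len(waited_on)
    labels = {}
    for proc in waiting:
        # Assumption: a value equal to the mean counts as L.
        first = "H" if waiting[proc] > mean_waiting else "L"
        second = "H" if waited_on[proc] > mean_waited_on else "L"
        labels[proc] = first + "_" + second
    return labels


# Totals for function c3 as stated in the text.
waiting_c3 = {"Pa1": 400, "Pa2": 600, "Pa3": 300, "Pa4": 340}
waited_on_c3 = {"Pa1": 110, "Pa2": 30, "Pa3": 600, "Pa4": 900}

evaluation_c3 = classify(waiting_c3, waited_on_c3)
print(evaluation_c3)  # {'Pa1': 'L_L', 'Pa2': 'H_L', 'Pa3': 'L_H', 'Pa4': 'L_H'}
```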

Similarly, for function c1, the process evaluation D7 takes the value H_H for the processes Pa1 to Pa4. For function c2, it takes the value H_L for the process Pa1, L_L for the process Pa2, H_L for the process Pa3, and L_L for the process Pa4. For function c4, it takes the value H_H for the processes Pa1 and Pa2 and L_L for the processes Pa3 and Pa4.

For the processes as a whole, the processes Pa1 and Pa2 take the value H_L, and the processes Pa3 and Pa4 take the value H_H.

From this process evaluation D7, the analysis device 30 creates process correction guidelines D8. No correction is needed for L_L; a reduction in work is desirable for L_H; an increase in work is desirable for H_L; and a reduction in the variation of the workload is desirable for H_H.

Accordingly, the process correction guidelines D8 suggest reducing variation for the processes Pa1 to Pa4 for function c1, and increasing work for the processes Pa1 and Pa3 for function c2. For function c3, they suggest increasing work for the process Pa2 and decreasing work for the processes Pa3 and Pa4; for function c4, they suggest reducing variation for the processes Pa1 and Pa2.

Finally, the process correction guidelines D8 suggest increasing work for the processes Pa1 and Pa2 as a whole and reducing variation for the processes Pa3 and Pa4 as a whole.
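
The mapping from evaluation values to guidelines amounts to a small lookup table, sketched below. The table name `GUIDELINE`, the function name `guidelines`, and the English wording of each entry are illustrative choices, not the output format of the analysis device 30.

```python
# Evaluation value -> suggested correction, following the four cases in the text.
GUIDELINE = {
    "L_L": "no change needed",
    "L_H": "decrease work",
    "H_L": "increase work",
    "H_H": "reduce workload variation",
}


def guidelines(evaluation):
    """Turn {process: evaluation value} into {process: suggested correction}."""
    return {proc: GUIDELINE[label] for proc, label in evaluation.items()}


# Evaluation of function c3 as stated in the text.
d8_c3 = guidelines({"Pa1": "L_L", "Pa2": "H_L", "Pa3": "L_H", "Pa4": "L_H"})
```

For function c3 this yields an increase in work for Pa2 and a decrease in work for Pa3 and Pa4, consistent with the guidelines described above.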

As described above, the analysis device 30 according to this embodiment acquires, from the processors 11 to 14, sampling data obtained by sampling, for each of a plurality of processes subjected to distributed parallel processing, the waiting time of the own process during communication with other processes. Having acquired the sampling data, the analysis device 30 obtains, for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process, analyzes the result, and evaluates the distribution of processing among the plurality of processes. The analysis device 30 can therefore evaluate the load balance accurately and thereby contribute to improving the performance of distributed parallel processing programs.

This embodiment is merely an example, and its configuration and operation may be modified as appropriate. For example, although the embodiment has been described using a system with four processors, it is applicable to a system with any number of processors.

11 to 14 Processor
21 Interconnect device
30 Analysis device
31 Data acquisition unit
32 Data totaling unit
33 Analysis unit
Pa1 to Pa4 Process
Ps1 to Ps4 Sampling thread
D1a, D1b, D1c, D1d Sampling data
D2 Totaled data
D3 Evaluation table
D4 Wait status matrix
D5 Function-by-function matrix
D6 Aggregation matrix
D7 Process evaluation
D8 Process correction guidelines

Claims

An analysis apparatus comprising:
an acquisition unit that acquires, for each of a plurality of processes subjected to distributed parallel processing, sampling data obtained by sampling the waiting time of the own process during communication with other processes;
a totaling unit that obtains, based on the sampling data and for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process; and
an analysis unit that analyzes the totaling result of the totaling unit and evaluates the distribution state of processing among the plurality of processes.

The analysis apparatus according to claim 1, wherein the totaling unit obtains, for each function being executed at the time of sampling, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process, and the analysis unit evaluates the distribution state of processing for each function.

The analysis apparatus according to claim 1, wherein the analysis unit:
evaluates that the processing allocation should be reduced for a process whose total waiting time with respect to the other processes is small relative to the other processes and for which the total waiting time of the other processes during communication between the other processes and the own process is large relative to the other processes;
evaluates that the processing allocation should be increased for a process whose total waiting time with respect to the other processes is large relative to the other processes and for which the total waiting time of the other processes during communication between the other processes and the own process is small relative to the other processes; and
evaluates that variation in processing should be suppressed for a process whose total waiting time with respect to the other processes is large relative to the other processes and for which the total waiting time of the other processes during communication between the other processes and the own process is large relative to the other processes.

An analysis method comprising:
a step of acquiring, for each of a plurality of processes subjected to distributed parallel processing, sampling data obtained by sampling the waiting time of the own process during communication with other processes;
a totaling step of obtaining, based on the sampling data and for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process; and
a step of analyzing the totaling result of the totaling step and evaluating the distribution state of processing among the plurality of processes.

An analysis program for causing a computer to execute:
a procedure of acquiring, for each of a plurality of processes subjected to distributed parallel processing, sampling data obtained by sampling the waiting time of the own process during communication with other processes;
a totaling procedure of obtaining, based on the sampling data and for each of the plurality of processes, the total waiting time of the own process with respect to the other processes and the total waiting time of the other processes during communication between the other processes and the own process; and
a procedure of analyzing the totaling result of the totaling procedure and evaluating the distribution state of processing among the plurality of processes.