JP3821834B2

JP3821834B2 - Parallel efficiency calculation method

Info

Publication number: JP3821834B2
Application number: JP2006001754A
Authority: JP
Inventors: Shigeo Orii
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2006-01-06
Filing date: 2006-01-06
Publication date: 2006-09-13
Anticipated expiration: 2022-07-22
Also published as: JP2006120182A

Description

The present invention relates to techniques for evaluating the performance of a parallel computer system and for using the results of that evaluation. The technology of the present invention is applicable to every field that performs parallel processing. This includes the fields handled by conventional high performance computing (HPC) (structural analysis, fluid analysis, computational chemistry, etc.), biosimulation deployed on a grid or cluster, and web services (for example, MtoM (Machine to Machine)).

The performance of a parallel computer system varies greatly from application to application, so evaluating that performance is important. There are two kinds of methods for evaluating the performance of a parallel computer system:

(1) comparative methods, in which a specific process is executed on various computers and the results are compared;
(2) self-contained methods, which evaluate how much of its own potential a given computer actually delivers.

The former are used mainly as benchmark tests for comparing performance between computers. The latter must be carried out during actual operation, after the system has been installed. Self-contained performance evaluation can be performed using an index called parallel efficiency, but in practice it is not carried out.

Instead of parallel efficiency, parallel performance can also be evaluated by measuring the processing time while varying the number of processors p and comparing the decrease with the ideal factor 1/p (so-called scalability evaluation). However, this requires several time measurements and is therefore not usually performed. Scalability evaluation is also qualitative and cannot provide a rigorous evaluation of parallel performance. As a result, processes with poor parallel efficiency currently cannot be detected and go unchecked.

Parallel processing performance is evaluated on the basis of parallel efficiency by obtaining the parallel efficiency E_p(p) determined by equations (1) and (2) below. The symbols are defined as follows:

- p is the number of processors.
- τ(1) is the processing time when the process is executed on one processor.
- τ(p) is the processing time when the same process is executed on p processors.
- τ_i(p) (1 ≤ i ≤ p) is the processing time of the i-th processor.

$$E_p(p) = \frac{\tau(1)}{p\,\tau(p)} \qquad (1)$$

$$\tau(p) = \max_{1 \le i \le p} \tau_i(p) \qquad (2)$$

For example, equation (1) is disclosed in Garatani, Nakamura, Okuda, Yagawa: "Performance evaluation of the parallel finite element code GeoFEM", Transactions of JSCES, Paper No. 20000022 (2000).
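The following Python sketch is a minimal illustration of equations (1) and (2). It computes E_p(p) from a measured one-processor time τ(1) and the per-processor times τ_i(p). The function name and the timing values are hypothetical and are chosen only for illustration.

```python
from __future__ import annotations


def parallel_efficiency(tau_1: float, tau_i: list[float]) -> float:
    """Equations (1)-(2): E_p(p) = tau(1) / (p * tau(p)), with tau(p) = max_i tau_i(p)."""
    if not tau_i:
        raise ValueError("at least one per-processor time is required")
    p = len(tau_i)
    tau_p = max(tau_i)  # equation (2): the slowest processor sets the elapsed time
    return tau_1 / (p * tau_p)


# Hypothetical measurements: 100 s on one processor, 27-30 s on each of four processors.
e = parallel_efficiency(100.0, [27.0, 28.0, 30.0, 29.0])
print(f"E_p(4) = {e:.3f}")  # 100 / (4 * 30) -> 0.833
```

The sketch needs a measured τ(1). As the following paragraphs explain, that value is hard to obtain on grids and clusters.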

However, even when the parallel efficiency was determined by the conventional method, its quantitative relationship to the factors that impede parallel performance was unclear. It was therefore impossible to know which impediment factor affected the parallel efficiency, and by how much.

Furthermore, some parallel performance evaluation techniques (for example, Japanese Patent Application No. 2001-241121 and U.S. Patent Application No. 09/998160) require the condition shown in FIG. 1: "the load is balanced, and each of the processing times γ_i (parallel part), χ_{i,1} (redundant processing part), χ_{i,2} (communication part), and χ_{i,others} (other parallel performance impediment factors) is the same on every processor". As a result, they can be applied only to a limited subset of parallel processing.

In addition, the conventional method is difficult to apply to parallel processing on a grid or cluster. The computation needs resources such as memory, data, and CPUs that are distributed over the grid or cluster. Gathering those resources onto a single processor often yields a job too large for one processor to execute, so measuring τ(1) is itself difficult.

Moreover, obtaining τ(1) and τ(p) in equation (1) by actual measurement presupposes that all processors have the same performance. The individual processors on a grid or cluster, however, usually differ in performance. Consequently, substituting measured values of τ(1) and τ(p) into equation (1) does not yield the correct parallel efficiency.

Accordingly, an object of the present invention is to provide a parallel processing performance evaluation technique with three properties:

- it removes the condition that "the load is balanced";
- it can be applied to many kinds of parallel processing, including heterogeneous processor environments;
- it quantitatively relates parallel efficiency to parallel performance evaluation indices and to parallel performance impediment factors.

Another object of the present invention is to provide a technique that uses parallel efficiency and related indices to enable a parallel computer system to be operated appropriately.

Still another object of the present invention is to provide a technique that uses parallel efficiency and related indices to enable appropriate decisions on capacity expansion, upgrading, and similar changes to a parallel computer system.

Still another object of the present invention is to provide a technique that uses parallel efficiency and related indices to tune the programs executed on a parallel computer system, and to select their algorithms, appropriately.

A parallel efficiency calculation method according to a first aspect of the present invention calculates the parallel efficiency E_p(p) of a parallel computer system. It uses a computer that has a data acquisition unit, a load balance contribution ratio calculation unit, a virtual parallelization rate calculation unit, a parallel performance impediment factor contribution ratio calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device. The method includes the following steps.

1. Measuring, in the parallel computer system, the processing time γ_i(p) of the parallel computation part of the process (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system.
2. Having the data acquisition unit acquire γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and store them in the log data storage unit.
3. A load balance contribution ratio calculation step. The load balance contribution ratio calculation unit uses the data in the log data storage unit to calculate a load balance contribution ratio Rb(p), which represents the degree of load balance among the processors of the parallel computer system, and stores it in the storage device.
4. A virtual parallelization rate calculation step. The virtual parallelization rate calculation unit uses the data in the log data storage unit to calculate a virtual parallelization rate Rp(p) and stores it in the storage device. Rp(p) represents the fraction of time, within the process executed on the parallel computer system, taken by the part that the processors compute in parallel.
5. A parallel performance impediment factor contribution ratio calculation step. The parallel performance impediment factor contribution ratio calculation unit uses the data in the log data storage unit to calculate a parallel performance impediment factor contribution ratio Rj(p) and stores it in the storage device. Rj(p) represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors of the parallel computer system.
6. Having the parallel efficiency calculation unit use Rb(p), Rp(p), and Rj(p) stored in the storage device to calculate the parallel efficiency E_p(p) by the following equation, and store it in the storage device.

$$E_p(p) = \frac{R_b(p)\left(1 - \sum_j R_j(p)\right)}{R_p(p)}$$

Here, the parallel efficiency E_p(p) is the ratio of τ(1), the processing time without parallel processing, to p·τ(p). The quantity p·τ(p) is the total processing time obtained by assuming that each of the p per-processor times τ_i(p) equals τ(p), the longest processing time when parallel processing is performed by p processors.
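The first aspect can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation, and it rests on several assumptions:

- Each processor's time is assumed to decompose as τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p).
- τ(1) is supplied as an input for Rp(p).
- The names `ProcessorLog`, `contribution_ratios`, and `parallel_efficiency_first_aspect`, the factor labels, and the timings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcessorLog:
    """Hypothetical log record for one processor i."""
    gamma: float                                          # gamma_i(p): parallel computation part
    chi: dict[str, float] = field(default_factory=dict)  # chi_{i,j}(p) per impediment factor j

    @property
    def tau(self) -> float:
        # Assumed decomposition: tau_i(p) = gamma_i(p) + sum_j chi_{i,j}(p)
        return self.gamma + sum(self.chi.values())


def contribution_ratios(logs: list[ProcessorLog], tau_1: float):
    """Return (Rb, Rp, {j: Rj}) computed from the per-processor logs."""
    p = len(logs)
    beta = sum(log.tau for log in logs)      # total time of all processors
    alpha = sum(log.gamma for log in logs)   # total parallel computation time
    r_b = beta / (p * max(log.tau for log in logs))
    r_p = alpha / tau_1
    factors = sorted({j for log in logs for j in log.chi})
    r_j = {j: sum(log.chi.get(j, 0.0) for log in logs) / beta for j in factors}
    return r_b, r_p, r_j


def parallel_efficiency_first_aspect(r_b: float, r_p: float, r_j: dict[str, float]) -> float:
    """E_p(p) = Rb(p) * (1 - sum_j Rj(p)) / Rp(p)."""
    return r_b * (1.0 - sum(r_j.values())) / r_p


# Hypothetical logs for four processors (seconds).
logs = [
    ProcessorLog(24.0, {"redundant": 2.0, "communication": 4.0}),
    ProcessorLog(22.0, {"redundant": 2.0, "communication": 3.0}),
    ProcessorLog(23.0, {"redundant": 2.0, "communication": 3.0}),
    ProcessorLog(23.0, {"redundant": 2.0, "communication": 4.0}),
]
r_b, r_p, r_j = contribution_ratios(logs, tau_1=100.0)
print(r_b, r_p, r_j)
print(parallel_efficiency_first_aspect(r_b, r_p, r_j))
```

With these hypothetical numbers, Rb(4) = 0.95 and Rp(4) = 0.92. The result is 100 / (4 × 30) ≈ 0.833, which is the same value that equation (1) gives directly.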

A parallel efficiency calculation method according to a second aspect of the present invention uses a computer that has a data acquisition unit, a load balance contribution ratio calculation unit, an auxiliary index calculation unit, a parallel performance impediment factor contribution ratio calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device. The method includes the following steps.

1. Measuring, in the parallel computer system, the processing time γ_i(p) of the parallel computation part of the process (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system.
2. Having the data acquisition unit acquire γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and store them in the log data storage unit.
3. A load balance contribution ratio calculation step. The load balance contribution ratio calculation unit uses the data in the log data storage unit to calculate the load balance contribution ratio Rb(p), which represents the degree of load balance among the processors of the parallel computer system, and stores it in the storage device.
4. An acceleration rate calculation step. The auxiliary index calculation unit uses the data in the log data storage unit to calculate an acceleration rate A_p(p) and stores it in the storage device. A_p(p) represents the upper limit on how far parallelizing the process executed on the parallel computer system can shorten its processing time.
5. A parallel performance impediment factor contribution ratio calculation step. The parallel performance impediment factor contribution ratio calculation unit uses the data in the log data storage unit to calculate the parallel performance impediment factor contribution ratio Rj(p) and stores it in the storage device. Rj(p) represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors of the parallel computer system.
6. Having the parallel efficiency calculation unit calculate the parallel efficiency E_p(p) from Rb(p), A_p(p), and Rj(p) stored in the storage device, and store it in the storage device.

A parallel efficiency calculation method according to a third aspect of the present invention uses a computer that has a data acquisition unit, a load balance contribution ratio calculation unit, a parallel performance impediment factor contribution ratio calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device. The method includes the following steps.

1. Measuring, in the parallel computer system, the processing time γ_i(p) of the parallel computation part of the process (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system.
2. Having the data acquisition unit acquire γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and store them in the log data storage unit.
3. A load balance contribution ratio calculation step. The load balance contribution ratio calculation unit uses the data in the log data storage unit to calculate the load balance contribution ratio Rb(p), which represents the degree of load balance among the processors of the parallel computer system, and stores it in the storage device.
4. A parallel performance impediment factor contribution ratio calculation step. The parallel performance impediment factor contribution ratio calculation unit uses the data in the log data storage unit to calculate the parallel performance impediment factor contribution ratio Rj(p) and stores it in the storage device. Rj(p) represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors of the parallel computer system.
5. Having the parallel efficiency calculation unit use Rb(p) and Rj(p) stored in the storage device to calculate the parallel efficiency E_p(p) by the following equation, and store it in the storage device.

$$E_p(p) = R_b(p)\left(1 - \sum_j R_j(p)\right)$$
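Below is a minimal Python sketch of the third aspect. It assumes the equation has the form Rb(p)(1 − Σ_j Rj(p)), which is what the first-aspect expression becomes when Rp(p) = 1. The function name and the ratio values are hypothetical.

```python
def parallel_efficiency_third_aspect(r_b: float, r_j: dict) -> float:
    """E_p(p) ~= Rb(p) * (1 - sum_j Rj(p)); appropriate when Rp(p) is close to 1."""
    return r_b * (1.0 - sum(r_j.values()))


# Hypothetical ratios: good load balance, 15 % of processor time lost to impediments.
print(parallel_efficiency_third_aspect(0.95, {"redundant": 0.05, "communication": 0.10}))
```

This form needs no τ(1) at all. That makes it attractive on grids and clusters, but it is accurate only when nearly the whole process is parallel computation.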

A parallel efficiency calculation method according to a fourth aspect of the present invention uses a computer that has a data acquisition unit, a load balance contribution ratio calculation unit, a virtual parallelization rate calculation unit, a parallel efficiency calculation unit, an auxiliary index calculation unit, a log data storage unit, and a storage device. The method includes the following steps.

1. Measuring, in the parallel computer system, the processing time γ_i(p) of the parallel computation part of the process (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system.
2. Having the data acquisition unit acquire γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and store them in the log data storage unit.
3. A load balance contribution ratio calculation step. The load balance contribution ratio calculation unit uses the data in the log data storage unit to calculate the load balance contribution ratio Rb(p), which represents the degree of load balance among the processors of the parallel computer system, and stores it in the storage device.
4. A virtual parallelization rate calculation step. The virtual parallelization rate calculation unit uses the data in the log data storage unit to calculate the virtual parallelization rate Rp(p) and stores it in the storage device. Rp(p) represents the fraction of time, within the process executed on the parallel computer system, taken by the part that the processors compute in parallel.
5. An auxiliary index calculation step. The auxiliary index calculation unit uses the data in the log data storage unit to calculate two sums and stores them in the storage device: the sum α of the processing times γ_i(p) of the parallel computation parts of the processes executed on the processors of the parallel computer system, and the sum β of the processing times of the processes executed on those processors.
6. Having the parallel efficiency calculation unit use α, β, Rb(p), and Rp(p) stored in the storage device to calculate the parallel efficiency E_p(p) by the following equation, and store it in the storage device.

$$E_p(p) = \frac{R_b(p)\,\alpha}{\beta\,R_p(p)}$$
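The fourth aspect can be sketched as follows, assuming the equation takes the form Rb(p)·α / (β·Rp(p)). Here the ratio α/β plays the role that 1 − Σ_j Rj(p) plays in the first aspect. The function name and values are hypothetical; the values reuse the illustrative four-processor example with τ(p) = 30 s and τ(1) = 100 s.

```python
def parallel_efficiency_fourth_aspect(alpha: float, beta: float, r_b: float, r_p: float) -> float:
    """E_p(p) = Rb(p) * (alpha / beta) / Rp(p).

    alpha: sum of the parallel computation times gamma_i(p)
    beta:  sum of the total processing times tau_i(p)
    """
    return r_b * (alpha / beta) / r_p


print(parallel_efficiency_fourth_aspect(alpha=92.0, beta=114.0, r_b=0.95, r_p=0.92))
```

Because Rp(p) = α/τ(1), α cancels, and the expression reduces to Rb(p)·τ(1)/β. With these numbers, the result is about 0.833.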

A parallel efficiency calculation method according to a fifth aspect of the present invention uses a computer that has a data acquisition unit, an auxiliary index calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device. The method includes the following steps.

1. Measuring the following quantities in the parallel computer system and storing them in a storage unit of the parallel computer system:
   - the processing time γ_i(p) of the parallel computation part of the process (i denotes the processor number);
   - the processing time χ_{i,j}(p) of each parallel performance impediment factor j;
   - if parallel performance impediment factors other than redundant processing exist, the processing time X_{i,j}(p) due to each parallel performance impediment factor j that arises only when p > 1 and depends on p.
2. Having the data acquisition unit acquire γ_i(p), χ_{i,j}(p), and, where applicable, X_{i,j}(p) from the storage unit of the parallel computer system and store them in the log data storage unit.
3. Having the auxiliary index calculation unit use the data in the log data storage unit to calculate a first processing time ρ and store it in the storage device. ρ corresponds to the total processing time of the parallel-performance-impeding part of the process when the process is executed on one processor.
4. Having the auxiliary index calculation unit use the data in the log data storage unit to calculate a second processing time α and store it in the storage device. α is the sum of the processing times γ_i(p) of the parallel computation parts of the processes executed on the processors of the parallel computer system.
5. Having the parallel efficiency calculation unit calculate the parallel efficiency E_p(p) and store it in the storage device. The inputs are the number p of processors used in the parallel computer system, the longest processing time τ(p) when parallel processing is performed by p processors, and the first processing time ρ and second processing time α stored in the storage device.
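The fifth aspect lists its inputs (p, τ(p), ρ, α), but the formula that combines them is not shown here. The sketch below is a plausible reading under a loudly stated assumption: the one-processor time is modeled as τ(1) ≈ ρ + α and then substituted into equation (1). How ρ is derived from χ_{i,j}(p) and X_{i,j}(p) is not spelled out, so ρ is taken as an input. The function name and values are hypothetical.

```python
def parallel_efficiency_fifth_aspect(p: int, tau_p: float, rho: float, alpha: float) -> float:
    """E_p(p) = (rho + alpha) / (p * tau(p)), ASSUMING tau(1) is modeled as rho + alpha."""
    if p < 1 or tau_p <= 0:
        raise ValueError("p must be >= 1 and tau(p) must be positive")
    tau_1_estimate = rho + alpha  # assumed model of the one-processor time
    return tau_1_estimate / (p * tau_p)


print(parallel_efficiency_fifth_aspect(p=4, tau_p=30.0, rho=6.0, alpha=92.0))  # 98 / 120
```

The appeal of this form is that every input comes from a single parallel run, so τ(1) never has to be measured.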

A parallel efficiency calculation method according to a sixth aspect of the present invention calculates the parallel efficiency of a parallel computer system. It includes the following steps.

1. A load balance contribution ratio calculation step of calculating a load balance contribution ratio Rb(p) and storing it in a storage device. Rb(p) represents the degree of load balance among the processors of the parallel computer system.
2. A virtual parallelization rate calculation step of calculating a virtual parallelization rate Rp(p) and storing it in the storage device. Rp(p) represents the fraction of time, within the process executed on the parallel computer system, taken by the part that the processors compute in parallel.
3. A parallel performance impediment factor contribution ratio calculation step of calculating a parallel performance impediment factor contribution ratio Rj(p) and storing it in the storage device. Rj(p) represents the ratio of the processing time of each parallel performance impediment factor part to the total processing time of all processors of the parallel computer system.
4. A step of calculating the parallel efficiency from Rb(p), Rp(p), and Rj(p) (for example, by equation (4-4) in the embodiment) and storing it in the storage device.

As a result, the parallel efficiency is quantitatively related to three parallel performance evaluation indices: the load balance contribution ratio, the virtual parallelization rate, and the parallel performance impediment factor contribution ratio.
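The derivation below is a sketch of the algebra, not a quotation of equation (4-4). It assumes, as the breakdown in FIG. 1 suggests, that each processor's time decomposes as τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p). Under that assumption, the definitions in equations (5), (6-1), and (7) combine into equation (1) as follows.

```latex
\begin{aligned}
\alpha &= \sum_{i=1}^{p} \gamma_i(p), \qquad
\beta = \sum_{i=1}^{p} \tau_i(p) = \alpha + \sum_j \sum_{i=1}^{p} \chi_{i,j}(p) \\
R_b(p) &= \frac{\beta}{p\,\tau(p)}, \qquad
R_p(p) = \frac{\alpha}{\tau(1)}, \qquad
1 - \sum_j R_j(p) = \frac{\beta - \sum_j \sum_{i} \chi_{i,j}(p)}{\beta} = \frac{\alpha}{\beta} \\
\frac{R_b(p)\left(1 - \sum_j R_j(p)\right)}{R_p(p)}
  &= \frac{\beta}{p\,\tau(p)} \cdot \frac{\alpha}{\beta} \cdot \frac{\tau(1)}{\alpha}
   = \frac{\tau(1)}{p\,\tau(p)} = E_p(p)
\end{aligned}
```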

A parallel efficiency calculation method according to a seventh aspect of the present invention calculates the parallel efficiency of a parallel computer system. It includes the following steps.

1. A load balance contribution ratio calculation step of calculating a load balance contribution ratio Rb(p) and storing it in a storage device. Rb(p) represents the degree of load balance among the processors of the parallel computer system.
2. An acceleration rate calculation step of calculating an acceleration rate A_p(p) and storing it in the storage device. A_p(p) represents the upper limit on how far parallelizing the process executed on the parallel computer system can shorten its processing time.
3. A parallel performance impediment factor contribution ratio calculation step of calculating a parallel performance impediment factor contribution ratio Rj(p) and storing it in the storage device. Rj(p) represents the ratio of the processing time of each parallel performance impediment factor part to the total processing time of all processors of the parallel computer system.
4. A step of calculating the parallel efficiency from Rb(p), A_p(p), and Rj(p) (for example, by equation (4-5) in the embodiment) and storing it in the storage device.

As a result, the parallel efficiency is quantitatively related to the parallel performance evaluation indices (the load balance contribution ratio and the parallel performance impediment factor contribution ratio) and to an auxiliary index, the acceleration rate.

A parallel efficiency calculation method according to an eighth aspect of the present invention calculates the parallel efficiency of a parallel computer system. It includes the following steps.

1. A load balance contribution ratio calculation step of calculating a load balance contribution ratio Rb(p) and storing it in a storage device. Rb(p) represents the degree of load balance among the processors of the parallel computer system.
2. A parallel performance impediment factor contribution ratio calculation step of calculating a parallel performance impediment factor contribution ratio Rj(p) and storing it in the storage device. Rj(p) represents the ratio of the processing time of each parallel performance impediment factor part to the total processing time of all processors of the parallel computer system.
3. A step of calculating the parallel efficiency from Rb(p) and Rj(p) (for example, by equation (8-2) in the embodiment) and storing it in the storage device.

The parallel efficiency can be calculated in this way when almost all of the process can be computed in parallel. In that case, the sum of the processing times of the parallel computation parts executed on the processors of the parallel computer system is nearly equal to the processing time when the same process is executed on one processor.
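The approximation can be checked against the exact expression under two assumptions: the decomposition τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p), and equation (8-2) having the form R_b(p)(1 − Σ_j R_j(p)). The two expressions differ by exactly the factor Rp(p), so the relative error is |1 − Rp(p)|. That error is small under the condition just described.

```latex
R_b(p)\left(1 - \sum_j R_j(p)\right)
  = R_p(p) \cdot \frac{R_b(p)\left(1 - \sum_j R_j(p)\right)}{R_p(p)}
  = R_p(p)\,E_p(p)
```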

A parallel efficiency calculation method according to a ninth aspect of the present invention calculates the parallel efficiency of a parallel computer system. It includes the following steps.

1. A load balance contribution ratio calculation step of calculating a load balance contribution ratio Rb(p) and storing it in a storage device. Rb(p) represents the degree of load balance among the processors of the parallel computer system.
2. A virtual parallelization rate calculation step of calculating a virtual parallelization rate Rp(p) and storing it in the storage device. Rp(p) represents the fraction of time, within the process executed on the parallel computer system, taken by the part that the processors compute in parallel.
3. A step of calculating the parallel efficiency and storing it in the storage device (for example, by equation (9-1) in the embodiment). The inputs are the sum of the processing times of the parallel computation parts executed on the processors of the parallel computer system, the sum of the processing times of the processes executed on those processors, Rb(p), and Rp(p).

This is a variation of the first aspect of the present invention.

A parallel efficiency calculation method according to a tenth aspect of the present invention calculates the parallel efficiency of a parallel computer system. It includes the following steps.

1. Calculating a first processing time and storing it in a storage device. The first processing time corresponds to the total processing time of the parallel-performance-impeding part of the process when the process is executed on one processor.
2. Calculating a second processing time and storing it in the storage device. The second processing time is the sum of the processing times of the parallel computation parts of the processes executed on the processors of the parallel computer system.
3. Calculating the parallel efficiency and storing it in the storage device (for example, by equation (9-2) in the embodiment). The inputs are the number of processors used in the parallel computer system, the longest of the processing times on those processors, the first processing time, and the second processing time.

With this modeling, the parallel efficiency can be calculated solely from processing times obtained in a single measurement run.

In the load balance contribution ratio calculation step described above, Rb(p) may be calculated as follows (for example, Equation (5) in the embodiment). The total processing time of the processing executed on all processors included in the parallel computer system is divided by the product of the longest processing time among the processors and the number of processors used in the parallel computer system.

In the virtual parallelization ratio calculation step described above, the virtual parallelization ratio Rp(p) may be calculated as follows (for example, Equation (6-1) in the embodiment). The sum of the processing times of the parallel computation portions executed on the processors included in the parallel computer system is divided by a third processing time. The third processing time corresponds to the processing time when the same processing is executed by one processor.

In the parallel performance impediment factor contribution ratio calculation step described above, the contribution ratio R_j(p) of a specific impediment factor may be calculated as follows (for example, Equation (7) in the embodiment). The processing times attributable to that factor are summed over the processors included in the parallel computer system, and this sum is divided by the sum of the processing times of those processors.

In the acceleration rate calculation step described above, the acceleration rate may be calculated as the reciprocal of 1 minus the virtual parallelization ratio (for example, Equation (6-2) in the embodiment). Here the virtual parallelization ratio is the sum of the processing times of the parallel computation portions executed on the processors, divided by the third processing time defined above.
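The four indices defined in the steps above reduce to a few sums over per-processor measurements. The following is a minimal Python sketch, not the patent's implementation. The function name, the dictionary layout, and the sample numbers are illustrative assumptions, and τ(1) is passed in as an already-estimated value.

```python
from typing import Dict, List


def evaluation_indices(tau: List[float], gamma: List[float],
                       chi: Dict[str, List[float]], tau_1: float) -> Dict[str, float]:
    """Sketch of Eqs. (5), (6-1), (6-2) and (7).

    tau[i]    -- total processing time tau_i(p) of processor i
    gamma[i]  -- parallel-computation time gamma_i(p) of processor i
    chi[j][i] -- time of impediment factor j on processor i
    tau_1     -- single-processor time tau(1), estimated elsewhere
    """
    p = len(tau)
    tau_p = max(tau)                  # longest processor time, tau(p)
    total_tau = sum(tau)

    rb = total_tau / (p * tau_p)      # load balance contribution ratio, Eq. (5)
    rp = sum(gamma) / tau_1           # virtual parallelization ratio, Eq. (6-1)
    # Acceleration rate, Eq. (6-2); treated as unbounded when rp >= 1.
    a_p = 1.0 / (1.0 - rp) if rp < 1.0 else float("inf")

    result = {"Rb": rb, "Rp": rp, "A_p": a_p}
    for name, times in chi.items():   # impediment contribution ratios, Eq. (7)
        result["R_" + name] = sum(times) / total_tau
    return result


# Hypothetical two-processor run with one impediment factor "C" (communication).
idx = evaluation_indices(tau=[10.0, 8.0], gamma=[8.0, 6.0],
                         chi={"C": [2.0, 2.0]}, tau_1=16.0)
print(idx)  # Rb = 0.9, Rp = 0.875, A_p = 8.0, R_C is about 0.222
```

With these numbers τ(1)/(p τ(p)) = 16/20 = 0.8. This equals Rb(p)(1 − R_C(p))/Rp(p), an identity that follows from Equation (3) when the listed factors account for all non-parallel time.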

Besides actual processing times, the processing times described above may also be expressed as the number of times the corresponding event was observed.

The method may further include a step of calculating an auxiliary index and storing it in the storage device. The auxiliary index is the calculated parallel efficiency multiplied by the number of processors used in the parallel computer system. It shows how many processors' worth of work the parallel computer system actually performed.

The third processing time described above may be calculated as the sum of a first and a second processing time (for example, Equation (15) in the embodiment). The first processing time corresponds to the total processing time of the parallel-performance-impeding portion of the processing when it is executed by one processor. The second processing time is the sum of the processing times of the parallel computation portions executed on the processors included in the parallel computer system. With this modeling, parallel efficiency and related quantities can be calculated from a single measurement of the processing times.

The first processing time described above may be calculated from the processing executed on the processors included in the parallel computer system (for example, Equation (12-1) in the embodiment). It is the sum of the processing times spent on redundant processing or on communication processing.

In the sixth to tenth aspects of the present invention, the method may further include two steps. The first sets a target parallel efficiency. The second calculates an optimal number of processors and stores it in the storage device. The optimal number is the product of the calculated parallel efficiency and the number of processors, divided by the target parallel efficiency. Adding more processors does not necessarily shorten the processing time. If the optimal number of processors can be calculated in this way, wasteful allocation of resources can be avoided.
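The auxiliary index and the optimal processor count described above are each a single multiplication or division. This sketch follows those two descriptions. The function names and the sample job (16 processors at 60 % efficiency, 80 % target) are hypothetical. The result is left as a real number, because the text does not say how to round it to a whole processor count.

```python
def auxiliary_index(e_p: float, p: int) -> float:
    """Auxiliary index E_p(p) * p: processors' worth of useful work."""
    return e_p * p


def optimal_processor_count(e_p: float, p: int, e_target: float) -> float:
    """Optimal processor count E_p(p) * p / E_target (not rounded)."""
    if not 0.0 < e_target <= 1.0:
        raise ValueError("target efficiency must lie in (0, 1]")
    return auxiliary_index(e_p, p) / e_target


# Hypothetical job: 16 processors at 60 % parallel efficiency, target 80 %.
print(auxiliary_index(0.6, 16))               # about 9.6
print(optimal_processor_count(0.6, 16, 0.8))  # about 12
```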

In the sixth to tenth aspects of the present invention, the method may further include two steps. The first sets the additional operating time and a predicted parallel efficiency for a system expansion. The second calculates the acceleration rate at system expansion (for example, Equation (18) in the embodiment) and stores it in the storage device. The numerator of this rate has two parts. One is the sum, over all jobs, of each job's total processing time on the processors currently in the parallel computer system multiplied by its calculated parallel efficiency. The other is the additional operating time multiplied by the predicted parallel efficiency. The denominator is the sum of the operating times of the processors currently in the system. This gives the system operator appropriate quantitative guidance when expanding the system.

In the sixth to tenth aspects of the present invention, the method may further include two steps. The first sets the performance factor of a new parallel computer system relative to the current one. The second calculates an estimated parallel efficiency using that performance factor and stores it in the storage device. This provides quantitative guidance when the system is replaced.

In the sixth to tenth aspects of the present invention, the method may further include a step of calculating the system operation efficiency (for example, Equation (17) in the embodiment) and storing it in the storage device. For each job, the total processing time on the processors currently in the parallel computer system is multiplied by the job's calculated parallel efficiency. These products are summed over all jobs and divided by the sum of the operating times of those processors. The conventional notion of utilization ignores parallel efficiency. A system operation efficiency that accounts for it, as in the present invention, evaluates the operating status of the system more realistically.
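The system operation efficiency above and the expansion acceleration rate described two paragraphs earlier share the same efficiency-weighted numerator. The following is a minimal sketch of both as the text words them. Representing each job as a (total processor time, parallel efficiency) pair is an assumption, and so are the function names and the accounting numbers.

```python
from typing import List, Tuple

# Each job: (sum of tau_i over its processors, its parallel efficiency E_p).
Job = Tuple[float, float]


def useful_time(jobs: List[Job]) -> float:
    """Sum over jobs of (total processor time) * (parallel efficiency)."""
    return sum(t * e for t, e in jobs)


def system_operation_efficiency(jobs: List[Job], operating_time: float) -> float:
    """Sketch of Eq. (17): efficiency-weighted busy time / total operating time."""
    return useful_time(jobs) / operating_time


def expansion_acceleration_rate(jobs: List[Job], operating_time: float,
                                added_time: float, predicted_e: float) -> float:
    """Sketch of Eq. (18): adds added_time * predicted_e to the numerator."""
    return (useful_time(jobs) + added_time * predicted_e) / operating_time


jobs = [(400.0, 0.5), (200.0, 0.9)]   # hypothetical accounting data
print(system_operation_efficiency(jobs, 1000.0))              # about 0.38
print(expansion_acceleration_rate(jobs, 1000.0, 500.0, 0.6))  # about 0.68
```

Plain utilization for the same data would be 600/1000 = 0.6. The efficiency-weighted value of 0.38 is lower because part of the busy time was lost to parallel overhead.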

また、本発明の第６乃至第１０の態様において、目標処理時間を設定するステップと、目標処理時間を用いて目標並列効率を計算し、記憶装置に格納するステップと、目標並列効率の妥当性を確認するステップとをさらに含むような構成であってもよい。例えば目標並列効率は、線形外挿にて計算することができる。 Further, in the sixth to tenth aspects of the present invention, the step of setting the target processing time, the step of calculating the target parallel efficiency using the target processing time and storing it in the storage device, and the validity of the target parallel efficiency It may be configured to further include a step of confirming. For example, the target parallel efficiency can be calculated by linear extrapolation.

When the validity of the target parallel efficiency has been confirmed, the method may further include two steps. The first calculates the parallel efficiency after tuning and stores it in the storage device. The second compares the parallel efficiency after tuning with the target parallel efficiency. This allows applications and other programs to be tuned from a more quantitative standpoint.

In the sixth to tenth aspects of the present invention, the method may further include three steps. The first sets a target processing time. The second estimates, for each of several algorithms, the number of processors required, using that algorithm's parallel efficiency, and stores the estimate in the storage device. The third extracts an algorithm that meets two conditions. Its estimated processor count must be smaller than its acceleration rate, which is the upper limit on the reduction in processing time achievable by parallelizing that algorithm on the parallel computer system. Among the algorithms that qualify, its estimate must also be the smallest. This makes it possible to select, quantitatively, an algorithm that achieves better parallel efficiency.
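The text does not state how the processor estimate is computed. This sketch assumes one reading: solve E_p = τ(1)/(p τ(p)) for p, with τ(p) set to the target processing time. It then applies the two selection conditions described above. The candidate data, names, and that formula are hypothetical.

```python
from typing import Dict, Optional, Tuple


def select_algorithm(candidates: Dict[str, Tuple[float, float, float]],
                     target_time: float) -> Optional[Tuple[str, float]]:
    """Pick the algorithm needing the fewest processors to meet target_time.

    candidates[name] = (tau_1, e_p, a_p): single-processor time, parallel
    efficiency and acceleration rate of that algorithm (all hypothetical).
    ASSUMPTION: the processor estimate solves E_p = tau(1) / (p * tau(p))
    for p, with tau(p) set to target_time.
    """
    best: Optional[Tuple[str, float]] = None
    for name, (tau_1, e_p, a_p) in candidates.items():
        p_est = tau_1 / (e_p * target_time)
        if p_est >= a_p:          # not below the acceleration-rate limit
            continue
        if best is None or p_est < best[1]:
            best = (name, p_est)
    return best


algorithms = {
    "alg_a": (1000.0, 0.5, 50.0),
    "alg_b": (1200.0, 0.8, 100.0),
    "alg_c": (800.0, 0.4, 10.0),   # would need 40 processors, but A_p is 10
}
print(select_algorithm(algorithms, target_time=50.0))  # ('alg_b', 30.0)
```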

The parallel efficiency calculation method according to the present invention can be implemented by a program and a computer. When the program is executed on a computer, the computer serves as a parallel efficiency calculation device. Such a program is stored in a storage medium or storage device such as a flexible disk, CD-ROM, magneto-optical disk, semiconductor memory, or hard disk, and it may also be distributed over a network. Intermediate processing results are temporarily stored in memory.

As described above, the present invention removes the condition that the load must be balanced. It can therefore be applied to many kinds of parallel processing, including heterogeneous processor environments such as grid computing. It also establishes a quantitative relationship between the parallel efficiency, the parallel performance evaluation indices, and the parallel performance impediment factors.

According to another aspect, parallel efficiency and related indices also enable appropriate operation of the parallel computer system.

According to another aspect, parallel efficiency and related indices support sound decisions on expanding or replacing the parallel computer system.

According to yet another aspect, parallel efficiency and related indices make it possible to tune programs executed on the parallel computer system and to select algorithms appropriately.

[Principle of the present invention]
In the present invention, the parallel efficiency E_p(p) is expressed in terms of parallel performance evaluation indices, which links E_p(p) quantitatively to the parallel performance impediment factors. As shown in FIG. 2, the parallel processing time τ_i(p) can be expressed by Equation (3). It is the sum of the processing time γ_i(p) of the parallel computation part and the processing times χ_{i,j}(p) of the individual parallel performance impediment factors j, where 1 ≤ j ≤ j_Others. In FIG. 2, i is the processor number and p is the number of processors; only processors i and i + 1 are shown.

Equation (1) is then transformed as follows. The parallel efficiency E_p(p) is described by introducing three parallel performance evaluation indices: the load balance contribution ratio Rb(p), the virtual parallelization ratio Rp(p), and the parallel performance impediment factor contribution ratio R_j(p).

The transformation from Equation (1) to Equation (4-1) multiplies the numerator and denominator of Equation (1) by the sum of τ_i(p) over i. The transformation to Equation (4-2) rearranges the factors of Equation (4-1) and multiplies its numerator and denominator by the sum of γ_i(p) over i. The following expression, which gives the sum of τ_i(p) over i, is derived from Equation (3).

The following expression is then derived using Rb(p), Rp(p), and the contribution ratios R_j(p).

Using the acceleration rate A_p(p), it can also be expressed as follows.

The load balance contribution ratio Rb(p), the virtual parallelization ratio Rp(p), the parallel performance impediment factor contribution ratio R_j(p), and the acceleration rate A_p(p) are expressed as follows.

The parallel performance impediment factors are numbered by j.
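The relations described in this passage can be written out as below. This is a reconstruction from the verbal definitions in the text. It assumes Equation (1) is E_p(p) = τ(1)/(p τ(p)) and Equation (2) is τ(p) = max_i τ_i(p). The intermediate steps (4-1) to (4-3) may be grouped differently in the original.

```latex
\begin{aligned}
\tau_i(p) &= \gamma_i(p) + \sum_{j=1}^{j_{\mathrm{Others}}} \chi_{i,j}(p)
  && \text{(3)}\\
R_b(p) &= \frac{\sum_i \tau_i(p)}{p\,\tau(p)} && \text{(5)}\\
R_p(p) &= \frac{\sum_i \gamma_i(p)}{\tau(1)}, \qquad
A_p(p) = \frac{1}{1 - R_p(p)} && \text{(6-1), (6-2)}\\
R_j(p) &= \frac{\sum_i \chi_{i,j}(p)}{\sum_i \tau_i(p)} && \text{(7)}\\[4pt]
E_p(p) &= \frac{\tau(1)}{p\,\tau(p)}
  = \frac{\sum_i \tau_i(p)}{p\,\tau(p)} \cdot
    \frac{\tau(1)}{\sum_i \gamma_i(p)} \cdot
    \frac{\sum_i \gamma_i(p)}{\sum_i \tau_i(p)}\\
  &= \frac{R_b(p)}{R_p(p)} \Bigl(1 - \sum_j R_j(p)\Bigr)
   = R_b(p)\,\frac{A_p(p)}{A_p(p) - 1} \Bigl(1 - \sum_j R_j(p)\Bigr)
\end{aligned}
```

The last step uses Σ_i γ_i(p) = Σ_i τ_i(p) − Σ_j Σ_i χ_{i,j}(p), which follows from Equation (3). With Rp(p) = 1 the expression reduces to Rb(p)(1 − Σ_j R_j(p)), the product evaluated in the FIG. 7 examples.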

A load-balanced state is one in which the processing times τ_i(p) of the processors are equal, as shown in FIG. 1. Equation (5) assigns Rb(p) = 1 to this state, and states in which balance is not maintained fall in the range 1/p ≤ Rb(p) ≤ 1. FIG. 3 shows the case in which only one processor does all the work during parallel processing. The numerator of Equation (5) then equals τ(p), so Rb(p) takes its lower bound 1/p. In addition, Rb(p) defined by Equation (5) enters the parallel efficiency E_p(p) directly as a factor, which makes parallel performance easy to grasp intuitively.
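The two extremes described above can be checked directly from Equation (5). This short sketch uses a hypothetical function name and made-up times for p = 4.

```python
from typing import List


def load_balance_ratio(tau: List[float]) -> float:
    """Eq. (5): Rb(p) = sum_i tau_i(p) / (p * max_i tau_i(p))."""
    return sum(tau) / (len(tau) * max(tau))


print(load_balance_ratio([100.0, 100.0, 100.0, 100.0]))  # 1.0, balanced (FIG. 1)
print(load_balance_ratio([100.0, 0.0, 0.0, 0.0]))        # 0.25 = 1/p (FIG. 3)
print(load_balance_ratio([100.0, 80.0, 60.0, 40.0]))     # 0.7, in between
```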

The virtual parallelization ratio Rp(p) is the ratio of the sum of the parallel computation times γ_i(p) to τ_1(1). A value smaller than 1 indicates that the processing contains parts that cannot be parallelized. From this ratio, the upper limit of parallel performance can be expressed as the acceleration rate A_p(p). This is the ideal upper limit reached when an unlimited number of processors is used: A_p(p) = τ(1) / Σ_j χ_{1,j}(1) = 1 / (1 − Σ_i γ_i(p) / τ(1)). In practice, τ(1)/τ(p) is smaller than A_p(p) because of the parallel performance impediment factors.

As shown in Equation (7), the contribution ratio R_j(p) is normalized by the sum of τ_i(p) over i. The contribution of each impediment factor can therefore be read as a fraction of processing time, whether the degree of parallelism is high or low. Because this fraction also enters the parallel efficiency as a factor, the degradation of parallel performance can be grasped quantitatively.

All variables in Equations (2), (5), (6-1), and (7) except τ(1) can be measured during parallel execution. When Equation (8-1) holds, Equation (6-1) gives a virtual parallelization ratio Rp(p) of almost 1, and Equation (4-4) becomes equivalent to Equation (8-2).

In that case the parallel efficiency E_p(p) can be determined accurately, because the estimated value τ(1) is not needed. Independently of condition (8-1), Equation (8-2) can also be used in parallel performance evaluation as a substitute for Equations (4-4) and (4-5). Because Rp(p) ≤ 1, the value of Equation (8-2) is then less than or equal to the values of Equations (4-4) and (4-5).

The parallel efficiency E_p(p) can also be calculated with Equations (4-4), (4-5), and (8-2) above, and with the following Equation (9-1).

Equation (9-1) is obtained by transforming Equation (4-3) so that, of the evaluation indices, only the load balance contribution ratio Rb(p) and the virtual parallelization ratio Rp(p) appear.

From Equation (3), τ(1) is given by Equation (10).

Here, γ_1(1) and χ_{1,j}(1) are modeled in terms of γ_i(p) and χ_{i,j}(p). On grids and clusters, the computers differ in CPU performance, so τ_1(1) cannot be measured directly. This modeling determines τ(1) and thus makes it possible to calculate the virtual parallelization ratio Rp(p) of Equation (6-1).

When processor performance is identical and ideal, processing the parallel computation part on p processors takes 1/p of the time needed with p = 1. In this case τ_i(p) = γ_i(p), and γ_1(1) is obtained by multiplying γ_i(p) of any processor by p. On grids and clusters, however, computers with different CPU performance are usually present. γ_1(1) for a single processor is therefore estimated from the γ_i(p) measured on the p processors, as in Equation (11).

FIG. 4 is a conceptual diagram of Equation (11). The modeling of Equation (11) makes it possible to determine a virtual single-processor time γ_1(1) even when the individual processors differ in performance.
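Under the same reading (an assumption, since the typeset equations are not reproduced here), Equations (10) and (11) can be written as:

```latex
\begin{aligned}
\tau(1) &= \gamma_1(1) + \sum_{j=1}^{j_{\mathrm{Others}}} \chi_{1,j}(1) && \text{(10)}\\
\gamma_1(1) &= \sum_{i=1}^{p} \gamma_i(p) && \text{(11)}
\end{aligned}
```

For p identical processors with γ_i(p) = γ_1(1)/p, Equation (11) reproduces the ideal case described above. Substituting it into Equation (10) gives τ(1) = Σ_i γ_i(p) + Σ_j χ_{1,j}(1), the form the text later gives for Equation (15).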

χ_{1,j}(1) is modeled separately for two cases: redundant processing and everything else. All processing time not belonging to γ_1(1) is included in χ_{1,j}(1).

(1) Modeling of redundant processing time
When every processor performs exactly the same processing, this is called redundant processing here. Redundant processing is not parallel processing, and its processing time does not decrease as the number of processors increases. Redundant processing is therefore assigned j = 1, and its time χ_{1,1}(1) is modeled as in Equations (12-1) to (12-4).

Here, ii is the index defined by the following equation, that is, the processor whose processing time determines τ(p) in Equation (2).

Redundant processing is common in so-called data-parallel processing, in which the same procedure (processing content) is applied in parallel to different data. Data-parallel processing assumes processors with the same CPU performance, so that the load stays balanced. Differences in redundant processing time between processors can then be attributed to variation in each processor's time measurement. In this case it is appropriate to apply Equation (12-1), which averages the measurements of the processors.

On grids and clusters, however, processors with different performance may also be mixed in. To capture the influence of such processors accurately, Equations (12-2) and (12-3) are used. With Equation (12-2), the virtual parallelization ratio Rp(p) of Equation (6-1) is estimated at its minimum and the parallel efficiency E_p(p) at its maximum. With Equation (12-3), Rp(p) is estimated at its maximum and E_p(p) at its minimum. Comparing these two values of E_p(p) makes it possible to detect that data-parallel processing is being executed on processors with different CPU performance.

τ(p) is determined by Equation (2), and the redundant processing time of the corresponding processor is the value of Equation (12-4). Therefore, if the aim is to analyze the data that determined the parallel efficiency E_p(p), Equation (12-4) is appropriate. On the other hand, this equation determines χ_{1,1}(1) solely from processor ii. It therefore cannot detect cases in which that value differs greatly from those of the other processors. As an example, FIG. 5 shows a case in which processor 1 (i = 1) has 1/5 the CPU performance of the others. Equation (12-4) evaluates the redundant processing time using only the value of processor 1, while Equation (12-1) uses the average of the four processors. Equation (12-2) uses the value of processor 1, and Equation (12-3) uses the values of processors i = 2, 3, 4. It is therefore appropriate to use Equation (12-1) as the baseline and the other definitions as needed.
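The four candidate definitions can be compared side by side from one parallel run. This sketch assumes that Equation (12-2) takes the maximum and Equation (12-3) the minimum over processors. That is what the text's description of FIG. 5 implies, but the exact formulas are not reproduced here. The data are hypothetical values in the spirit of FIG. 5, with processor 1 five times slower.

```python
from typing import Dict, List


def redundant_time_models(chi_red: List[float], tau: List[float]) -> Dict[str, float]:
    """Candidate estimates of chi_{1,1}(1) from one parallel run.

    chi_red[i] -- measured redundant-processing time chi_{i,1}(p)
    tau[i]     -- total time tau_i(p); ii is the processor attaining tau(p)
    """
    ii = max(range(len(tau)), key=lambda i: tau[i])
    return {
        "12-1": sum(chi_red) / len(chi_red),  # average over processors
        "12-2": max(chi_red),                 # largest: Rp minimal, E_p maximal
        "12-3": min(chi_red),                 # smallest: Rp maximal, E_p minimal
        "12-4": chi_red[ii],                  # value on the bottleneck processor
    }


models = redundant_time_models(chi_red=[50.0, 10.0, 10.0, 10.0],
                               tau=[200.0, 120.0, 120.0, 120.0])
print(models)  # {'12-1': 20.0, '12-2': 50.0, '12-3': 10.0, '12-4': 50.0}
```

A large gap between the "12-2" and "12-3" values is the kind of signal the text describes for detecting processors of differing CPU performance.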

(2) Modeling of χ_{1,j}(1) (2 ≤ j ≤ j_Others)
When the processing times due to parallel performance impediment factors are actually measured, there are cases in which χ_{1,j}(1) ≠ 0. This processing time does not decrease under parallel processing, so it is reflected in the virtual parallelization ratio Rp(p) of Equation (6-1). The acceleration rate A_p of Equation (6-2) then takes a finite value, which fixes the upper limit on the number of processors worth assigning to the processing. Consider the processing time χ_{1,j}(1) (2 ≤ j ≤ j_Others) due to impediment factors other than redundant processing. It is modeled as in Equation (13-1), using Equation (13-2), which expresses the processing time for p = 1 and for p > 1. χ_{1,j}(1) is obtained by measuring χ_{i,j}(p) and Χ_{i,j}(p). Χ_{i,j}(p) is the processing time due to impediment factors that occur only for p > 1 and depend on p. That is, χ_{1,j}(1) = χ_{i,j}(p) − Χ_{i,j}(p), where both terms on the right-hand side are obtained by measurement.

As an example with χ_{1,j}(1) ≠ 0, FIG. 6(a) shows the processing time for p = 1 and FIG. 6(b) the processing time for p = 4. As shown in FIG. 6(b), χ_{i,2}(p) during parallel processing equals the p = 1 time χ_{1,2}(1) plus Χ_{i,2}(p), as in Equation (13-1). This happens, for example, when the preprocessing that starts up the communication hardware also runs when p = 1.

From Equations (13-1) and (13-2), χ_{1,j}(1) (2 ≤ j ≤ j_Others) can be obtained with Equations (13-3), (13-4), and (13-5), in the same way as for redundant processing.

For example, in FIG. 6(b) the measured values satisfy χ_{i,2}(p) = χ_{1,2}(1) + Χ_{i,2}(p). Measuring Χ_{i,2}(p) = 5, 6, 7, 8 therefore gives χ_{1,2}(1) = 5 from Equations (13-3), (13-4), and (13-5). This value agrees with χ_{1,2}(1) in FIG. 6(a).
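Each processor yields one estimate χ_{i,j}(p) − Χ_{i,j}(p) of χ_{1,j}(1). This sketch assumes that Equations (13-3), (13-4), and (13-5) are the mean, maximum, and minimum of those estimates, by analogy with Equations (12-1) to (12-3). The text says only "in the same way as for redundant processing". Χ_{i,2}(4) = 5, 6, 7, 8 come from the text. The totals χ_{i,2}(4) = 10, 11, 12, 13 are assumed values consistent with χ_{1,2}(1) = 5.

```python
from typing import Dict, List


def base_impediment_time(chi: List[float], chi_extra: List[float]) -> Dict[str, float]:
    """Estimate chi_{1,j}(1) from measured chi_{i,j}(p) and X_{i,j}(p).

    By Eq. (13-1), chi_{i,j}(p) = chi_{1,j}(1) + X_{i,j}(p), so each processor
    yields one estimate chi_{i,j}(p) - X_{i,j}(p).
    """
    estimates = [c - x for c, x in zip(chi, chi_extra)]
    return {
        "mean": sum(estimates) / len(estimates),
        "max": max(estimates),
        "min": min(estimates),
    }


result = base_impediment_time([10.0, 11.0, 12.0, 13.0], [5.0, 6.0, 7.0, 8.0])
print(result)  # {'mean': 5.0, 'max': 5.0, 'min': 5.0}
```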

(3) Determining χ_{i,jothers}(p)
The time χ_{i,jothers}(p), which cannot be measured under any classified parallel processing impediment factor, is obtained from Equation (13-6).

Next, using Equation (11) obtained by the modeling, Equation (10) is rewritten as follows.

Equation (8-1) is then transformed as follows.

For this to hold, the condition of Equation (14) must be satisfied. This condition compares quantities obtained from measured values, so it can be evaluated concretely.

γ_i(p) in Equation (14) is obtained by measurement. The sum of χ_{1,j}(1) over j is obtained from modeling Equations (12-1) to (12-4) and Equations (13-1) to (13-5). As a result, the condition of Equation (8-1) can, for the first time, be tested concretely through Equation (14). For example, in the following case the condition of Equation (14) is satisfied. The virtual parallelization ratio Rp(p) of Equation (6-1) is then almost 1, and Equations (4-4) and (4-5) are equivalent to Equation (8-2). The resulting parallel efficiency E_p(p) is accurate, in the sense that the influence of the estimate τ(1) becomes almost zero.

τ(1) can also be calculated concretely from Equations (10), (11), (12-1) to (12-4), and (13-1) to (13-6), which yields Equation (15). Equation (15) expresses τ(1) as the sum of two terms. The first is the total Σ_i γ_i(p) of the processors' parallel computation times. The second is the sum of the processing times χ_{1,j}(1) due to the parallel performance impediment factors at p = 1.

From Equations (1), (2), and (15) above, Equation (9-2) below is obtained.

Equations (9-1) and (9-2) use the sum over i of the parallel computation times γ_i(p). Compared with Equations (4-4) and (4-5), they have the advantage that E_p(p) can be obtained without data for χ_{i,j}(p). Data for χ_{1,j}(1), however, are still required.
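Both forms need only Σ_i γ_i(p), the per-processor totals τ_i(p), and the p = 1 impediment times χ_{1,j}(1). This sketch reads Equation (9-1) from the ninth aspect and Equation (9-2) from the tenth aspect and Equation (15). Those readings, the function names, and the data are assumptions. When τ(1) comes from Equation (15), the two results coincide. The reason is that Rb(p) Σ_i γ_i(p) / (Rp(p) Σ_i τ_i(p)) simplifies to τ(1)/(p τ(p)).

```python
from typing import List


def tau1_estimate(gamma: List[float], chi1: List[float]) -> float:
    """Eq. (15): tau(1) = sum_i gamma_i(p) + sum_j chi_{1,j}(1)."""
    return sum(gamma) + sum(chi1)


def efficiency_9_1(tau: List[float], gamma: List[float], rb: float, rp: float) -> float:
    """Eq. (9-1) as read here: Rb * sum(gamma) / (Rp * sum(tau))."""
    return rb * sum(gamma) / (rp * sum(tau))


def efficiency_9_2(tau: List[float], gamma: List[float], chi1: List[float]) -> float:
    """Eq. (9-2) as read here: (sum(gamma) + sum(chi1)) / (p * max(tau))."""
    return tau1_estimate(gamma, chi1) / (len(tau) * max(tau))


tau, gamma, chi1 = [10.0, 8.0], [8.0, 6.0], [2.0]   # hypothetical data
tau_1 = tau1_estimate(gamma, chi1)                 # 16.0
rb = sum(tau) / (len(tau) * max(tau))              # 0.9, Eq. (5)
rp = sum(gamma) / tau_1                            # 0.875, Eq. (6-1)
print(efficiency_9_1(tau, gamma, rb, rp))          # about 0.8
print(efficiency_9_2(tau, gamma, chi1))            # 0.8
```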

As shown by Equations (4-4), (4-5), and (7), any number of parallel performance impediment factors j can be added in the present invention. FIGS. 7(a) and 7(b) show an example of adding an impediment factor. FIG. 7(b) shows the processing times when the rise time χ_TC is measured separately. FIG. 7(a) shows the processing times for the same processing when it is not. For FIG. 7(a), the calculation is as follows.
τ_1 = 10 + 5 + 90 + 20 + 20 = 145
τ_2 = 10 + 80 + 10 = 100
τ_3 = 15 + 80 + 10 = 105
τ_4 = 10 + 90 + 10 = 110
Rb(4) = (145 + 100 + 105 + 110) / (145 × 4) = 0.7931
R_C(4) = (25 + 20 + 25 + 20) / 460 = 0.1957
Rp(4) = 1 (assumed)
E_p(4) = 0.7931 × 1 × (1 − 0.1957) = 0.6379

In the case of FIG. 7(b), the calculation is as follows (490 is the sum of τ₁ to τ₄).
τ₁ = 10 + 5 + 90 + 20 + 20 = 145
τ₂ = 5 + 10 + 80 + 10 = 105
τ₃ = 10 + 15 + 80 + 10 = 115
τ₄ = 15 + 10 + 90 + 10 = 125
Rb(4) = (145 + 105 + 115 + 125) / (145 × 4) = 0.8448
R_C(4) = (25 + 20 + 25 + 20) / 490 = 0.1837
R_TC(4) = (0 + 5 + 10 + 15) / 490 = 0.0612
Rp(4) = 1 (assumed)
E_p(4) = 0.8448 × 1 × (1 − 0.1837 − 0.0612) = 0.6379

The values of these parallel performance evaluation indices are summarized in FIG. 8. Compared with the values obtained from FIG. 7(a) (Case 1), the values obtained from FIG. 7(b) (Case 2) show that adding R_TC for the start-up time decreases R_C and increases Rb, while E_p stays the same. Adding a parallel performance impediment factor therefore does not change E_p; it makes the breakdown of the losses clearer.
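
The two cases can be checked with a short script. This is a minimal sketch that assumes Rb(p) = Σ_i τ_i / (p · max_i τ_i) and R_j(p) = Σ_i χ_{i,j} / Σ_i τ_i, the forms used in the worked example; since the text assumes Rp(4) = 1, that factor is left out. The function names are illustrative only.

```python
def load_balance(tau: list[float]) -> float:
    # Rb(p) = sum_i tau_i / (p * max_i tau_i), as in the worked example
    return sum(tau) / (len(tau) * max(tau))


def contribution(chi: list[float], tau: list[float]) -> float:
    # R_j(p) = sum_i chi_{i,j} / sum_i tau_i
    return sum(chi) / sum(tau)


def parallel_efficiency(tau: list[float], factors: dict[str, list[float]]):
    # E_p = Rb * (1 - sum_j R_j); Rp(4) = 1 is assumed, so it is omitted
    rb = load_balance(tau)
    rs = {name: contribution(chi, tau) for name, chi in factors.items()}
    return rb, rs, rb * (1.0 - sum(rs.values()))


case1 = parallel_efficiency([145, 100, 105, 110], {"C": [25, 20, 25, 20]})
case2 = parallel_efficiency(
    [145, 105, 115, 125],
    {"C": [25, 20, 25, 20], "TC": [0, 5, 10, 15]},
)

for label, (rb, rs, ep) in (("Case 1", case1), ("Case 2", case2)):
    print(label, f"Rb={rb:.4f}", {k: round(v, 4) for k, v in rs.items()}, f"Ep={ep:.4f}")
```

Multiplying out Rb × (1 − Σ_j R_j) gives (Σ_i τ_i − Σ_{i,j} χ_{i,j}) / (p · max_i τ_i). In Case 2 the 30 units of start-up time are added to both Σ τ_i and Σ χ, so the numerator stays at 370 and E_p stays at 370 / 580 in both cases.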

By expressing the load balance contribution ratio Rb(p) as in Equation (5), load balance can be related to the parallel efficiency E_p(p). Rb(p) is defined this way because it must also cover the case shown in FIG. 9, where load balance holds even though the contribution of the parallel performance impediment factors differs from processor to processor. In FIG. 9, for example, processor 1 has a much smaller parallel-processing portion than the other processors and much more redundant processing, yet all processors have the same processing time, so the load is balanced. That is, γ_i(p) and χ_{i,j}(p) are not balanced individually, but their totals are. Here χ_{i,others}(p) (= τ_i − γ_i − χ_{i,1} − χ_{i,2}) is, for example, the processing time spent on I/O.

In the case of FIG. 9, the load balance contribution ratio Rb(p) is 1. When only one processor does all the work in a parallel run, as in FIG. 10, Rb(p) takes its lower bound 1/p. In FIG. 11, the processing times of processors 1 and 2 are equal, but they differ from those of processors 3 and 4, so the load is not balanced. In this case, Rb(p) is as follows.
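
The three situations can be illustrated with Rb(p) = Σ_i τ_i / (p · max_i τ_i), the form used in the worked example above. All per-processor times below are hypothetical, since the figures' data are not reproduced here; the category names are likewise illustrative.

```python
def load_balance(tau: list[float]) -> float:
    """Rb(p) = sum_i tau_i(p) / (p * max_i tau_i(p))."""
    return sum(tau) / (len(tau) * max(tau))


# Hypothetical FIG. 9-style data: the per-processor breakdowns differ
# (processor 1 does little parallel work and much redundant work),
# but every processor finishes at the same time.
fig9 = [
    {"gamma": 30, "RED": 55, "C": 10, "others": 5},
    {"gamma": 75, "RED": 10, "C": 10, "others": 5},
    {"gamma": 75, "RED": 10, "C": 10, "others": 5},
    {"gamma": 70, "RED": 10, "C": 15, "others": 5},
]
tau9 = [sum(parts.values()) for parts in fig9]

fig10 = [100, 0, 0, 0]       # hypothetical: only processor 1 works
fig11 = [100, 100, 70, 50]   # hypothetical: processors 1 and 2 equal, 3 and 4 not

print(load_balance(tau9))    # 1.0
print(load_balance(fig10))   # 0.25, the lower bound 1/p for p = 4
print(load_balance(fig11))   # 0.8
```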

Furthermore, parallel processing impediment factors that do not appear at low parallelism may become apparent at high parallelism. The parallelization rate, one of the conventional performance evaluation indices, is defined as (processing time of the parallelizable part at p = 1) / ((processing time of the parallelizable part at p = 1) + (processing time of the non-parallelizable part at p = 1)), and it cannot adequately capture this phenomenon. In the example of FIG. 12, the parallelization rate at p = 1 is 0.99 (= 198 / (198 + 2)), and the remaining 0.01 is the share of processing time that cannot be parallelized. However, the non-parallelizable part still takes 2 hours at high parallelism, such as p = 100 in the figure, where it accounts for about 50% (≈ 2 / (1.98 + 2)) of the elapsed time, and the parallelization rate does not reflect this. In the present invention, as shown in Equation (7), the parallel performance impediment factor contribution R_j(p) is expressed as the sum over i of χ_{i,j}(p) normalized by the sum over i of τ_i(p). Because of this normalization, the upper bound of R_j(p) remains 1 even when τ(p) becomes small at high parallelism, and the influence of each impediment factor can be expressed as a percentage of the parallel processing time.
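
The FIG. 12 numbers can be reproduced with a few lines. This sketch assumes, as the example implicitly does, that the parallelizable 198 hours divide perfectly by p while the 2 serial hours do not shrink; the function names are illustrative.

```python
PARALLEL_P1 = 198.0  # hours of parallelizable work at p = 1 (FIG. 12)
SERIAL = 2.0         # hours that cannot be parallelized (FIG. 12)


def parallelization_rate() -> float:
    """Conventional index: parallel share of the p = 1 run."""
    return PARALLEL_P1 / (PARALLEL_P1 + SERIAL)


def serial_share(p: int) -> float:
    """Serial share of elapsed time at p processors, assuming the
    parallelizable part divides perfectly by p (idealized model)."""
    return SERIAL / (PARALLEL_P1 / p + SERIAL)


print(f"parallelization rate: {parallelization_rate():.2f}")  # 0.99
for p in (1, 10, 100):
    print(p, f"{serial_share(p):.3f}")  # 0.010, 0.092, 0.503
```

The parallelization rate is fixed at 0.99 regardless of p, while the serial share of the elapsed time grows to about 50% at p = 100, which is the gap the normalized R_j(p) is meant to expose.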

As described above, in addition to the parallel efficiency E_p(p), the parallel performance evaluation indices Rb(p), Rp(p), and R_j(p) (R_RED(4), R_C(4), ..., R_Others(4)) and the auxiliary indices A_p(p) and E_p(p)·p can be calculated. An example of the results is shown in FIG. 13. The eight items shown in FIG. 13 express parallel performance quantitatively.

As shown in FIG. 13, E_p(p)·p = 1.777, so although the parallel computer system has four processors, it is processing with the performance of only 1.777 processors. Load imbalance alone lowers the parallel efficiency to 94% (Rb(4) = 0.9392). The parallel performance impediment factors account for 22% in redundant processing (R_RED(4) = 0.2230), 33% in communication (R_C(4) = 0.3309), and 3% in other causes (R_Others(4) = 0.0288), so communication and redundant processing together reduce parallel performance by 55%. Since Rp(4) = 0.8821 in FIG. 13, the maximum parallel performance obtainable with an unlimited number of processors can be estimated at 8.482 times that of one processor (A_p(4) = 1 / (1 − 0.8821)). This processing should therefore be run on 8 processors or fewer.
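
The auxiliary figures quoted for FIG. 13 follow from simple arithmetic. The sketch below takes E_p(4) = 0.4443 from the processor-count example that follows and treats "8 processors or fewer" as the floor of A_p(4); that rounding rule is an interpretation, not something the text states.

```python
import math

EP4 = 0.4443   # E_p(4) for this processing group
RP4 = 0.8821   # Rp(4), virtual parallelization rate
R = {"RED": 0.2230, "C": 0.3309, "Others": 0.0288}  # R_j(4)
P = 4

effective_processors = EP4 * P           # auxiliary index E_p(p)·p
max_speedup = 1.0 / (1.0 - RP4)          # auxiliary index A_p(p)
comm_and_redundancy = R["C"] + R["RED"]  # share lost to communication + redundancy

print(f"E_p(4)·p = {effective_processors:.3f}")       # 1.777
print(f"A_p(4)   = {max_speedup:.3f}")                # 8.482
print(f"C + RED  = {comm_and_redundancy:.1%}")        # 55.4%
print("processor ceiling:", math.floor(max_speedup))  # 8
```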

For example, if the target value of parallel efficiency (E_p)_T is set to 0.8, then for a processing group like the one in FIG. 13 the optimal number of processors is calculated as follows.
(p)_OPT = E_p(4) / (E_p)_T · p
= 0.4443 / 0.8 × 4 = 2.2215
The estimated optimal number of processors is therefore (p)_OPT = 2.
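
The estimate can be written as a small function. This is a sketch, assuming (as the formula implicitly does) that the effective processor count E_p(p)·p stays roughly constant when p changes; under that assumption, rounding down gives the largest processor count that still meets the target. The function name is hypothetical.

```python
import math


def optimal_processors(ep_measured: float, p_measured: int, ep_target: float):
    """(p)_OPT = E_p(p) / (E_p)_T · p, rounded down to a whole processor count."""
    raw = ep_measured / ep_target * p_measured
    return raw, max(1, math.floor(raw))


raw, p_opt = optimal_processors(0.4443, 4, 0.8)
print(f"{raw:.4f}", p_opt)  # 2.2215 2
```

With 2 processors the projected efficiency is about 1.777 / 2 ≈ 0.89, above the 0.8 target; with 3 it would drop to about 0.59.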

A processing group is a set of processes that use the same functions of the same application program and differ only in their input data. Such groups are common in parametric studies in scientific and engineering computing.

Conventionally, a parallel computer system has been evaluated by the net working rate NWR_system given by Equation (16) below. However, because the workload may include processes with low parallel efficiency, a good operating rate does not necessarily mean that the system is being operated efficiently.

FIG. 14 shows an example of the operating time and the sum of the processing times (the following expression).

In FIG. 14, the total processing time is less than the operating time Ti, and the size of this gap differs from processor to processor.

According to the present invention, an index called the system operation efficiency E_system can be constructed on the basis of Equation (16) and used to evaluate how efficiently the system is operated. It then becomes possible to give concrete guidelines for improving operational efficiency, such as how much the parallel efficiency of which processes must be improved to raise this index.

For example, if P_system = 4, Ti = 10, and k_max = 2, and the following conditions are satisfied, then E_system = (5 + 9) / (10 + 10 + 10 + 10) = 0.35.

By comparison, the conventional operating rate is NWR_system = (10 + 9) / 40 = 0.475. Taking parallel efficiency into account makes it possible to evaluate the system's operational efficiency while accounting for the time each process wastes in parallel processing.

FIG. 15 shows the operating time, the sum of τ_i(p) for process 1 (a parallel process), the product of τ_i(p) and the parallel efficiency for process 1, the sum of τ_i(p) for process 2 (a non-parallel process running on processor 4 only), and the product of τ_i(p) and the parallel efficiency for process 2. For the parallel process, τ_i(p) is shorter than the operating time Ti, and it becomes shorter still once parallel efficiency is taken into account, because the wasted processing time is excluded. For the non-parallel process, the parallel efficiency is 1, so the sum of τ_i(p) and its product with the parallel efficiency are the same value.
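
Equation (16) and the exact definition of E_system are not reproduced in this text. The sketch below assumes NWR_system is total processing time divided by total operating time, and that E_system weights each process's processing time by its parallel efficiency. That reading reproduces both numerators of the example if process 1 has Σ_i τ_i(p) = 10 with E_p = 0.5 and process 2 (non-parallel) has Σ_i τ_i(p) = 9; those per-process values are hypothetical reconstructions, not data from the source.

```python
from dataclasses import dataclass


@dataclass
class Process:
    tau_sum: float     # sum over i of tau_i(p) for this process
    efficiency: float  # E_p(p) of the process (1.0 for a non-parallel process)


def nwr(processes: list[Process], operating_times: list[float]) -> float:
    """Conventional operating rate: processing time / operating time."""
    return sum(pr.tau_sum for pr in processes) / sum(operating_times)


def e_system(processes: list[Process], operating_times: list[float]) -> float:
    """Efficiency-weighted processing time / operating time (assumed form)."""
    return sum(pr.efficiency * pr.tau_sum for pr in processes) / sum(operating_times)


T = [10, 10, 10, 10]                          # Ti for P_system = 4 processors
procs = [Process(10, 0.5), Process(9, 1.0)]   # hypothetical k_max = 2 processes
print(nwr(procs, T), e_system(procs, T))      # 0.475 0.35
```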

Conventionally, the system's operating rate has been used as the basis for adding processors to a parallel computer system. However, because it is not based on the system resources that were actually used effectively, resources may end up being added or replaced for the sake of processes with low parallel efficiency. The present invention makes it possible to give a quantitative guideline for adding processors to a parallel computer system. Let P_System be the total number of processors in the system, Ti the operating time of each processor, P_Add the number of processors after the expansion, k_max the total number of processes, and α the expected parallel efficiency. Then, for example, the acceleration rate A_system obtained when the operating time is increased by the amount contributed by the added processors (given by the expression below) is as shown in Equation (18).

For example, under the conditions shown below and with α = 1, the acceleration rate A_system is calculated as follows.

A_system = (39 + 1 × 10) / 40 = 1.225 ≈ 1.23

This corresponds to a system enhancement of about 23%. Because it takes the parallel efficiency of past processing into account, this value is a more convincing basis for expansion than the conventional operating rate. Multiplying the system enhancement by a predicted parallel efficiency α (< 1) when calculating the acceleration rate gives an even more realistic value. If the added processors have 10 times the CPU performance, A_system can be obtained with α = 10; applied to the example above, A_system = (39 + 10 × 10) / 40 = 3.475 ≈ 3.48. This makes it possible, even when adding processors with different CPU performance, to produce forecast data that is better grounded than forecasts based on the operating rate.
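
Equation (18) and the example's conditions are not reproduced here, so this sketch only re-does the arithmetic of the worked example as a function of α. The values 39 and 40 are taken as given; reading 10 as the operating time contributed by the added processor is an assumption, and α = 0.8 is a hypothetical predicted efficiency.

```python
def acceleration_rate(alpha: float, base: float = 39.0, added: float = 10.0,
                      total: float = 40.0) -> float:
    """(base + alpha * added) / total, the arithmetic of the worked example."""
    return (base + alpha * added) / total


for alpha in (1.0, 0.8, 10.0):
    print(alpha, round(acceleration_rate(alpha), 3))
# 1.0 1.225
# 0.8 1.175
# 10.0 3.475
```

Treating α as a multiplier makes the two uses in the text the same operation: α < 1 discounts the added capacity by the expected parallel efficiency, and α = 10 scales it up for processors with ten times the CPU performance.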

Further, the present invention makes it possible to give a quantitative guideline for replacing a parallel computer system. From the indices calculated for each process (parallel efficiency, load balance contribution ratio, virtual parallelization rate, parallel performance impediment factor contribution ratios, τ_i(p), γ_i(p), χ_{i,j}(p), and the operating time Ti of each processor), the parallel efficiency of each process can be estimated as in the following example, and so can the performance of the system after replacement.

For example, consider introducing a system with five times the CPU performance of the system whose elapsed times are measured as shown in FIG. 16. Then γ_i(p), χ_{i,RED}(p), and χ_{i,Other}(p) become 1/5 of their current values. χ_{i,C}(p), on the other hand, depends on network performance; assuming the network performance is unchanged, the parallel efficiency of the new system can be estimated as follows.

Assuming χ_{i,C}(1) = 0 and χ_{i,Others}(1) = 0, the performance evaluation indices for five times the CPU performance can be calculated as follows. By Equation (12-1) above, χ_{1,RED}(1) is expressed as follows. The load balance contribution ratio is calculated by Equation (5), the virtual parallelization rate by Equation (6-1), the parallel performance impediment factor contribution ratios (redundant processing, communication, others) by Equation (7), and the parallel efficiency by Equations (4-4) and (9-1), as follows.
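
The projection step can be sketched as follows: CPU-bound components are divided by the speed-up factor, communication is left unchanged, and Rb(p) and R_j(p) are recomputed in the forms used in the earlier worked example. The per-processor times are hypothetical (FIG. 16's data are not reproduced here), and E_p itself is not computed because Equations (4-4) and (9-1) are not shown in this text.

```python
CPU_BOUND = ("gamma", "RED", "Others")  # assumed to scale with CPU speed


def scale_for_new_cpu(times, speedup=5.0):
    """Divide CPU-bound components by `speedup`; keep communication (C) unchanged."""
    return [{k: (v / speedup if k in CPU_BOUND else v) for k, v in t.items()}
            for t in times]


def indices(times):
    """Return Rb(p) and R_j(p) for per-processor time breakdowns."""
    tau = [sum(t.values()) for t in times]
    rb = sum(tau) / (len(tau) * max(tau))
    r = {k: sum(t[k] for t in times) / sum(tau) for k in times[0] if k != "gamma"}
    return rb, r


# Hypothetical per-processor times on the current system (not FIG. 16's data).
old = [
    {"gamma": 50, "RED": 10, "C": 20, "Others": 5},
    {"gamma": 60, "RED": 0, "C": 20, "Others": 5},
    {"gamma": 55, "RED": 0, "C": 25, "Others": 5},
    {"gamma": 60, "RED": 0, "C": 20, "Others": 5},
]
new = scale_for_new_cpu(old)

for label, times in (("current", old), ("5x CPU", new)):
    rb, r = indices(times)
    print(label, f"Rb={rb:.3f}", {k: round(v, 3) for k, v in r.items()})
```

With these assumed numbers, communication grows from 25% to 62.5% of the total processing time once the CPU-bound parts shrink, and the load balance degrades because processor 3's extra communication no longer hides behind computation.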

FIG. 17 summarizes the calculation results above together with the performance indices based on actual measurements. As the table in FIG. 17 shows, predicting the parallel performance after the system is replaced makes it possible to estimate the system operation efficiency E_system of the new system. To do so, all the performance indices are calculated for past processing from the existing system's log data, in the same way as in FIG. 17.

A trial calculation of the system operation efficiency E_system with five times the CPU performance is shown below. Comparing this E_system, obtained from estimated values, with the E_system obtained from the actual measurements so far provides a better-grounded basis for a system replacement decision than the operating rate.

As shown in FIG. 18, E_system with five times the CPU performance can be calculated from E_p(4) of each process, the sum over i of τ_i(p) for each process, and the number of processors. The following conditions were assumed.

[Description of Embodiment]
FIG. 19 shows an outline of a system according to an embodiment of the present invention. The parallel performance analysis apparatus 100 is a single-processor computer that analyzes the parallel performance of a parallel computer system 200 and is connected to an output device 110 such as a printer or a display. The parallel performance analysis apparatus 100 may, however, be a parallel computer. The parallel performance analysis apparatus 100 includes a data acquisition unit 10, a load balance contribution ratio calculation unit 11, a virtual parallelization rate calculation unit 12, a parallel performance impediment factor contribution ratio calculation unit 13, a parallel efficiency calculation unit 14, an auxiliary index calculation unit 15, a processor count optimization processing unit 21, a processor expansion estimation processing unit 22, a system replacement data processing unit 23, an operational efficiency data processing unit 24, a tuning processing unit 25, an algorithm selection processing unit 26, and a parallel performance evaluation processing unit 27. The parallel performance analysis apparatus 100 is connected to a log data storage unit 30. The parallel computer system 200 includes a measurement unit 201. The parallel performance analysis apparatus 100 is connected to the parallel computer system 200, for example, via a network.

While executing parallel processing according to a program, the measurement unit 201 of the parallel computer system 200 measures the processing times γ_i(p), χ_{i,j}(p), and τ_i(p). For example, it times each process from start to end with a timer, or records the start and end times of each process and calculates the processing time after the process finishes. Time may be measured by software, including the operating system (OS), or by hardware. The measured processing-time data is first stored in the memory of the parallel computer system 200 and, in some cases, in another storage device.
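
Timer-based measurement on one processor can be sketched with Python's monotonic `time.perf_counter()`. The `PhaseTimer` class and the category labels "gamma", "RED", and "tau" are hypothetical, and `time.sleep` stands in for real work; χ_{i,others}(p) is obtained as the remainder, as the text does for sampling.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per category on one processor (rank).

    'tau' is the whole run; named phases such as 'gamma' or 'RED' are
    hypothetical category labels for gamma_i(p) and chi_{i,RED}(p).
    """

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)
        self._start = 0.0

    def __enter__(self) -> "PhaseTimer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc) -> bool:
        self.totals["tau"] = time.perf_counter() - self._start
        return False  # never swallow exceptions

    @contextmanager
    def phase(self, name: str):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - t0


with PhaseTimer() as timer:
    with timer.phase("RED"):
        time.sleep(0.01)   # stand-in for redundant work
    with timer.phase("gamma"):
        time.sleep(0.02)   # stand-in for the parallel computation
    time.sleep(0.005)      # unlabelled work, later attributed to 'others'

others = timer.totals["tau"] - timer.totals["RED"] - timer.totals["gamma"]
print({k: round(v, 3) for k, v in timer.totals.items()}, round(others, 3))
```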

Instead of measuring processing times, the events of the running program may be checked at regular time intervals and counted per event. This is called measurement by sampling. Sampling can be used because Equations (4-4), (9-1), and (9-2) and the indices Rb(p), Rp(p), and R_j(p) are all ratios of times. Apart from differences in measurement accuracy, the time-measurement method and the sampling method give the same results.

FIG. 20 illustrates measurement by sampling, with time running from left to right. The downward arrows mark the sampling instants, which are spaced at a constant interval. In FIG. 20, redundant processing is performed first for χ_{i,RED}(p), followed by parallel computation for γ_i(p); the processing as a whole lasts τ_i(p). The redundant-processing event lasting χ_{i,RED}(p) receives 7 samples, and the parallel-computation event lasting γ_i(p) receives 9 samples. Over the whole processing time τ_i(p) there are 22 samples. All events among the parallel performance impediment factors other than the intentionally measured χ_{i,RED}(p) are grouped together as χ_{i,others}(p), which is calculated from Equation (13-6) using the intentionally measured τ_i(p), χ_{i,RED}(p), and γ_i(p). In the example of FIG. 20, χ_{i,others}(p) therefore receives 6 samples (= 22 − 9 − 7).
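
The FIG. 20 counts can be reproduced with a deterministic simulation. This sketch assumes hypothetical segment durations (in units of the sampling interval) chosen to match the counts, and places the unflagged work at the end; its real position in the timeline does not affect the counts.

```python
from collections import Counter


def sample_events(timeline, interval=1.0, offset=0.5):
    """Sample a timeline of (event, duration) segments at a constant interval.

    Every sample inside the run counts toward 'tau'; samples that fall in a
    flagged segment also count toward that event. Segments whose event is
    None stand for work nobody flagged (later attributed to chi_others).
    """
    total = sum(duration for _, duration in timeline)
    counts = Counter()
    n = 0
    while (n + offset) * interval < total:
        t = (n + offset) * interval
        start = 0.0
        for event, duration in timeline:
            if start <= t < start + duration:
                counts["tau"] += 1
                if event is not None:
                    counts[event] += 1
                break
            start += duration
        n += 1
    return counts


# Hypothetical durations (in sampling intervals) chosen to match FIG. 20's counts.
timeline = [("RED", 7.0), ("gamma", 9.0), (None, 6.0)]
counts = sample_events(timeline)
others = counts["tau"] - counts["RED"] - counts["gamma"]  # remainder, as in Eq. (13-6)
print(counts["tau"], counts["RED"], counts["gamma"], others)  # 22 7 9 6
```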

An outline of how measurement by sampling is actually carried out is given below.

(1) The τ_i(p) part

(a) The flag for event τ_i(p) is turned on at the start of the processing and off at its end. At run time, the on/off state of this flag is checked at a constant time interval, and the number of times it is found on is counted to give the sample count.

Measurement is performed by combining, as needed, the descriptions and processing of any of the following methods.
・The programmer detects the start and end of the processing in the program, that is, the positions where the flag should be turned on and off, and writes the code that turns the flag on and off.
・When parallel language extensions, compiler directives, or the like are used, a tool interprets them and inserts the code that turns the flag on and off.
・When parallel language extensions, compiler directives, or the like are used, the compiler interprets them and inserts the code that turns the flag on and off.
・The compiler detects the start and end of the processing in the program, that is, the positions where the flag should be turned on and off, and inserts the code that turns the flag on and off.
・The OS detects the start and end of the processing in the program, that is, the positions where the flag should be turned on and off, and arranges for the flag to be turned on and off.
・The runtime library detects the start and end of the processing in the program, that is, the positions where the flag should be turned on and off, and arranges for the flag to be turned on and off.
・The hardware detects the start and end of the processing in the program, that is, the positions where the flag should be turned on and off, and arranges for the flag to be turned on and off.
・The code that detects that the flag is on and counts the occurrences is provided at the compiler level.
・The code that detects that the flag is on and counts the occurrences is provided at the OS level.
・The code that detects that the flag is on and counts the occurrences is provided at the runtime library level.
・The code that detects that the flag is on and counts the occurrences is provided at the hardware level.
・The code that detects that the flag is on and counts the occurrences is provided at the tool level.
・The code that detects that the flag is on and counts the occurrences is provided at the program level.
・The detection of the on state and the counting are carried out at the hardware level.

(b) The event is identified by the program name or a substitute such as the executable module name. At run time, the program name or executable module name is checked at a constant time interval, and the number of times each identified name is found is counted to give the sample count.

Measurement is performed by combining, as needed, any of the following name-generation methods with the identification and counting processing.
・The compiler generates the program name or executable module name.
・The OS generates the program name or executable module name.
・The runtime library generates the program name or executable module name.
・The hardware generates the program name or executable module name.
・The program name or executable module name is generated from descriptions such as parallel language extensions or compiler directives.
・The program name or executable module name is generated from the programmer's description.
・The code for identifying and counting the generated program or executable module names is provided at the compiler level.
・The code for identifying and counting the generated program or executable module names is provided at the OS level.
・The code for identifying and counting the generated program or executable module names is provided at the runtime library level.
・The code for identifying and counting the generated program or executable module names is provided at the hardware level.
・The code for identifying and counting the generated program or executable module names is provided at the tool level.
・The code for identifying and counting the generated program or executable module names is provided at the program level.
・The identification and counting of the generated program or executable module names are carried out at the hardware level.
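
Method (b) can be illustrated with a hypothetical sequence of names captured at a constant interval on one processor. Counting the ticks that carry the target program name gives the sample count for τ_i(p), and its share of all ticks approximates its share of elapsed time. The program and process names below are invented for the example.

```python
from collections import Counter


def tau_samples(sampled_names: list[str], target: str) -> int:
    """Count the sampling ticks at which `target` was the running program."""
    return Counter(sampled_names)[target]


# Hypothetical names captured at a constant interval on one processor.
ticks = ["cfd_solver", "cfd_solver", "kworker", "cfd_solver", "sshd", "cfd_solver"]
n_tau = tau_samples(ticks, "cfd_solver")
share = n_tau / len(ticks)
print(n_tau, f"{share:.3f}")  # 4 0.667
```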

（２）χ_i,j(p)とγ_i(p)の部分
（ａ）事象χ_i,j(p)、γ_i(p)が出現する毎にその処理の初めにそのためのフラグをonにし、その処理の終わりにそのためのフラグをoffにセットする。
実行時に各事象のためのフラグのon/offを一定時間間隔で識別し、onと識別された回数をカウントしてサンプリング回数を得るものとする。１つの方法では検出できない場合があるため、以下の方法のいずれかの記述と処理を必要に応じて組み合わせて測定する。
・プログラマが、プログラム中処理の始め及び終わり、すなわち上記フラグをon/offすべき位置を検出し、当該フラグをon/offさせるための記述を行う。
・並列言語拡張やコンパイラ・ディレクティブ等が用いられている場合には、ツールが当該並列言語拡張やコンパイラ・ディレクティブ等を解釈して、上記フラグをon/offさせるための記述を行う。
・並列言語拡張やコンパイラ・ディレクティブ等が用いられている場合には、コンパイラが当該並列言語拡張やコンパイラ・ディレクティブ等を解釈して、上記フラグをon/offさせるための記述を行う。
・コンパイラが、プログラム中処理の始め及び終わり、すなわち上記フラグをon/offすべき位置を検出し、当該フラグをon/offさせるための記述を行う。
・ＯＳが、プログラム中処理の始め及び終わり、すなわち上記フラグをon/offすべき位置を検出し、当該フラグをon/offさせるための記述を行う。
・ランタイム・ライブラリが、プログラム中処理の始め及び終わり、すなわち上記フラグをon/offすべき位置を検出し、当該フラグをon/offさせるための記述を行う。
・ハードウエアが、プログラム中処理の始め及び終わり、すなわち上記フラグをon/offすべき位置を検出し、当該フラグをon/offさせるための記述を行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、コンパイラレベルで行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、ＯＳレベルで行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、ランタイムライブラリ・レベルで行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、ハードウェア・レベルで行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、ツールレベルで行う。
・上記フラグがonであることを識別してその回数をカウントする処理のための記述を、アプリケーションプログラム・レベルで行う。
(2) The χ_{i,j}(p) and γ_i(p) portions
(a) Each time an event χ_{i,j}(p) or γ_i(p) occurs, the flag for that event is set to on at the start of the processing and to off at its end. During execution, the on/off state of each event's flag is checked at fixed time intervals, and the number of times a flag is found on is counted to obtain its sampling count. Because a single method may not detect every event, measurement combines any of the following methods and processes as needed:
- The programmer identifies the start and end of the processing in the program, i.e., where the flag should be turned on/off, and writes the code that turns the flag on/off.
- When parallel language extensions or compiler directives are used, a tool interprets them and generates the code that turns the flag on/off.
- When parallel language extensions or compiler directives are used, the compiler interprets them and generates the code that turns the flag on/off.
- The compiler identifies the start and end of the processing in the program, i.e., where the flag should be turned on/off, and generates the code that turns the flag on/off.
- The OS identifies the start and end of the processing in the program and generates the code that turns the flag on/off.
- The runtime library identifies the start and end of the processing in the program and generates the code that turns the flag on/off.
- The hardware identifies the start and end of the processing in the program and turns the flag on/off accordingly.
- The code that checks whether the flag is on and counts the occurrences is supplied at the compiler level.
- The code that checks whether the flag is on and counts the occurrences is supplied at the OS level.
- The code that checks whether the flag is on and counts the occurrences is supplied at the runtime library level.
- The code that checks whether the flag is on and counts the occurrences is supplied at the hardware level.
- The code that checks whether the flag is on and counts the occurrences is supplied at the tool level.
- The code that checks whether the flag is on and counts the occurrences is supplied at the application program level.
- The checking of the flag and the counting are themselves performed at the hardware level.
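The sampling in (a) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the patent's implementation: the on/off intervals of each event flag are given directly in simulated time units instead of being toggled by instrumented code, and the sampler is a plain loop rather than a timer interrupt. The event names and the function `sample_flags` are hypothetical.

```python
from collections import Counter


def sample_flags(intervals, t_end, dt):
    """Simulate periodic sampling of event flags.

    intervals: dict mapping an event name to the (t_on, t_off) pairs during
        which that event's flag is on, treated as half-open [t_on, t_off).
    Returns a Counter of how many sampling instants found each flag on.
    """
    counts = Counter()
    for k in range(int(t_end / dt)):   # sampling instants 0, dt, 2*dt, ...
        t = k * dt
        for event, spans in intervals.items():
            if any(t_on <= t < t_off for t_on, t_off in spans):
                counts[event] += 1
    return counts


# Hypothetical timeline shaped like processor 1 of the FIG. 23 example:
# parallel part 15, redundant 8, communication 10, remaining 1 time unit.
intervals = {"gamma": [(0, 15)], "chi_RED": [(15, 23)], "chi_C": [(23, 33)]}
counts = sample_flags(intervals, t_end=34, dt=0.5)
# A count multiplied by the sampling interval approximates the time that flag was on.
estimated_time = {event: n * 0.5 for event, n in counts.items()}
print(estimated_time)
```

Time not covered by any flag (the last time unit here) is simply not counted; the text obtains that "others" share later by subtraction.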

(b) Known module names are classified in advance as belonging either to the parallel processing part or to a processing part associated with a parallel performance impediment factor. At run time, the executing module name is identified and a count is kept for each module name to obtain the sampling counts. The classification methods, identification processes, and counting processes below are combined as needed:
- Module names are classified at the compiler level.
- Module names are classified at the OS level.
- Module names are classified at the runtime library level.
- Module names are classified at the hardware level.
- Module names are classified at the level of parallel language extensions or compiler directives.
- Module names are classified at the user level.
- The code for identifying and counting module names is supplied at the compiler level.
- The code for identifying and counting module names is supplied at the OS level.
- The code for identifying and counting module names is supplied at the runtime library level.
- The code for identifying and counting module names is supplied at the hardware level.
- The code for identifying and counting module names is supplied at the tool level.
- The code for identifying and counting module names is supplied at the program level.
- The identification and counting of module names are themselves performed at the hardware level.
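Method (b) replaces explicit flags with a lookup table. A minimal sketch, assuming a sampler that reports the name of the module executing at each sampling instant: the table, the module names, and `count_by_category` are hypothetical, and sending unknown names to an "others" bucket is a design choice of this sketch.

```python
from collections import Counter

# Hypothetical classification prepared in advance: module name -> category.
# "parallel" feeds gamma_i(p); "RED" and "C" feed chi_{i,j}(p).
CATEGORY_OF_MODULE = {
    "solver_kernel": "parallel",
    "setup_bounds": "RED",     # same computation repeated on every processor
    "MPI_Alltoall": "C",       # communication library routine
    "MPI_Allreduce": "C",
}


def count_by_category(sampled_modules, table=CATEGORY_OF_MODULE):
    """Count periodic samples per category from the sampled module names.

    Names missing from the table are counted under "others".
    """
    counts = Counter()
    for name in sampled_modules:
        counts[table.get(name, "others")] += 1
    return counts


# Module names observed at successive sampling instants (made-up data).
samples = ["solver_kernel"] * 5 + ["MPI_Alltoall"] * 3 + ["setup_bounds"] * 2 + ["write_log"]
print(count_by_category(samples))
```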

As an example, Table 1 shows how χ_{i,j}(p) is sampled in a program, written in Fortran with the MPI (Message Passing Interface) parallel library, that sums the elements of F(Imax) across all processors. The on flag is written as the compiler directive "*sampon" and the off flag as "*sampoff". RED denotes redundant processing, C denotes communication, and the number after RED or C gives its order of appearance. The summation is redundant processing in which every processor performs the same calculation, so "*sampon (RED), 2" and "*sampoff (RED), 2" are placed at its start and end to turn the flag on and off. The calculation of the variable nLOCAL is also redundant processing, so "*sampon (RED), 1" and "*sampoff (RED), 1" are placed at its start and end. MPI_ALLTOALL is a communication library routine, so "*sampon (C), 1" and "*sampoff (C), 1", and "*sampon (C), 2" and "*sampoff (C), 2", are placed around its two calls. For MPI_ALLTOALL, a tool, the compiler, or the OS can instead identify the event and set the flag.

(Table 1)
c     Element-wise global sum of F over all NP processes, returned in F on
c     every process, using two all-to-all exchanges. Assumes Imax is a
c     multiple of NP so that NP*nLOCAL elements fit in F and FW.
c     Lines starting with *sampon / *sampoff are the sampling directives;
c     an ordinary fixed-form compiler treats them as comment lines.
      subroutine GSUM(Imax,F,FW,NP)
      real*8 F(Imax),FW(Imax)
      include 'mpif.h'

*sampon (RED), 1
      nLOCAL=(Imax+NP-1)/NP
*sampoff (RED), 1

*sampon (C), 1
      call MPI_ALLTOALL(F ,nLOCAL,MPI_DOUBLE_PRECISION,
     &                  FW,nLOCAL,MPI_DOUBLE_PRECISION,
     &                  MPI_COMM_WORLD,IERR)
*sampoff (C), 1

*sampon (RED), 2
c     Add the NP received blocks into the first block of FW.
      do j=2,NP
        k=(j-1)*nLOCAL
        do i=1,nLOCAL
          FW(i)=FW(i)+FW(i+k)
        end do
      end do

c     Copy the summed block into every block of FW before sending it back.
      do j=2,NP
        k=(j-1)*nLOCAL
        do i=1,nLOCAL
          FW(i+k)=FW(i)
        end do
      end do
*sampoff (RED), 2

*sampon (C), 2
      call MPI_ALLTOALL(FW,nLOCAL,MPI_DOUBLE_PRECISION,
     &                  F ,nLOCAL,MPI_DOUBLE_PRECISION,
     &                  MPI_COMM_WORLD,IERR)
*sampoff (C), 2
      return
      end

FIG. 21 shows an example of the sampling obtained when the program in Table 1 is executed. In FIG. 21, counts are kept separately for each of the events [(RED),1], [(RED),2], [(C),1], and [(C),2]. When the parallel efficiency and related indices are calculated, however, they are treated as total redundant processing [(RED),1 + (RED),2] and total communication processing [(C),1 + (C),2].
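The merge described above, where per-marker counts such as [(RED),1] and [(RED),2] are added into one total per kind, can be sketched as follows. The counts are made up for illustration (the text does not list the values of FIG. 21), and `totals_by_kind` is a hypothetical helper.

```python
from collections import Counter


def totals_by_kind(marker_counts):
    """Merge counts keyed by (kind, index), e.g. ("RED", 2), into one total
    per kind, e.g. RED = (RED),1 + (RED),2."""
    totals = Counter()
    for (kind, _index), n in marker_counts.items():
        totals[kind] += n
    return totals


# Illustrative per-marker sample counts for one processor (not from FIG. 21).
per_marker = {("RED", 1): 3, ("RED", 2): 40, ("C", 1): 25, ("C", 2): 27}
print(totals_by_kind(per_marker))
```

Keeping the per-marker counts until this step still shows which marked region dominates before only the totals feed the indices.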

Returning to FIG. 19, the data acquisition unit 10 of the parallel performance analyzer 100 acquires from the parallel computer system 200 the processing times γ_i(p), χ_{i,j}(p), and τ_i(p), measured by the measurement unit 201 either as times or as sampling counts as described above, and stores them in the log data storage unit 30 connected to the analyzer 100. Besides the processing times, the log data storage unit 30 also accumulates the calculated parallel performance evaluation indices, including the parallel efficiency. The load balance contribution ratio calculation unit 11 calculates the load balance contribution ratio Rb(p) according to equation (5) and stores it in the storage device; τ(p) is calculated according to equation (2). The virtual parallelization rate calculation unit 12 calculates the virtual parallelization rate Rp(p) according to equation (6-1) and stores it in the storage device. The τ(1) in the denominator of equation (6-1) may be approximated as in equation (8-1), or equation (15), derived from equations (10) and (11), may be used. The second term of equation (15), χ_{1,j}(1) with j = 1, may be calculated by any one of equations (12-1), (12-2), (12-3), or (12-4), and χ_{1,j}(1) with j > 1 by any one of equations (13-3) to (13-5). Here j = 1 of χ_{1,j}(p) denotes redundant processing, although it may instead be the processing time of another parallel performance impediment factor.

The parallel performance impediment factor contribution rate calculation unit 13 calculates the contribution rate Rj(p) of each parallel performance impediment factor according to equation (7) and stores it in the storage device. The parallel efficiency calculation unit 14 calculates the parallel efficiency E_p(p) according to any one of equation (4-4), equation (4-5), equation (8-2) (when the condition of equation (8-1) holds), equation (9-1), or equation (9-2), and stores it in the storage device. When equation (9-2) is used, the first term of its numerator may be calculated by any one of equations (12-1), (12-2), (12-3), (12-4), (13-3), (13-4), or (13-5); equations (12-1) to (12-4) concern χ_{1,j}(p) with j = 1, the redundant processing. The auxiliary index calculation unit 15 calculates, for example, the acceleration rate A_p according to equation (6-2), and E_p(p)·p from the parallel efficiency E_p(p) and the number of processors p, and stores them in the storage device.

The processor number optimization processing unit 21 performs processing such as telling an end user of the parallel computer system 200 the optimum number of processors to allocate to a job. The processor expansion estimation processing unit 22 presents the operations manager of the parallel computer system 200 with figures that serve as guidelines when adding processors. The system replacement data processing unit 23 presents the operations manager with figures that serve as guidelines when replacing the system. The operational efficiency data processing unit 24 presents the operations manager with data on the operational efficiency of the system. The tuning processing unit 25 helps programmers tune parallel programs efficiently, for example by setting appropriate targets. When different algorithms exist for the same processing, the algorithm selection processing unit 26 helps programmers choose the algorithm that gives higher parallel efficiency. The parallel performance evaluation processing unit 27 makes it easy for developers and researchers of parallel computer systems to evaluate parallel performance. The details of these units are described below.

Next, the processing flow of the system shown in FIG. 19 is described with reference to FIG. 22. First, preprocessing is performed (step S1). It includes writing code that measures processing times directly; writing code, through the compiler, OS, tools, programmer, runtime library, hardware, and so on, that turns on and off the flags used to count the samples corresponding to each processing time; and classifying module names and the like, by the same means, for counting those samples. This preprocessing may be carried out on the parallel computer system 200 or on another computer system, or by a person such as a programmer. Because step S1 may be performed neither by the parallel performance analyzer 100 nor by the parallel computer system 200, it is drawn as a dotted-line block.

Next, based on the preprocessing, the measurement unit 201 of the parallel computer system 200 measures processing times and counts samples (step S3). The measurement results, namely the processing times γ_i(p), χ_{i,j}(p), and τ_i(p) or the corresponding sampling counts, are stored in the storage device of the parallel computer system 200 and read by the data acquisition unit 10 of the parallel performance analyzer 100, which stores them in the log data storage unit 30 of the analyzer.

The load balance contribution ratio calculation unit 11, the virtual parallelization rate calculation unit 12, the parallel performance impediment factor contribution rate calculation unit 13, the parallel efficiency calculation unit 14, and the auxiliary index calculation unit 15 then use the processing times γ_i(p), χ_{i,j}(p), and τ_i(p), or the corresponding sampling counts, stored in the log data storage unit 30 to calculate the load balance contribution ratio Rb(p), the virtual parallelization rate Rp(p), the contribution rate Rj(p) of each parallel performance impediment factor, the parallel efficiency E_p(p), the acceleration rate A_p, and other auxiliary indices, and store them in the log data storage unit 30 (step S5).

As described above, the parallel efficiency E_p(p) is calculated according to any one of equation (4-4), equation (4-5), equation (8-2) (when the condition of equation (8-1) holds), equation (9-1), or equation (9-2). The parallel efficiency calculation unit 14 therefore computes E_p(p) from the load balance contribution ratio Rb(p) calculated by unit 11, the virtual parallelization rate Rp(p) calculated by unit 12, the contribution rates Rj(p) calculated by unit 13, and the acceleration rate A_p(p) calculated by the auxiliary index calculation unit 15, taking any other data it needs, such as processing times, from the log data storage unit 30.

As an example, the calculation is shown below for the case where processing-time measurements such as those in FIG. 23 (p = 4 processors) have been stored in the log data storage unit 30 by the data acquisition unit 10. Specifically, the measured values are τ_1(p) = 34, τ_2(p) = 35, τ_3(p) = 33, τ_4(p) = 37, γ_1(p) = 15, γ_2(p) = 14, γ_3(p) = 13, γ_4(p) = 16, χ_{1,RED}(p) = 8, χ_{2,RED}(p) = 9, χ_{3,RED}(p) = 7, χ_{4,RED}(p) = 7, χ_{1,C}(p) = 10, χ_{2,C}(p) = 11, χ_{3,C}(p) = 12, and χ_{4,C}(p) = 13. Therefore χ_{1,others}(p) = 1 (= 34 - 15 - 8 - 10), χ_{2,others}(p) = 1 (= 35 - 14 - 9 - 11), χ_{3,others}(p) = 1 (= 33 - 13 - 7 - 12), and χ_{4,others}(p) = 1 (= 37 - 16 - 7 - 13). χ_{1,C}(1) and χ_{1,others}(1), which are needed in the calculations below but were not measured, are both set to 0.
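The χ_others values follow by subtracting the measured parts from each processor's total time. A short check of that arithmetic, using the FIG. 23 values listed above:

```python
# FIG. 23 values for processors i = 1..4, as listed in the text.
tau     = [34, 35, 33, 37]   # tau_i(p): total processing time
gamma   = [15, 14, 13, 16]   # gamma_i(p): parallel processing part
chi_red = [8, 9, 7, 7]       # chi_{i,RED}(p): redundant processing
chi_c   = [10, 11, 12, 13]   # chi_{i,C}(p): communication

# Time not attributed to any measured category is lumped into "others".
chi_others = [t - g - r - c for t, g, r, c in zip(tau, gamma, chi_red, chi_c)]
print(chi_others)
```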

(1) Load balance contribution ratio (Equation (5))

(2) Virtual parallelization rate (Equations (6-1), (12-1), (15))

(3) Acceleration rate (Equation (6-2))

(4) Parallel performance impediment factor contribution rates (Equation (7))
(4-1) Contribution rate of redundant processing

(4-2) Contribution rate of communication processing

(4-3) Contribution rate of other parallel performance impediment factors

(5-1) Parallel efficiency (Equation (4-4))

(5-2) Parallel efficiency (Equation (9-1))

(5-3) Parallel efficiency (Equation (9-2))

The above results are summarized in FIG. 13, which also includes the auxiliary index E_p(p)·p. Since E_p(4)·4 = 1.777, running the job in parallel on four processors delivers the performance of 1.777 processors. Load imbalance alone lowers the parallel efficiency to about 94% (Rb(4) = 0.9392). The parallel performance impediment factors account for about 22% (redundant processing, R_RED(4) = 0.2230), about 33% (communication, R_C(4) = 0.3309), and about 3% (others, R_others(4) = 0.0288), so communication and redundant processing together lower the parallel efficiency by about 55%. Because Rp(4) = 0.8821 in FIG. 13, the maximum parallel performance, reached with an unlimited number of processors, is estimated at 8.482 (= A_p(4) = 1/(1 - 0.8821)) times that of one processor, so this job should be run on no more than eight processors. To run it with E_p(x) ≥ 0.8, note that E_p(4)·4 = 0.4443 × 4 = 1.777; assuming E_p(x)·x stays at this value, E_p(x) = 1.777/x ≥ 0.8 gives x ≤ 2.22. The optimum number of processors for this condition is therefore expected to be 2.
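Equations (2) and (5) to (7) are not reproduced in this section, so the following is a sketch, not the patent's formulas: definitions inferred from the worked numbers, chosen because they reproduce the FIG. 13 values quoted above. The inferred assumptions are that τ(p) is the largest τ_i(p); Rb(p) is the mean τ_i(p) divided by τ(p); each Rj(p) is Σ_i χ_{i,j}(p) / Σ_i τ_i(p); τ(1) is Σ_i γ_i(p) plus the mean redundant time (standing in for equations (12-1) and (15)), with the unmeasured χ_{1,C}(1) and χ_{1,others}(1) set to 0; Rp(p) = Σ_i γ_i(p) / τ(1); A_p = 1/(1 - Rp); and E_p(p) = τ(1)/(p·τ(p)). The function name `indicators` is hypothetical.

```python
def indicators(tau, gamma, chi):
    """Inferred indicator definitions (they reproduce FIG. 13; not quoted
    from the patent's equations)."""
    p = len(tau)
    total = sum(tau)
    tau_p = max(tau)                           # elapsed time: slowest processor
    chi1_red = sum(chi["RED"]) / p             # assumed chi_{1,RED}(1): mean redundant time
    tau_1 = sum(gamma) + chi1_red              # chi_{1,C}(1) = chi_{1,others}(1) = 0
    rp = sum(gamma) / tau_1                    # virtual parallelization rate
    ep = tau_1 / (p * tau_p)                   # parallel efficiency
    result = {
        "Rb": (total / p) / tau_p,             # load balance contribution ratio
        "Rp": rp,
        "A_p": 1.0 / (1.0 - rp),               # speedup limit for unlimited processors
        "E_p": ep,
        "E_p*p": ep * p,                       # effective number of processors
    }
    for name, values in chi.items():           # impediment factor contribution rates
        result["R_" + name] = sum(values) / total
    return result


tau   = [34, 35, 33, 37]
gamma = [15, 14, 13, 16]
chi   = {"RED": [8, 9, 7, 7], "C": [10, 11, 12, 13], "others": [1, 1, 1, 1]}
res = indicators(tau, gamma, chi)
for name, value in res.items():
    print(f"{name:9s} {value:.4f}")
```

Rb, Rp, E_p, E_p·p and the three contribution rates agree with FIG. 13 to the decimals quoted. A_p comes out as 8.4839 rather than 8.482 because the text inverts the already rounded Rp(4) = 0.8821.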

Next, a calculation example is shown for the case where sampling counts such as those in FIG. 24 have been stored in the log data storage unit 30 by the data acquisition unit 10. Specifically, the measured counts are τ_1(p) = 3488, τ_2(p) = 3561, τ_3(p) = 3372, τ_4(p) = 3756, γ_1(p) = 1521, γ_2(p) = 1411, γ_3(p) = 1322, γ_4(p) = 1601, χ_{1,RED}(p) = 823, χ_{2,RED}(p) = 945, χ_{3,RED}(p) = 711, χ_{4,RED}(p) = 703, χ_{1,C}(p) = 1056, χ_{2,C}(p) = 1111, χ_{3,C}(p) = 1230, and χ_{4,C}(p) = 1341. Therefore χ_{1,others}(p) = 88 (= 3488 - 1521 - 823 - 1056), χ_{2,others}(p) = 94 (= 3561 - 1411 - 945 - 1111), χ_{3,others}(p) = 109 (= 3372 - 1322 - 711 - 1230), and χ_{4,others}(p) = 111 (= 3756 - 1601 - 703 - 1341). As before, χ_{1,C}(1) and χ_{1,others}(1), which are needed below but were not measured, are both set to 0.

(1) Load balance contribution ratio (Equation (5))

(2) Virtual parallelization rate (Equations (6-1), (12-1), (15))

(3) Acceleration rate (Equation (6-2))

(4) Parallel performance impediment factor contribution rates (Equation (7))
(4-1) Contribution rate of redundant processing

(4-2) Contribution rate of communication processing

(4-3) Contribution rate of other parallel performance impediment factors

(5-1) Parallel efficiency (Equation (4-4))

(5-2) Parallel efficiency (Equation (9-1))

(5-3) Parallel efficiency (Equation (9-2))

The results are summarized in FIG. 13, as in the case of direct processing-time measurement.
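Because sampling counts are proportional to time, the indices can be computed from the FIG. 24 counts directly, without converting them to seconds. A compact sketch, assuming (as inferred from the FIG. 13 numbers, not taken from the equations) that τ(p) is the largest τ_i(p), that χ_{1,RED}(1) is the mean redundant count, and that the unmeasured χ_{1,C}(1) and χ_{1,others}(1) are 0:

```python
# FIG. 24 sampling counts for processors i = 1..4, as listed in the text.
tau     = [3488, 3561, 3372, 3756]
gamma   = [1521, 1411, 1322, 1601]
chi_red = [823, 945, 711, 703]
chi_c   = [1056, 1111, 1230, 1341]
p = len(tau)

# "others" = counts not attributed to any measured category.
chi_others = [t - g - r - c for t, g, r, c in zip(tau, gamma, chi_red, chi_c)]

total = sum(tau)
tau_p = max(tau)                              # slowest processor
tau_1 = sum(gamma) + sum(chi_red) / p         # inferred single-processor estimate
rb = (total / p) / tau_p                      # load balance contribution ratio
r_red, r_c, r_others = (sum(x) / total for x in (chi_red, chi_c, chi_others))
ep = tau_1 / (p * tau_p)                      # parallel efficiency
print(chi_others, round(rb, 4), round(r_red, 4), round(r_c, 4), round(r_others, 4), round(ep, 4))
```

The results (Rb ≈ 0.9436, R_RED ≈ 0.2244, R_C ≈ 0.3342, R_others ≈ 0.0284, E_p ≈ 0.4427) differ from the time-based FIG. 13 values by less than 0.005.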

Returning to FIG. 22, the log data storage unit 30 stores, for each job, the measurement results such as processing times together with the parallel performance evaluation indices and auxiliary indices shown in FIG. 13. The parallel performance analyzer 100 then outputs processing results like those in FIG. 13 to an output device 110, such as a display or a printer, either on user request or automatically (step S7).

Using only data like that in FIG. 13, users can analyze parallel performance themselves, estimate the optimum number of processors, estimate the effect of adding processors or replacing the system, tune programs, choose algorithms, and so on. In addition, the processor number optimization processing unit 21, processor expansion estimation processing unit 22, system replacement data processing unit 23, operational efficiency data processing unit 24, tuning processing unit 25, algorithm selection processing unit 26, and parallel performance evaluation processing unit 27 carry out the consulting-support processes described below in response to user instructions (step S9), which yield more specific data.

A. Processor Number Optimization
The processing performed by the processor number optimization processing unit 21 is described with reference to FIGS. 25 and 26. The unit 21 accepts the user's setting of a target parallel efficiency (E_p)_T (step S11). It then calculates the optimum number of processors by the following formula and stores it in the storage device (step S13):
(p)_OPT = E_p(p) / (E_p)_T · p
The calculated optimum number of processors is output to the output device 110 (step S15). This lets the user keep the number of processors to the necessary minimum the next time a process belonging to the same process group is run. For example, with the results of FIG. 13 described above and a target parallel efficiency (E_p)_T = 0.8, (p)_OPT = 0.4443 / 0.8 × 4 = 2.22, so the optimum number of processors is 2.
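The (p)_OPT formula can be written as a small function. Rounding down to a whole number of processors is this sketch's reading of the example above (2.22 becomes 2), and the function name `optimal_processors` is hypothetical. The formula itself assumes that the effective processor count E_p(p)·p stays constant when p changes.

```python
import math


def optimal_processors(ep, p, ep_target):
    """Return (p)_OPT = E_p(p) / (E_p)_T * p and a whole processor count.

    Rounding down keeps the predicted efficiency at or above the target,
    under the assumption that E_p(x) * x does not depend on x.
    """
    p_opt = ep / ep_target * p
    return p_opt, max(1, math.floor(p_opt))


p_opt, n = optimal_processors(ep=0.4443, p=4, ep_target=0.8)
print(f"(p)_OPT = {p_opt:.2f} -> use {n} processors")
```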

When processes of the same process group are run one after another, the number of processors can also be adjusted toward the optimum from one process to the next, so that the group is handled more efficiently. This is done as shown in FIG. 26. First, the number of processors p is provisionally set (step S21); this value is used for the first process in the group. The target parallel efficiency is then accepted from the user (step S23). Next, the parallel computer system 200 runs the process in parallel on p processors while the measurement unit 201 measures the processing times and the like, and the results are stored in the storage device (step S25). The data acquisition unit 10 stores the measured data in the log data storage unit 30, and the parallel efficiency calculation unit 14 and the other calculation units compute the parallel performance evaluation indices, including the parallel efficiency, and store them in the log data storage unit 30 (step S27).

The processor number optimization processing unit 21 then calculates the optimum number of processors (p)_OPT according to the formula above, stores it in the storage device (step S29), and sets p to (p)_OPT as the number of processors for the next process in the same group (step S31). It then determines whether all processes in the group have been run (step S33). If not, the next process in the group is selected (step S35) and the flow returns to step S25, running that process in parallel with the number of processors set in step S31.

In this way, the optimum number of processors computed for one process in a group becomes the number of processors for the next, so the process group as a whole is handled more efficiently.
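The loop of FIG. 26 can be sketched as below. `run_job` stands in for steps S25 to S27 (run, measure, compute E_p); here it is a hypothetical toy model based on Amdahl's law with a serial fraction of 0.1, used only so the sketch runs. The update in steps S29 to S31 reuses the (p)_OPT formula, rounded down; all function names are hypothetical.

```python
import math


def run_process_group(jobs, p_initial, ep_target, run_job):
    """Run each job with the current p, then set p for the next job to the
    (p)_OPT computed from the measured efficiency (FIG. 26, simplified)."""
    p = p_initial                                   # S21: provisional count
    history = []
    for job in jobs:                                # S33/S35: next job in group
        ep = run_job(job, p)                        # S25-S27: run and evaluate
        history.append((p, ep))
        p = max(1, math.floor(ep / ep_target * p))  # S29-S31: update p
    return history


def toy_efficiency(job, p, serial_fraction=0.1):
    """Hypothetical model: Amdahl speedup 1/(s + (1-s)/p), so E_p = 1/(p*s + 1 - s)."""
    return 1.0 / (p * serial_fraction + 1.0 - serial_fraction)


history = run_process_group(range(6), p_initial=12, ep_target=0.8, run_job=toy_efficiency)
for p, ep in history:
    print(p, round(ep, 3))
```

Under this toy model p goes 12, 7, 5, 4, 3 and then stays at 3, the largest count whose efficiency (about 0.833) still meets the 0.8 target; with p = 4 the model gives about 0.769.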

B. Processor Expansion Estimation
The processor expansion estimation processing unit 22 provides the acceleration rate A_system of the enhanced system as a quantitative guideline for adding processors to the parallel computer system 200. FIG. 27 shows the processing flow. First, the unit 22 accepts input of the operating time added by the expansion and its expected parallel efficiency (step S41). It then calculates A_system according to equation (18) and stores it in the storage device (step S43); data such as the operating time of each processor currently in use are computed from the past processing log data stored in the log data storage unit 30. Finally, the calculated A_system is output to the output device 110, such as a display (step S45).

From A_system, computed for the specified additional operating time and its expected parallel efficiency, one can judge how much the time available for meaningful processing will increase.

C. System Replacement Data Processing
This processing presents quantitative guidelines for specifying the performance of a new parallel computer system when the current one is replaced. FIG. 28 shows the processing flow. The system replacement data processing unit 23 accepts the target parallel efficiency (E_p)_T and the maximum number of iterations icmax (step S51). It also accepts, as the performance of the new system, performance factors A relative to the current parallel computer system (step S53), such as the CPU performance factor A_CPU, the communication performance factor A_C, and the I/O performance factor A_I/O. The quantitative guideline is expressed in terms of these factors. Since most system replacements improve system performance mainly through better CPU performance, one possible approach is to set A_CPU first, calculate E_p with the other factors set to 1, and then repeatedly recalculate values such as A_C and A_I/O so that E_p approaches (E_p)_T, thereby obtaining guidelines for specifying the new system.

More specifically, the system replacement data processing unit 23 shortens each processing time (and related quantity) stored in the log data storage unit 30 according to the specified performance factors (step S55). For example, if the CPU performance is set to five times its current value ($A_{\text{CPU}} = 5$), the parallel processing time $\gamma_i(p)$ and the other CPU-bound times are divided by 5. From the shortened times, the unit then computes estimates of the parallel performance evaluation indices, including parallel efficiency (the estimated parallel efficiency is denoted $(E_p)_E$), and stores them in the storage device (step S57).
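Step S55 can be sketched as follows. This is a minimal Python sketch, assuming the logged times are held per component and per process. The component keys (`gamma`, `chi_red`, `chi_others`, `chi_c`, `chi_io`) are hypothetical, and so is the mapping of components to factors: it follows the assumptions of the worked example later in this section, plus the assumption that I/O time scales with $A_{\text{I/O}}$. Recomputing $(E_p)_E$ in step S57 requires equation (12-1), which is not reproduced here, so the sketch stops at the scaling.

```python
from typing import Dict, List

# Hypothetical keys for the logged time components and the factor each one
# is divided by (CPU-bound parts by A_CPU, communication by A_C, I/O by A_I/O).
COMPONENT_FACTOR = {
    "gamma": "CPU",       # parallel computation gamma_i(p)
    "chi_red": "CPU",     # redundant processing chi_{i,RED}(p)
    "chi_others": "CPU",  # unclassified time chi_{i,others}(p)
    "chi_c": "C",         # communication chi_{i,C}(p)
    "chi_io": "IO",       # I/O time (assumed to scale with A_I/O)
}

def scale_times(times: Dict[str, List[float]],
                factors: Dict[str, float]) -> Dict[str, List[float]]:
    """Step S55: divide each per-process time by its performance factor.

    A factor that is not given defaults to 1 (unchanged); float("inf")
    makes the component 0, as in the A_C = infinity feasibility check.
    """
    return {name: [t / factors.get(COMPONENT_FACTOR[name], 1.0) for t in values]
            for name, values in times.items()}

logged = {"gamma": [10.0, 20.0], "chi_c": [4.0, 6.0]}  # invented values
print(scale_times(logged, {"CPU": 5.0, "C": float("inf")}))
# {'gamma': [2.0, 4.0], 'chi_c': [0.0, 0.0]}
```

Keeping the component-to-factor mapping in one table makes it easy to revise the assumption, for example if $\chi_{i,\text{others}}(p)$ turns out not to be CPU-bound.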

The system replacement data processing unit 23 then checks whether the estimated parallel efficiency $(E_p)_E$ matches the target $(E_p)_T$ (step S59). An exact match is not required; it suffices that $(E_p)_E$ lies within a predetermined range around $(E_p)_T$. If it does, a message that the target parallel efficiency has been achieved, together with the estimates of each parallel performance evaluation index computed in step S57, is output to an output device 110 such as a display (step S61). Otherwise, the unit checks whether the counter $i_c$ has reached $i_{c\max}$ (step S63). If it has, a message that the target parallel efficiency could not be achieved, together with the estimates computed in step S57, is output to the output device 110 (step S65).

If $i_c$ is still below $i_{c\max}$, the performance factors (CPU, communication, I/O, and so on) are changed (step S67). The change may be made automatically or entered by the user. The counter $i_c$ is then incremented by 1 (step S69), and processing returns to step S55.
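The control flow of steps S55 to S69 is sketched below in Python, as an illustration under assumptions rather than the patented implementation. `estimate` is a hypothetical callable standing in for steps S55 to S57 (shorten the logged times, then evaluate $E_p$ via equation (12-1)), `update` stands in for the factor change of step S67, the "predetermined range" of step S59 is modeled as an absolute tolerance `tol`, and the counter is assumed to start at 0.

```python
from typing import Callable, Dict, Tuple

Factors = Dict[str, float]  # e.g. {"CPU": 5.0, "C": 1.0, "IO": 1.0}

def search_replacement(estimate: Callable[[Factors], float],
                       update: Callable[[Factors, float], Factors],
                       factors: Factors, target: float, tol: float,
                       ic_max: int) -> Tuple[bool, Factors, float]:
    """Steps S55-S69: vary the factors until (E_p)_E is within tol of (E_p)_T.

    Returns (target_reached, last_factors, last_estimate).
    """
    ic = 0                                   # assumption: counter starts at 0
    while True:
        e_est = estimate(factors)            # S55-S57: scale times, estimate E_p
        if abs(e_est - target) <= tol:       # S59: within the allowed range?
            return True, factors, e_est      # S61: report success
        if ic >= ic_max:                     # S63: iteration budget exhausted?
            return False, factors, e_est     # S65: report failure
        factors = update(factors, e_est)     # S67: change performance factors
        ic += 1                              # S69

# Toy estimator and update rule, invented for illustration only.
ok, f, e = search_replacement(lambda x: 0.5 + 0.01 * x["C"],
                              lambda x, est: {**x, "C": x["C"] + 1.0},
                              {"CPU": 5.0, "C": 1.0}, 0.6, 1e-6, 20)
print(ok, f["C"])  # True 10.0
```

Passing `update` as a parameter leaves open whether step S67 is automatic or driven by user input, as the text allows.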

In the above processing, the performance factors are varied for up to $i_{c\max}$ iterations, re-estimating the parallel efficiency each time, in order to reach the target $(E_p)_T$. This makes it possible to choose a new parallel computer system that satisfies $(E_p)_T$ either for one specific process whose processing times are stored in the log data storage unit 30, or for several kinds of processes.

An application of the FIG. 28 flow is illustrated using the processing times measured as in FIG. 23. Suppose the target is $(E_p)_T = 0.6$ and a new system with five times the CPU performance ($A_{\text{CPU}} = 5$) is introduced. Then $\gamma_i(p)$ and $\chi_{i,\text{RED}}(p)$ are scaled by $1/A_{\text{CPU}}$. The component $\chi_{i,\text{others}}(p)$, whose nature is unknown, is also assumed to scale by $1/A_{\text{CPU}}$. The communication component $\chi_{i,C}(p)$, in contrast, depends on network performance. As a first step, feasibility is examined with $A_C = \infty$. The unmeasured values $\chi_{1,C}(1)$ and $\chi_{1,\text{others}}(1)$ are both set to 0.

The following calculation is performed using equation (12-1).

[$(E_p)_E$ for $A_{\text{CPU}} = 5$, $A_C = \infty$]

FIG. 29 summarizes these results. With $A_C = \infty$, the $\chi_{i,C}(p)$ terms vanish and $E_p = 0.6850$, which exceeds the target of 0.6. The target parallel efficiency may therefore be attainable, depending on how much the communication performance $A_C$ improves. The parallel efficiency is therefore recomputed repeatedly (step S57) while $A_C$ is changed in step S67, searching for the $A_C$ that gives $E_p(p) \approx 0.6$. The intermediate calculations are omitted; the result for $E_p(p) \approx 0.6$ is shown in the second row of FIG. 29, where $A_C = 19.2$. The guideline obtained is thus: to reach $(E_p)_T = 0.6$ with $A_{\text{CPU}} = 5$, look for a replacement parallel computer system with $A_C \ge 19.2$.
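The search for $A_C$ in step S67 can be automated. One possible approach, not specified in the text, is bisection, which is valid only under the assumption that the efficiency estimate never decreases as $A_C$ grows. The sketch below uses a hypothetical `estimate(a_c)` callable in place of equation (12-1); the toy estimator in the usage line is invented and does not reproduce the FIG. 29 numbers.

```python
import math
from typing import Callable

def min_comm_factor(estimate: Callable[[float], float], target: float,
                    lo: float = 1.0, rel_tol: float = 1e-4,
                    hi_limit: float = 1e6) -> float:
    """Smallest A_C (within rel_tol) for which estimate(A_C) >= target.

    Assumes estimate() never decreases as A_C grows and that lo > 0.
    Returns math.inf when the target is out of reach through A_C alone.
    """
    if estimate(math.inf) < target:      # even an infinitely fast network fails
        return math.inf
    if estimate(lo) >= target:           # current network already suffices
        return lo
    hi = lo * 2
    while estimate(hi) < target:         # grow the bracket geometrically
        lo, hi = hi, hi * 2              # keeps estimate(lo) < target
        if hi > hi_limit:
            return math.inf
    while hi - lo > rel_tol * hi:        # invariant: estimate(lo) < target <= estimate(hi)
        mid = 0.5 * (lo + hi)
        if estimate(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Toy estimator, invented for illustration: E_p = 0.8 - 1/A_C.
print(round(min_comm_factor(lambda a: 0.8 - 1.0 / a, 0.6), 2))  # 5.0
```

The first check mirrors the $A_C = \infty$ feasibility test in the text: if it fails, no finite network improvement can meet the target.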

If $A_C = 19.2$ is too demanding and no such system exists, reducing other factors that impede parallel performance should be considered. The estimate in the second row of FIG. 29 shows that redundant processing, $R_{\text{RED}}(4) = 0.2953$, is the factor to improve. Reducing redundant processing requires tuning the program: run the tuned program, execute the FIG. 28 processing again, and re-estimate $A_C$.

Predicting parallel performance at system replacement, as in FIG. 29, also makes it possible to estimate the system operation efficiency $E_{\text{system}}$ of the newly introduced parallel computer system. For example, on a new system with $A_{\text{CPU}} = 5$ and $A_C = 19.2$, which meets the target $(E_p)_T = 0.6$ for a certain process, running processes 1 to 4 whose estimates are shown in FIG. 18 gives $E_{\text{system}} = 0.6534$. Comparing this predicted $E_{\text{system}}$ with the $E_{\text{system}}$ obtained from past processing logs makes it possible to show quantitatively, with better-founded data, how much the replacement improves the utilization rate.

D. System operation efficiency improvement processing
The operation efficiency of the system is evaluated using, for example, the system operation efficiency index $E_{\text{system}}$ defined by equation (17). To raise this index, concrete guidelines are produced, such as which processes need their parallel efficiency improved and by how much. Specifically, the processing of FIG. 30 is performed.

The operation efficiency data processing unit 24 accepts, from the operations manager, the target system operation efficiency $(E_{\text{system}})_T$ and the maximum number of iterations $i_{c\max}$ (step S71). It reads data such as processing times and parallel efficiencies from the log data storage unit 30, computes $E_{\text{system}}$ according to equation (17), and stores it in the storage device (step S73). If the parallel performance evaluation indices, including parallel efficiency, have not yet been calculated, they are calculated at this point by the load balance contribution rate calculation unit 11, the virtual parallelization rate calculation unit 12, the parallel performance impediment factor contribution rate calculation unit 13, the parallel efficiency calculation unit 14, and so on. The unit 24 then checks whether the $E_{\text{system}}$ computed in step S73 exceeds the target $(E_{\text{system}})_T$ (step S75).
If $E_{\text{system}} > (E_{\text{system}})_T$, a message that the target has been achieved, together with the $E_{\text{system}}$ computed in step S73, is output to an output device 110 such as a display (step S77). If $E_{\text{system}} \le (E_{\text{system}})_T$, the unit checks whether the counter $i_c$ has reached $i_{c\max}$ (step S79). If it has, the improvement process is evidently not working, so a message that the target has not been reached, together with the $E_{\text{system}}$ computed in the most recent step S73, is output to the output device 110 (step S81).

If $i_c$ is below $i_{c\max}$, the operation efficiency data processing unit 24 recommends improvement measures suited to each party: measures for end users to end users, for system administrators to system administrators, for programmers to programmers, and for parallel computer system developers or researchers to them. These parties then carry out whatever system operation efficiency improvements are possible (step S83), for example optimizing the number of processors, adding processors, replacing the system, or tuning programs. After the measures are in place, the parallel computer system 200 runs the parallel processing again while the measurement unit 201 measures processing times and related data (step S85). The counter $i_c$ is incremented by 1 (step S87), and processing returns to step S73. Step S83 is drawn as a dotted block because it may be carried out by end users or others, and step S85 as a dash-dot block because it is not performed by the parallel performance analysis apparatus 100.

This processing makes it possible to improve a system operation efficiency that reflects parallel efficiency, that is, one that accounts for effective processing time, which the conventional utilization rate $NWR_{\text{system}}$ does not consider.

E. Tuning processing
Conventionally, the effort of improving performance by tuning a parallel application program was hard to estimate because the goal was not clearly defined. Sometimes the target processing time could not be reached by tuning at all, and in many cases tuning dragged on endlessly and consumed a great deal of work time. The processing shown in FIG. 31 is therefore performed.

First, the tuning processing unit 25 accepts, from the programmer, the target processing time $(\tau)_T$, the maximum number of iterations $i_{c\max}$, and the limit parallel efficiency $(E_p)_{\max}$ (step S91). Next, using the parallel efficiency and processing time data stored in the log data storage unit 30 (for example, those in the processing log of the program to be tuned), it computes the target parallel efficiency $(E_p)_T$ corresponding to $(\tau)_T$ and stores it in the storage device (step S93). $(E_p)_T$ is computed by the following linear extrapolation:

$$(E_p)_T = \frac{\max_i(\tau_i)\, E_p(p)}{(\tau)_T}$$

It is then checked whether $(E_p)_T$ is at most the limit $(E_p)_{\max}$ (step S95). If $(\tau)_T$ could be set without any restriction, an unattainable $(E_p)_T$ might result, so this step checks that $(\tau)_T$ is reasonable. If $(E_p)_T$ exceeds $(E_p)_{\max}$, either $(\tau)_T$ or $(E_p)_{\max}$ must be reset, and processing returns to step S91.
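Steps S93 and S95 each reduce to one line of arithmetic. A minimal Python sketch, with hypothetical function names, checked against the numbers of the worked example later in this section ($\tau(p) = 37$, $E_p(4) = 0.4443$, $(\tau)_T = 28$, $(E_p)_{\max} = 0.6$):

```python
def target_efficiency(tau_max: float, e_p: float, tau_target: float) -> float:
    """Step S93: (E_p)_T = max_i(tau_i) * E_p(p) / (tau)_T (linear extrapolation)."""
    return tau_max * e_p / tau_target

def target_is_feasible(e_target: float, e_limit: float) -> bool:
    """Step S95: accept (tau)_T only if (E_p)_T <= (E_p)_max."""
    return e_target <= e_limit

e_t = target_efficiency(37.0, 0.4443, 28.0)
print(round(e_t, 4))                  # 0.5871
print(target_is_feasible(e_t, 0.6))   # True
```

At a fixed processor count the extrapolation treats efficiency as inversely proportional to elapsed time, so cutting the time from 37 to 28 requires raising $E_p$ by the factor $37/28$.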

If $(E_p)_T \le (E_p)_{\max}$, it is checked whether the currently measured processing time $\tau(p)$ is at most $(\tau)_T$ (step S97). The first time through, step S97 always yields No. If $\tau(p) \le (\tau)_T$, a message that the target has been achieved, together with the achieved parallel efficiency, $\tau(p)$, and other data, is output to an output device 110 such as a display (step S99). If $\tau(p)$ still exceeds $(\tau)_T$, it is checked whether the counter $i_c$ has reached $i_{c\max}$ (step S101). If it has, a message that the target cannot be achieved, together with the parallel efficiency reached, $\tau(p)$, and other data, is output to the output device 110 (step S103).

If $i_c$ is below $i_{c\max}$, $i_c$ is incremented by 1 (step S105), and the factors impeding parallel performance, such as redundant processing, load balance, communication, and I/O, are tuned (step S107). Tuning may also be done with tools, compilers, runtime libraries, and the like instead of rewriting the program. Because this step may be carried out by the programmer, it is drawn as a dotted block. After tuning, the parallel computer system 200 runs the program in parallel again while the measurement unit 201 measures processing times and related data and stores them in a storage device (step S109); since this step is not performed by the parallel performance analysis apparatus 100, it is drawn as a dash-dot block. The data acquisition unit 10 then obtains the processing times and related data from the parallel computer system 200 and stores them in the log data storage unit 30. The load balance contribution rate calculation unit 11, virtual parallelization rate calculation unit 12, parallel performance impediment factor contribution rate calculation unit 13, parallel efficiency calculation unit 14, and so on calculate the parallel performance evaluation indices, including parallel efficiency, and store them in the log data storage unit 30 (step S111). Processing then returns to step S97.
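The loop of steps S97 to S111 is sketched below in Python under stated assumptions: `measure` is a hypothetical callable returning the latest $(\tau(p), E_p(p))$ from the log (standing in for steps S109 to S111), `tune` is a hypothetical stand-in for step S107, which in practice may be manual work, the counter starts at 0, and steps S91 to S95 have already accepted $(\tau)_T$.

```python
from typing import Callable, Tuple

def tuning_loop(measure: Callable[[], Tuple[float, float]],
                tune: Callable[[int], None],
                tau_target: float,
                ic_max: int) -> Tuple[bool, float, float, int]:
    """Steps S97-S111; returns (reached, tau(p), E_p(p), tuning rounds)."""
    ic = 0                                  # assumption: counter starts at 0
    tau, e_p = measure()                    # latest logged tau(p), E_p(p)
    while True:
        if tau <= tau_target:               # S97
            return True, tau, e_p, ic       # S99: target reached
        if ic >= ic_max:                    # S101
            return False, tau, e_p, ic      # S103: target not reachable
        ic += 1                             # S105
        tune(ic)                            # S107: programmer or tool tuning
        tau, e_p = measure()                # S109-S111: rerun and re-evaluate

# Invented stand-ins: each tuning round removes 6.5 time units, and E_p is
# taken to vary inversely with tau(p), as in the extrapolation of step S93.
state = {"tau": 37.0}
result = tuning_loop(lambda: (state["tau"], 37.0 * 0.4443 / state["tau"]),
                     lambda ic: state.update(tau=state["tau"] - 6.5),
                     28.0, 5)
print(result[0], result[1], result[3])  # True 24.0 2
```

The iteration cap `ic_max` is what bounds the tuning effort, which is the point the text makes about predicting work time.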

In this way, tuning aimed at achieving $(\tau)_T$ is carried out for at most a predetermined number of rounds, which lets the programmer work efficiently.

A concrete example based on the processing times of FIG. 23 follows. Here $\tau(p) = 37$ and $E_p(4) = 0.4443$. With $(E_p)_{\max} = 0.6$ and $(\tau)_T = 28$, the target is $(E_p)_T = 37 \times 0.4443 / 28 = 0.5871$, so processing moves from step S95 to step S97. Because this is the first pass, it proceeds from step S97 through steps S101 and S105 to step S107. Suppose the first tuning halves the communication time $\chi_C$. The parallel performance evaluation indices, including parallel efficiency, are then computed from the result in step S111, giving the results shown in FIG. 32, which also lists the processing time $\max_i(\tau_i)$ for comparison.

The calculation for the case where tuning halves the communication time $\chi_C$ is as follows, with $\chi_{1,C}(4) = 10/2 = 5$, $\chi_{2,C}(4) = 11/2 = 5.5$, $\chi_{3,C}(4) = 12/2 = 6$, and $\chi_{4,C}(4) = 13/2 = 6.5$. Using equation (12-1):

(1) Load balance contribution ratio (equation (5))

(2) Virtual parallelization rate (equation (6-1))

(3) Parallel performance impediment factor contribution ratio (equation (7))

(4-1) Parallel efficiency (equation (4-4))

(4-2) Parallel efficiency (equation (9-1))

After one round of tuning, the processing time $\max_i(\tau_i)$ $(= \tau(p))$ is 30.5, which does not yet reach the target $(\tau)_T = 28$, so further tuning of some kind is needed.
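The arithmetic of this tuning round can be checked directly. The halved communication times come from the text. The sketch additionally assumes that process 4, with $\tau_4 = \tau(p) = 37$ and $\chi_{4,C}(4) = 13$, remains the slowest process; this is consistent with the reported value of 30.5 but is an inference, not something the text states.

```python
# chi_{i,C}(4) for processes i = 1..4 before tuning (from the text)
chi_c = [10.0, 11.0, 12.0, 13.0]
chi_c_tuned = [c / 2 for c in chi_c]       # first tuning: halve communication
print(chi_c_tuned)                          # [5.0, 5.5, 6.0, 6.5]

# Assumption: process 4 is the slowest before and after tuning,
# and only its communication time changes.
tau_4 = 37.0 - (chi_c[3] - chi_c_tuned[3])
# 30.5 exceeds the target 28, so the comparison is False
print(tau_4, tau_4 <= 28.0)                 # 30.5 False
```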

Conventionally, parallel performance was evaluated by comparing processing times: how the time changes with the number of processors, how it compares with other systems, how many operations complete within a given time, and so on. This requires two or more time measurements, which lengthens program development. Moreover, with such relative, comparison-based evaluation, the reference must be remeasured whenever the input data change. Because parallel performance evaluation is so time-consuming, application programs that perform well in parallel only under certain conditions sometimes end up being developed. The processing described above allows parallel performance to be evaluated from parallel efficiency with a single measurement, greatly reducing the share of development time spent on performance evaluation. As a result, developing parallel application programs with full attention to parallel performance becomes practical.

In the past, the effort of improving performance by tuning an application program was hard to estimate because the goal was unclear. It was also unclear when to stop, so tuning sometimes consumed a great deal of work time, and in some cases the application had to be redeveloped rather than tuned. The processing described above makes it possible to set a clear goal for raising parallel efficiency through tuning, and to predict the work time from, for example, the number of tuning iterations.

Furthermore, tuning of an application program was traditionally done by finding, through timing measurements and the like, a procedure (part of the application program) that takes a long time, locating the parallel performance impediment factors within it by comparing processing times, and reducing those times. The processing described above makes it possible, for the first time, to evaluate load balance when tuning such a part of an application program.

F. Algorithm selection processing
Conventionally, processing times were used to compare versions of a parallel application program that differ in the algorithm used for one part of it. However, it could not be determined whether a reduction in processing time came from parallel processing or from a functional difference (for example, fewer operations). As a result, the waste of committing many processors to an algorithm with a short processing time but poor scalability went unnoticed. In this embodiment, an algorithm with better parallel efficiency is selected to improve the operational efficiency of the whole system. First, an algorithm unsuited to parallel processing is compared with one suited to it.

[Algorithm unsuited to parallel processing]
Consider the case where the processing times are measured as shown in FIG. 33, with $\chi_{1,C}(1) = 0$. Using equation (12-1):

(1) Load balance contribution ratio (equation (5))

(2) Virtual parallelization rate (equation (6-1))

(3) Acceleration rate (equation (6-2))

(4) Parallel performance impediment factor contribution ratio (equation (7))

(4-1) Parallel efficiency (equation (4-4))

(4-2) Parallel efficiency (equation (9-1))

[Algorithm suited to parallel processing]
Consider the case where the processing times are measured as shown in FIG. 34, with $\chi_{1,C}(1) = 0$. Using equation (12-1):

(1) Load balance contribution ratio (equation (5))

(2) Virtual parallelization rate (equation (6-1))

(3) Acceleration rate (equation (6-2))

(4) Parallel performance impediment factor contribution ratio (equation (7))

(4-1) Parallel efficiency (equation (4-4))

(4-2) Parallel efficiency (equation (9-1))

FIG. 35 summarizes these results. Number the algorithm unsuited to parallel processing $j = 1$ and the one suited to it $j = 2$. For $j = 1$ the acceleration rate $A_p = 5.000$ is finite, so no matter how many processors are added, the equivalent of five processors is effectively the limit. For $j = 2$, $A_p = \infty$, so the processing time may keep falling as more processors are added. Because the processing time $\tau$ is 110 for $j = 1$, shorter than the 120 for $j = 2$, the conventional approach could end up choosing $j = 1$, the algorithm unsuited to parallel processing.

In this embodiment, the algorithm selection processing unit 26 therefore performs the processing shown in FIG. 36. First, the target processing time $(\tau)_T$ is accepted from the programmer (step S121). As initial settings, the algorithm number $j$ and the optimal algorithm number $j_T$ are both set to 1 (step S123). For $j = 1$, the number of processors $(p)_1$ needed to achieve $(\tau)_T$ is computed by linear extrapolation from the processing time and related data for algorithm 1 stored in the log data storage unit 30, as $(p)_1 = (\tau)_1 / \big((\tau)_T\,(E_p)_1\big) + (p)_1$, where $(p)_1$ on the right-hand side is the processor count of the logged run, and the result is stored in the storage device (step S125). The processor count for the optimal algorithm is set to $(p)_T = \mathrm{INT}\big((p)_1 + 0.99\big)$, and $p_{\min} = (p)_1$.

Next, $j$ is incremented by 1 (step S127), and the number of processors $(p)_j$ for algorithm $j$ is computed by the following expression, with the measured processor count on the right-hand side, and stored in the storage device (step S129):

$$(p)_j = \frac{(\tau)_j}{(\tau)_T\,(E_p)_j} + (p)_j$$

It is then checked whether $(p)_j < p_{\min}$ and $(A_p)_j > (p)_j$ (step S131), that is, whether $(p)_j$ is the smallest so far and the required processor count is below the acceleration rate $(A_p)_j$ of that algorithm, so that it is actually attainable. Because steps S125 and S129 compute $(p)_j$ by simple linear extrapolation, this check is what guarantees feasibility. If both conditions hold, $j_T$ is set to $j$ and $(p)_T = \mathrm{INT}\big((p)_j + 0.99\big)$ (step S133).

After step S131 or S133, it is checked whether $j$ has reached the number of algorithms $j_{\max}$ (step S135), that is, whether all algorithms have been processed. If $j \ge j_{\max}$, the algorithm number finally held in $j_T$, the corresponding processor count $(p)_T$, and the other results (sets such as $j$, $(p)_j$, $(A_p)_j$, $(\tau)_j$) are output to an output device 110 such as a display (step S137). If $j < j_{\max}$, processing returns to step S127.

In this way, an algorithm can be identified that achieves the target processing time with few processors, within what is feasible. It is also possible to choose, instead of the algorithm judged optimal by this flow, one that needs only slightly more processors and is easier to tune.

The two algorithms of FIG. 35 illustrate this. First, the target processing time is set to $(\tau)_T = 50$. The required processor count $(p)_j$ is then computed for each algorithm by linear extrapolation from its $(E_p)_j$ and $(\tau)_j$. Because the FIG. 36 flow obtains $(p)_j$ by linear extrapolation, $A_p(4)$, which accounts only for redundant processing, is used as the upper bound on the processor count: $(p)_j$ is accepted only if $A_p(4) > (p)_j$. For the algorithm unsuited to parallel processing, the limit is 5.000 while $(p)_j = 7.872$, so $(p)_j$ cannot be applied. For the algorithm suited to parallel processing, the limit is $\infty$, so $(\tau)_T = 50$ may be achievable with 6.618 processors. The algorithm suited to parallel processing, $j_T = 2$, is therefore chosen, with $(p)_T = 7$ (6.618 rounded up) as the initial guide. Previously, comparing the $\tau$ values of 110 and 120 often led to adopting the algorithm with the shorter processing time even though it is unsuited to parallel processing; with the FIG. 36 flow, the algorithm suited to parallel processing can be selected even though its processing time at $p = 4$ is longer.
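The selection logic of steps S123 to S137 is sketched below in Python, taking the extrapolated processor counts $(p)_j$ as inputs rather than recomputing them. It mirrors the flow as described, in which algorithm 1 is accepted provisionally without the feasibility test of step S131. As an assumption, $p_{\min}$ is updated whenever a better algorithm is found, which the requirement that $(p)_j$ be the minimum implies. The function name is hypothetical; the usage line applies it to the FIG. 35 numbers given above.

```python
import math
from typing import List, Tuple

def select_algorithm(p_need: List[float], a_p: List[float]) -> Tuple[int, int]:
    """Fig. 36 selection logic (steps S123-S137), 1-based algorithm numbers.

    p_need[j-1] is the extrapolated processor count (p)_j for algorithm j and
    a_p[j-1] its acceleration rate (A_p)_j (math.inf when unbounded).
    Returns (j_T, (p)_T).
    """
    j_t = 1                               # S123: provisional choice j_T = 1
    p_min = p_need[0]                     # S125: p_min = (p)_1
    p_t = int(p_need[0] + 0.99)           # (p)_T = INT((p)_1 + 0.99)
    for j in range(2, len(p_need) + 1):   # S127/S135: visit j = 2 .. j_max
        pj = p_need[j - 1]
        # S131: fewer processors than the best so far, and below the
        # acceleration-rate ceiling (i.e. actually attainable)
        if pj < p_min and a_p[j - 1] > pj:
            j_t, p_t = j, int(pj + 0.99)  # S133
            p_min = pj                    # assumption: keep p_min = best (p)_j
    return j_t, p_t

# FIG. 35 example: j = 1 is unsuited to parallel processing, j = 2 suited.
print(select_algorithm([7.872, 6.618], [5.0, math.inf]))  # (2, 7)
```

`INT(x + 0.99)` rounds up any fractional part of at least 0.01, so 6.618 becomes 7, matching the initial guide in the text.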

G. Parallel Performance Evaluation Processing
In this embodiment, log data of the parallel performance evaluation indices can be created for all processes in actual operation. By targeting a specific process in this log data, one can derive the specifications a dedicated parallel computer system needs (CPU performance, communication performance, I/O performance, runtime library performance, etc.). By targeting the processing of all applications, one can likewise create the specifications a general-purpose parallel computer system needs for that log.

For example, FIG. 37 shows that the performance of processes 1 to 4 can be improved either by raising communication performance or by improving CPU and communication performance together while keeping their ratio. The parallel performance evaluation processing unit 27 builds a table such as that of FIG. 37 from, for example, the data stored in the log data storage unit 30, and outputs it to an output device 110 such as a display. It may also highlight any parallel performance impediment factor that shows a relatively high value across the processes. FIG. 37 further shows that process 5 needs tuning of the application program or similar measures rather than a system replacement. This is because only process 5 has a large contribution rate from redundant processing. The parallel performance evaluation processing unit 27 may therefore also extract such characteristic processes and, for example, highlight them.

Communication performance can be determined, for example, by the procedure described for system replacement data processing. All performance magnifications other than that of communication are fixed at 1, and the calculation is repeated until the target E_p(4) is met.

If communication performance is improved with attention to processes 1 to 4, the pattern that accounts for most processes, the system becomes a general-purpose parallel computer system. If instead a mechanism for reducing redundant processing is built into the computer system with attention to process 5, a pattern with few processes, the result is a dedicated parallel computer system. Alternatively, the application program of process 5 can be tuned to reduce its redundant processing. Improving communication performance alone then yields a general-purpose parallel computer system covering processes 1 to 5.

Conventionally, developing a parallel computer system has been difficult because its parallel performance varies greatly with the parallel processing characteristics of application programs. A common workaround was to pick specific application programs, analyze their parallel performance, and develop a parallel computer system suited to them. With this approach, however, a change of application program risks leaving a system that delivers no parallel performance at all. This embodiment can create log data of the parallel performance evaluation indices for all processes in actual operation. Targeting a specific process in that log allows the specifications of a dedicated parallel computer system (CPU performance, communication performance, I/O performance, runtime library performance, etc.) to be written. Targeting all processes allows the specifications of a general-purpose parallel computer system for that log to be written.

Conventionally, whether a parallel computer system includes a means of quantitatively grasping its parallel performance impediment factors varies greatly from system to system, and some systems have no such means at all. As Equation (7) shows, this embodiment can add factors at will, starting from a state with no parallel performance impediment factors. A factor measurement function can therefore be added when the system is upgraded after sale, which improves evaluation accuracy.

Furthermore, conventional performance evaluation indices such as flop/s, Mop/s, and tpmC apply to some types of application program but not others. In this embodiment the indices are expressed as time ratios, so they are valid for every application program and allow performance to be evaluated appropriately. Likewise, some conventional parallel performance evaluation methods apply only to special kinds of parallel processing, whereas this embodiment applies to all parallel processing.

Parallel efficiency represents the performance of parallel processing. With the embodiment described above, the degree to which it is reduced can be expressed by the parallel performance evaluation indices: the load balance contribution ratio, the virtual parallelization ratio, and the parallel performance impediment factors. Adding the load balance contribution ratio to these indices makes it possible to evaluate the parallel performance of all parallel processing.

Further, when R_p(p) is approximately 1, Equation (8-2) computes the parallel efficiency without needing the estimated value τ(1). This enables accurate parallel performance evaluation of all parallel processing, including parallel processing on grids and clusters where τ(1) cannot be measured. Here "accurate" means that no estimate of τ(1) is involved.

Furthermore, even when R_p(p) < 1, Equations (9-1) and (9-2) estimate τ(1) for the parallel efficiency calculation. This enables parallel performance evaluation of all parallel processing, including parallel processing on grids and clusters where τ(1) cannot be measured.

The form of Equation (7) allows parallel processing impediment factors specific to the target parallel computer system to be introduced at any time, which makes detailed performance evaluation easy. In addition, the contribution rate of each parallel performance impediment factor is expressed as a percentage of the parallel efficiency, which makes parallel performance evaluation intuitive.

The contribution of load balance is now expressed numerically as a ratio to the parallel efficiency. Its effect on parallel performance, which could not be evaluated until now, can therefore be shown concretely.

Beyond calculating and presenting the parallel performance indices, the parallel efficiency determined from processing time measurements can be used to decide the number of processors that gives efficient processing. It can also be used to consider adding or removing processors while taking parallel efficiency into account.

Furthermore, the introduction of a new parallel computer system with different performance specifications can be examined on paper. The parallel performance evaluation indices can also be used to manage utilization efficiency in system operation.

Although an embodiment of the present invention has been described above, the present invention is not limited to it. For example, the functional block diagram of FIG. 19 is only an example and does not necessarily correspond to program modules. The following units are also not all required:

- processor number optimization processing unit 21
- processor expansion estimation processing unit 22
- system replacement data processing unit 23
- operation efficiency data processing unit 24
- tuning processing unit 25
- algorithm selection processing unit 26
- parallel performance evaluation processing unit 27

All of them, none of them, or any combination of them may be provided.

[Example]
The embodiment described above applies to all parallel processing. This includes grids, clusters, and distributed-memory systems, in either a homogeneous configuration (same memory, network, and CPU performance) or a heterogeneous one, as well as SMP (Symmetric MultiProcessing), SMP + distributed memory, NUMA (Non-Uniform Memory Access), and so on. Calculation examples for representative cases are shown below.

(1) Grids, etc. in a homogeneous configuration (χ_{1,j}(1) = 0)
When processing runs on a grid or cluster, the network is used to assign work to each processor and to collect the results, so communication occurs. This communication does not occur when a single processor does the processing. Such processing satisfies χ_{1,j}(1) = 0. Here the only parallel performance impediment factor is communication, and the parallel performance of processing with χ_{1,C}(1) = 0 is evaluated. The example uses the elapsed-time measurements shown in FIG. 38.

The following calculation is performed from Equation (3).

The following calculations are performed from Equations (5), (6-1), (6-2), and (7), respectively.

For the parallel efficiency, the following calculations are performed from Equations (4-4), (4-5), (8-2), (9-1), and (9-2), in that order.

FIG. 39 summarizes the parallel performance evaluation indices calculated above. Since A_p(p) = ∞, the potential performance improvement from parallel processing with p = ∞ is unbounded. The actual performance improvement with p = 4 processors, however, is E_p(4)·p = 1.928. The load balance contribution ratio scales the parallel efficiency E_p(4) to 93% (Rb(4) = 0.9286), and communication reduces it by a further 48% (R_C(4) = 0.4808).
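The worked figures in examples (1), (4), (7) and (8) are consistent, to within rounding, with a simple product form: E_p(p) ≈ Rb(p)·(1 − Σ_j R_j(p)), the load balance contribution ratio times one minus the sum of the impediment contributions. The sketch below assumes that form. It is inferred from the quoted numbers rather than taken from Equations (4-4) to (9-2), and it does not cover examples where R_p(p) ≠ 1, such as (2) and (3). The name `speedup` is hypothetical.

```python
from typing import Mapping


def speedup(p: int, rb: float, factors: Mapping[str, float]) -> float:
    """Estimate the actual performance improvement E_p(p) * p.

    Assumes E_p(p) = Rb(p) * (1 - sum of impediment contributions R_j(p)),
    a form that matches the worked examples in which R_p(p) = 1.
    """
    e_p = rb * (1.0 - sum(factors.values()))
    return e_p * p


# Example (4), FIG. 45: communication and idling.
print(round(speedup(4, 0.9704, {"C": 0.3158, "W": 0.1108}), 3))  # 2.226
# Example (8), FIG. 55: waiting, task creation, and communication.
print(round(speedup(4, 0.8571, {"W": 0.2340, "TC": 0.0385, "C": 0.1282}), 3))  # 2.055
```

Because the factors are kept in a mapping, a system-specific impediment can be added as one more key. This mirrors the patent's point that factors can be introduced one at a time.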

(2) Grids, etc. in a homogeneous configuration (χ_{1,RED}(1) ≠ 0)
Numerical calculations are often parallelized in the so-called data-parallel style. The application program is copied to all processors, and the processors share the work by distinguishing, for example, loop indices. In data-parallel processing, some work that cannot be parallelized remains, for example between loops. When every processor performs this work, they all do identical processing, so it is called redundant processing. Redundant processing is also needed when the work does not run in parallel, so it always satisfies χ_{1,RED}(1) ≠ 0. Here the only parallel performance impediment factor is redundant processing, and the parallel performance of processing with χ_{1,RED}(1) ≠ 0 is evaluated. The example uses the elapsed-time measurements shown in FIG. 40.

The following calculation is performed from Equation (3).

The following calculations are performed from Equations (5), (12-1), (6-1), (6-2), and (7), respectively.

For the parallel efficiency, the following calculations are performed from Equations (4-4), (4-5), (9-1), and (9-2), in that order.

FIG. 41 summarizes the parallel performance evaluation indices calculated above. Here A_p(p) = 9.737, so parallel processing with p > 9 is meaningless. The actual performance improvement with p = 4 processors is E_p·p = 2.874. The load balance contribution ratio scales the parallel efficiency E_p to 94% (Rb(4) = 0.9398), and redundant processing reduces it by a further 31% (R_RED(4) = 0.3141).

(3) Grids, etc. in a homogeneous configuration (χ_{1,j}(1) ≠ 0, excluding redundant processing)
For example, the processing time of a communication library consists of network communication and computation. This computation time is treated as χ_{1,C}(1). Here the only parallel performance impediment factor is communication, and the parallel performance of processing with χ_{1,C}(1) ≠ 0 is evaluated. The example uses the elapsed-time measurements shown in FIG. 42.

The following calculation is performed from Equation (3).

FIG. 43 summarizes the parallel performance evaluation indices calculated above. Here A_p(p) = 22.57, so processing should use p < 22. The actual performance improvement with p = 4 processors is E_p·p = 1.859, because communication reduces the parallel efficiency E_p by 53% (R_C = 0.5263). The load balance contribution ratio is 94% (Rb(4) = 0.9375), so load balance is not the main factor limiting parallel performance in this case. Unlike example (1), A_p(p) is finite here because χ_{1,C}(1) ≠ 0.

The computation contained in communication processing may change as the number of processors increases. This computation can be treated as Χ_{i,C}(p) and incorporated into the parallel performance evaluation. The evaluation then accounts for operation counts that differ with the number of processors.

(4) Grids, etc. in a homogeneous configuration (with waiting, also called idling or wait)
Sometimes one processor performs some processing and the other processors use its result. In that case the others cannot start their next processing until it finishes. This happens, for example, when only one particular processor can access the database (DB). In FIG. 44, processor #1 performs this processing (γ'), and the other processors wait during the DB processing. Parallel performance can thus be evaluated even when idling that keeps CPUs waiting is present. Assume that the elapsed-time measurements shown in FIG. 44 are obtained.

The following calculation is performed from Equation (3).

FIG. 45 summarizes the parallel performance evaluation indices calculated above. Since A_p(4) = ∞, the potential performance improvement from parallel processing with p = ∞ is unbounded. The actual performance improvement with p = 4, however, is E_p·p = 2.226. Communication reduces the parallel efficiency E_p by 32% (R_C(4) = 0.3158), and idling reduces it by 11% (R_W(4) = 0.1108). The load balance contribution ratio is 97% (Rb(4) = 0.9704), so load balance is not a main factor limiting parallel performance in this case.

(5) Grids, etc. in a homogeneous configuration (waiting caused by other processing)
When processing runs on a grid or cluster, a processor is rarely dedicated to one's own processing. It generally shares time with several other processes, and interruptions by that other processing cause waiting, as shown in FIG. 46. The parallel performance under such waiting is evaluated here. Assume that the elapsed-time measurements shown in FIG. 46 are obtained.

The following calculation is performed from Equation (3).

FIG. 47 summarizes the parallel performance evaluation indices calculated above. Since A_p(4) = ∞, the potential performance improvement from parallel processing with p = ∞ is unbounded. The actual performance improvement with p = 4, however, is E_p·p = 1.808. The load balance contribution ratio scales the parallel efficiency E_p to 79% (Rb(4) = 0.7875). Waiting due to time sharing then reduces it by 28% (R_W(4) = 0.2778), and communication by a further 14% (R_C = 0.1418).

Because R_W(4) arises from other processing, Rb(4) here is a load balance contribution ratio that reflects the whole system. When other processing is present, both Rb(4) and R_W(4) deserve attention. Even if Rb(4) = 1, a large R_W(4) means a crowded system is being used, and E_p will be low. When deploying grid or cluster processing, parallel processing can be made efficient by choosing a system for which Rb(4) is close to 1 and R_W is 0. This embodiment is the first to make this visible.

One way to distinguish one's own processing (the target processing) from other processing (non-target processing) is to measure both CPU time and elapsed time. In general, CPU time covers only one's own processing, while elapsed time also includes other processing. In some cases, therefore, the waiting time due to time sharing can be taken as elapsed time − CPU time.
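The elapsed-minus-CPU rule can be applied per processor, as in this minimal sketch. The function name and the sample numbers are hypothetical and are not taken from FIG. 46. The rule holds only on systems where the reported CPU time covers the target process alone. If a measurement violates that (CPU time above elapsed time), the sketch raises an error rather than returning a negative wait.

```python
from typing import List, Sequence


def timesharing_wait(elapsed: Sequence[float], cpu: Sequence[float]) -> List[float]:
    """Per-processor waiting time caused by other processes.

    Uses wait = elapsed time - CPU time, which is valid only when the
    CPU time counts the target process alone.
    """
    if len(elapsed) != len(cpu):
        raise ValueError("need one elapsed and one CPU time per processor")
    waits = []
    for e, c in zip(elapsed, cpu):
        if c > e:
            raise ValueError("CPU time cannot exceed elapsed time")
        waits.append(e - c)
    return waits


# Hypothetical measurements for 4 processors (seconds).
print(timesharing_wait([10.0, 12.0, 11.5, 10.0], [8.0, 12.0, 9.5, 10.0]))
# [2.0, 0.0, 2.0, 0.0]
```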

(6) Grids, etc. in a homogeneous configuration (data-parallel processing)
In data-parallel processing, every processor runs the same procedure on different data, for example by dividing 1000 records among 4 processors at 250 records each. Work that cannot be parallelized is handled in one of two ways. Either the data is made identical on all processors, i.e., redundant processing, or one processor processes it and broadcasts the result to all processors. The parallel performance of both approaches is evaluated here.

[Data-parallel processing using redundant processing]
Assume that the elapsed-time measurements shown in FIG. 48 are obtained, with χ_{1,C} = 0.

The following calculations are performed from Equations (3), (5), (12-1), (6-1), (6-2), and (7), respectively.

FIG. 49 summarizes the parallel performance evaluation indices calculated above. Since A_p(4) = 21.01, the number of processors should be chosen with p ≤ 21. The actual performance improvement with p = 4 processors is E_p·p = 2.800. Communication reduces the parallel efficiency E_p(4) by 20% (R_C = 0.2000), and redundant processing reduces it by 13% (R_RED(4) = 0.1333).

[Data-parallel processing in which a specific processor handles the part that cannot be parallelized]
Instead of processing the non-parallelizable part redundantly, a specific processor may process it. FIG. 50 shows the case in which only processor #1 performs this processing (the γ' portion) instead of the redundant processing of FIG. 48, and then broadcasts the result to the other processors. Meanwhile, the other processors naturally wait for processor #1's result. Here γ' is treated as parallel processing. Adding it to the parallel processing impediment factors as sequential processing would allow a more detailed evaluation, but that requires determining whether γ' is sequential or parallel processing. Assume that the elapsed-time measurements shown in FIG. 50 are obtained, with χ_{1,C} = 0.

FIG. 51 summarizes the parallel performance evaluation indices calculated above. With A_p(4) = ∞ the potential performance improvement is unbounded. The actual performance improvement with p = 4, however, is E_p·p = 2.800. Communication reduces the parallel efficiency E_p(4) by 20% (R_C(4) = 0.2000), and waiting reduces it by 10% (R_W(4) = 0.1000).

FIGS. 49 and 51 differ in R_p(4), A_p(4), R_RED(4), and R_W, while Rb(4) and E_p(4) are the same. In FIG. 51, R_p(4) = 1 because the non-parallelizable part γ' was evaluated as parallel processing. The redundant processing on processors #2, #3, and #4 turns into waiting, which is captured by the impediment factor R_W(4) instead.

(7) Grids, etc. in a homogeneous configuration (control-parallel processing)
In control-parallel processing, each processor usually runs a different procedure, so the procedure times of the processors often differ widely. The parallel performance of control-parallel processing is evaluated here. Assume that the elapsed-time measurements shown in FIG. 52 are obtained, with χ_{1,C} = 0.

The following calculation is performed from Equation (3).

FIG. 53 summarizes the parallel performance evaluation indices calculated above. With A_p(4) = ∞ the potential performance improvement is unbounded. The actual performance improvement with p = 4, however, is E_p·p = 2.528. The load balance contribution ratio scales the parallel efficiency E_p to 82% (Rb(4) = 0.8231). Task creation, communication, and waiting together reduce it by a further 23% (R_TC(4) + R_C(4) + R_W(4) = 0.0344 + 0.1089 + 0.0888).

To improve parallel performance, compare the parallel performance indices and examine the room for improvement in descending order of their influence on performance degradation. For FIG. 53 the order is Rb(4), R_C(4), R_W(4), R_TC(4). If Rb(4) = 1 were achieved, E_p(4)·p would become 3.071 (= 2.528 / 0.8231). One might therefore try, for example, to change the processing schedule so that processor #1 takes the same processing time as the other processors. The next candidate is reducing R_C(4), for example by replacing the hardware with hardware of twice the communication performance. In that case, the following calculations are performed.

First, the following calculation is performed from Equation (3).

The parallel efficiency is calculated from Equation (4-4) as follows.

The following calculation is also performed.

If communication performance is improved as above and the load balance can also be changed so that Rb(4) = 1, the following calculation is performed.
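The "descending order of influence" used for FIG. 53 can be made mechanical. Estimate how much E_p(4)·p would rise if each factor were eliminated, then sort by that gain. The sketch below assumes E_p = Rb·(1 − ΣR_j), a form inferred from the worked figures rather than taken from the patent's equations. Under that assumption it reproduces the order Rb(4), R_C(4), R_W(4), R_TC(4) stated for FIG. 53. The function names are hypothetical.

```python
from typing import Dict, List, Mapping


def speedup(p: int, rb: float, factors: Mapping[str, float]) -> float:
    """E_p(p) * p, assuming E_p = Rb * (1 - sum of impediment contributions)."""
    return p * rb * (1.0 - sum(factors.values()))


def rank_improvements(p: int, rb: float, factors: Mapping[str, float]) -> List[str]:
    """Order improvement candidates by the gain from eliminating each one.

    The key "Rb" means raising the load balance contribution ratio to 1;
    any other key means removing that impediment factor entirely.
    """
    base = speedup(p, rb, factors)
    gains: Dict[str, float] = {"Rb": speedup(p, 1.0, factors) - base}
    for name in factors:
        rest = {k: v for k, v in factors.items() if k != name}
        gains[name] = speedup(p, rb, rest) - base
    return sorted(gains, key=gains.get, reverse=True)


# Indices from FIG. 53 (control-parallel example, p = 4).
fig53 = {"R_TC": 0.0344, "R_C": 0.1089, "R_W": 0.0888}
print(rank_improvements(4, 0.8231, fig53))  # ['Rb', 'R_C', 'R_W', 'R_TC']
```

Eliminating one factor at a time is the simplest what-if. A partial improvement, such as doubling communication performance, would instead need Equations (3) and (4-4), which are not reproduced in this text.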

As shown in section E (tuning processing), this embodiment can estimate the parallel performance that results when each parallel performance impediment factor is tuned and improved. Conventional tuning used processing time as the target, so unattainable targets were sometimes set; this embodiment allows reasonable targets to be set using E_p. Because parallel efficiency and related indices can be calculated from a single measurement, the performance evaluation time during tuning can also be shortened.

Furthermore, in conventional tuning, changing the input data or processing function made the previously measured processing times for each impediment factor unusable for performance evaluation. A separate parallel performance evaluation therefore had to be done for each input data set and processing function. In this embodiment all performance evaluation indices are ratios, so the parallel performance of different input data and processing functions can be compared.

(8) Grids, etc. in a homogeneous configuration (master/slave processing under control parallelism)
In control-parallel processing, each processor usually runs a different procedure. In master/slave processing, one processor acts as the master that manages the others, and multiple processors carry out processing according to its instructions. Here the parallel performance with processor #1 as the master is evaluated. Assume that the elapsed-time measurements shown in FIG. 54 are obtained, with χ_{1,C} = 0.

The following calculations are performed using Equation (3).

The parallel performance evaluation indices calculated above are summarized in FIG. 55. Since A_p(4) = ∞, the performance improvement from parallel processing is unbounded as p → ∞, but the actual performance improvement E_p·p obtained with p = 4 processors is only 2.055. This is because the parallel efficiency E_p is reduced to 86% by the load balance contribution ratio (R_b(4) = 0.8571), lowered by a further 23% through waiting (R_W(4) = 0.2340), and by a further 17% through task generation and communication combined (R_TC(4) + R_C(4) = 0.0385 + 0.1282). In master/slave processing, the waiting time of the master processor is known to have an important effect on overall performance. This embodiment quantifies the effect of waiting on performance, making it possible to judge whether the master/slave processing is working effectively.

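As a cross-check on the figures above, the summary indices can be combined into E_p directly. The sketch below assumes the relation E_p = R_b × (1 - ΣR_j) / R_p, with the virtual parallelization rate recovered from the acceleration rate as R_p = 1 - 1/A_p. This relation is inferred here from the definitions in Appendices 5 to 9 (taking each processor's time as its parallel part plus its impediment parts) rather than quoted from Equation (3), although it reproduces the reported speedups. The function name `efficiency_from_indices` is hypothetical.

```python
import math


def efficiency_from_indices(r_b, impediment_ratios, a_p=math.inf):
    """Combine summary indices into a parallel efficiency E_p (sketch).

    Assumed relation: E_p = R_b * (1 - sum(R_j)) / R_p, with the virtual
    parallelization rate recovered from the acceleration rate as
    R_p = 1 - 1/A_p. Not quoted from Equation (3).
    """
    if a_p <= 1.0:
        raise ValueError("acceleration rate must exceed 1")
    r_p = 1.0 - 1.0 / a_p  # 1.0 / math.inf == 0.0, so A_p = inf gives R_p = 1
    return r_b * (1.0 - sum(impediment_ratios)) / r_p


# Example (8): p = 4, A_p(4) = inf; R_W, R_TC and R_C as reported above
p = 4
e_p = efficiency_from_indices(0.8571, [0.2340, 0.0385, 0.1282])
print(round(e_p, 4), round(e_p * p, 3))  # 0.5137 2.055
```

With A_p(4) = ∞ the R_p term drops out, so in this example the whole loss is attributable to load imbalance and the impediment factors.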
(9) Grids and similar systems in a homogeneous configuration (mixed data parallelism and control parallelism)
Processing that mixes data parallelism and control parallelism is not used in ordinary business applications because load balance is difficult to maintain. This embodiment also makes it possible to evaluate parallel performance in such a case: because it provides performance evaluation indices for controlling the processing, it offers a practical evaluation method for this kind of processing. Here, the parallel performance is evaluated with processors #1 to #4 running control-parallel, processors #5 to #8 running data-parallel, and processor #1 as the master processor. Assume that the elapsed-time measurements shown in FIG. 56 are obtained, and that χ_{1,C} = 0.

The following calculations are performed using Equation (3).

The parallel performance evaluation indices calculated above are summarized in FIG. 57. Since A_p(8) = 47.62, parallel processing should be performed with p < 47. The actual performance improvement E_p·p obtained with p = 8 processors is 5.242. This is because the parallel efficiency E_p is reduced to 93% by the load balance contribution ratio (R_b(8) = 0.9286), and lowered further by 11% through waiting (R_W(8) = 0.1080), 11% through communication (R_C(8) = 0.1124), and 9% through redundant processing and task generation combined (R_RED(8) + R_TC(8) = 0.0592 + 0.0296). As described above, this embodiment can be applied to a parallel processing scheme that mixes data parallelism and control parallelism.

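Appendix 2 expresses E_p through the acceleration rate instead of the virtual parallelization rate. Substituting R_p = 1 - 1/A_p (Appendix 9) gives 1/R_p = A_p / (A_p - 1). The hypothetical helper below assumes that substitution; it reproduces the p = 8 result above and checks p against the A_p(8) limit.

```python
def efficiency_from_acceleration(r_b, impediment_ratios, a_p):
    """E_p via the acceleration rate (sketch of the Appendix 2 form).

    Assumes E_p = R_b * (1 - sum(R_j)) * A_p / (A_p - 1), i.e. the
    R_p-based relation with R_p = 1 - 1/A_p substituted.
    """
    if a_p <= 1.0:
        raise ValueError("acceleration rate must exceed 1")
    scale = 1.0 if a_p == float("inf") else a_p / (a_p - 1.0)
    return r_b * (1.0 - sum(impediment_ratios)) * scale


# Example (9): R_W, R_C, R_RED and R_TC as reported above
p, a_p = 8, 47.62
speedup = p * efficiency_from_acceleration(0.9286, [0.1080, 0.1124, 0.0592, 0.0296], a_p)
print(round(speedup, 3))  # 5.242
print(p < a_p)            # True: p is well below the A_p(8) limit
```

Here the A_p / (A_p - 1) factor is only about 1.02, so the efficiency loss in example (9) is dominated by the impediment factors rather than by the serial portion.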
(10) Heterogeneous configuration such as a grid, with redundant processing (χ_{1,RED} ≠ 0)
Processors connected in a grid or cluster often differ in CPU capability; this is called a heterogeneous configuration. This embodiment is also applicable to heterogeneous configurations. Here, the parallel performance is evaluated for example (2) with processor #1 running at half the performance of the others. Assume that the elapsed-time measurements shown in FIG. 58 are obtained.

The following calculations are performed using Equation (3).

The parallel performance evaluation indices calculated above are summarized in FIG. 59. Since A_p(4) = 9.881, parallel processing with p > 9 is meaningless. The actual performance improvement E_p·p obtained with p = 4 processors is 1.918. This is because the parallel efficiency E_p is reduced to 63% by the load balance contribution ratio (R_b(4) = 0.6250) and lowered by a further 31% through redundant processing (R_RED(4) = 0.3103). Comparison with FIG. 41 shows that the load balance contribution ratio R_b(4) falls from 0.9398 to 0.6250. As FIGS. 41 and 59 show, this is because the difference in processor #1 is reflected in the performance evaluation index R_b(4). In general, processing equally divided tasks on processors with different CPU capabilities upsets the load balance; in this embodiment, this can be detected through R_b(4).

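The drop in R_b can be illustrated with the load balance definition of Appendix 6, R_b = Σ τ_i / (p × max τ_i). The timings below are hypothetical (four equal tasks, with processor #1 running at half speed) and are not the FIG. 58 measurements; the helper name is likewise illustrative.

```python
def load_balance_ratio(times):
    """R_b = sum(tau_i) / (p * max(tau_i)), following Appendix 6."""
    return sum(times) / (len(times) * max(times))


work_per_task = 10.0                    # hypothetical work units per equal task
speeds_homo = [1.0, 1.0, 1.0, 1.0]      # homogeneous configuration
speeds_hetero = [0.5, 1.0, 1.0, 1.0]    # processor #1 at half speed

homo = [work_per_task / s for s in speeds_homo]      # [10, 10, 10, 10]
hetero = [work_per_task / s for s in speeds_hetero]  # [20, 10, 10, 10]
print(load_balance_ratio(homo), load_balance_ratio(hetero))  # 1.0 0.625
```

Because the slowest processor sets max τ_i, halving one processor's speed alone pulls R_b down from 1.0 in this illustration, which is the kind of imbalance R_b(4) is meant to flag.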
(Appendix 1)
A parallel efficiency calculation method for calculating the parallel efficiency of a parallel computer system, comprising:
a load balance contribution ratio calculating step of calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
a virtual parallelization rate calculating step of calculating a virtual parallelization rate, which represents the fraction, in terms of time, of the processing performed in the parallel computer system that was computed in parallel by the processors, and storing it in a storage device;
a parallel performance impediment factor contribution ratio calculating step of calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
a step of calculating the parallel efficiency using the load balance contribution ratio, the virtual parallelization rate, and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 2)
A parallel efficiency calculation method for calculating the parallel efficiency of a parallel computer system, comprising:
a load balance contribution ratio calculating step of calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
an acceleration rate calculating step of calculating an acceleration rate, which represents the limit on the reduction in processing time achievable by parallelizing the processing performed in the parallel computer system, and storing it in a storage device;
a parallel performance impediment factor contribution ratio calculating step of calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
a step of calculating the parallel efficiency using the load balance contribution ratio, the acceleration rate, and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 3)
A parallel efficiency calculation method for calculating the parallel efficiency of a parallel computer system, comprising:
a load balance contribution ratio calculating step of calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
a parallel performance impediment factor contribution ratio calculating step of calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
a step of calculating the parallel efficiency using the load balance contribution ratio and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 4)
A parallel efficiency calculation method for calculating the parallel efficiency of a parallel computer system, comprising:
a load balance contribution ratio calculating step of calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
a virtual parallelization rate calculating step of calculating a virtual parallelization rate, which represents the fraction, in terms of time, of the processing performed in the parallel computer system that is computed in parallel by the processors, and storing it in a storage device; and
a step of calculating the parallel efficiency using the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system, the sum of the processing times of the processing performed on those processors, the load balance contribution ratio, and the virtual parallelization rate, and storing it in a storage device.

(Appendix 5)
A parallel efficiency calculation method for calculating the parallel efficiency of a parallel computer system, comprising:
a step of calculating a first processing time, which corresponds to the total processing time of the parallel-performance-impeding portion of the processing when the processing is performed by a single processor, and storing it in a storage device;
a step of calculating a second processing time, which is the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system, and storing it in a storage device; and
a step of calculating the parallel efficiency using the number of processors used in the parallel computer system, the longest of the processing times of the processing performed on the processors included in the parallel computer system, the first processing time, and the second processing time, and storing it in a storage device.

(Appendix 6)
The parallel efficiency calculation method according to any one of Appendices 1 to 4, wherein, in the load balance contribution ratio calculating step, the load balance contribution ratio is calculated by dividing the total processing time of the processing performed on all processors included in the parallel computer system by the longest of the processing times of the processing performed on those processors and by the number of processors used in the parallel computer system.

(Appendix 7)
The parallel efficiency calculation method according to Appendix 1 or 4, wherein, in the virtual parallelization rate calculating step, the virtual parallelization rate is calculated by dividing the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system by a third processing time, which corresponds to the processing time when the same processing is performed by a single processor.

(Appendix 8)
The parallel efficiency calculation method according to any one of Appendices 1 to 3, wherein, in the parallel performance impediment factor contribution ratio calculating step, the contribution ratio for a specific parallel performance impediment factor is calculated by dividing the sum, over the processors included in the parallel computer system, of the processing times spent in that factor by the sum of the processing times of those processors.

(Appendix 9)
The parallel efficiency calculation method according to Appendix 2, wherein, in the acceleration rate calculating step, the acceleration rate is calculated as the reciprocal of the value obtained by subtracting the virtual parallelization rate from 1, the virtual parallelization rate being obtained by dividing the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system by a third processing time, which corresponds to the processing time when the same processing is performed by a single processor.

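Appendices 5 to 9 together define the indices from per-processor measurements. The following sketch computes them from hypothetical inputs: `gamma[i]` (the parallel-computed time γ_i), `chi[i][j]` (the time χ_{i,j} spent in impediment factor j), and the first processing time `chi_1`, taken here as the mean redundant-processing time, one of the options listed in Appendix 13. It assumes τ_i = γ_i + Σ_j χ_{i,j}. Under that assumption, the Appendix 5 form T(1) / (p × max τ_i) and the composite form R_b(1 - ΣR_j) / R_p agree exactly, which the final lines check. All names and values are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ParallelIndices:
    r_b: float   # load balance contribution ratio (Appendix 6)
    r_p: float   # virtual parallelization rate (Appendix 7)
    a_p: float   # acceleration rate (Appendix 9)
    r_j: dict    # contribution ratio per impediment factor (Appendix 8)
    e_p: float   # parallel efficiency, Appendix 5 form


def parallel_indices(gamma, chi, chi_1):
    """gamma[i]: parallel-computed time on processor i.
    chi[i]: dict mapping impediment factor j -> time on processor i.
    chi_1: first processing time (see Appendices 13 and 14).
    Assumes tau_i = gamma_i + sum_j chi[i][j]."""
    p = len(gamma)
    tau = [g + sum(c.values()) for g, c in zip(gamma, chi)]
    total_tau, max_tau = sum(tau), max(tau)
    second = sum(gamma)                  # second processing time
    third = chi_1 + second               # third processing time (Appendix 12)
    r_b = total_tau / (p * max_tau)
    r_p = second / third
    a_p = math.inf if chi_1 == 0 else 1.0 / (1.0 - r_p)
    factors = sorted({j for c in chi for j in c})
    r_j = {j: sum(c.get(j, 0.0) for c in chi) / total_tau for j in factors}
    e_p = third / (p * max_tau)          # T(1) / (p * max tau_i)
    return ParallelIndices(r_b, r_p, a_p, r_j, e_p)


# Hypothetical measurements for p = 4 (not taken from any figure)
gamma = [20.0, 22.0, 24.0, 24.0]
chi = [{"RED": 8.0, "C": 4.0}, {"RED": 8.0, "C": 2.0},
       {"RED": 8.0, "C": 2.0}, {"RED": 8.0}]
idx = parallel_indices(gamma, chi, chi_1=8.0)

# Cross-check against the composite form R_b * (1 - sum R_j) / R_p
composite = idx.r_b * (1.0 - sum(idx.r_j.values())) / idx.r_p
print(round(idx.e_p, 4), round(composite, 4), round(idx.a_p, 2))
```

The agreement is algebraic rather than numerical luck: Σγ_i / Στ_i equals 1 - ΣR_j when every unit of processor time is either parallel work or an impediment factor.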
(Appendix 10)
The parallel efficiency calculation method according to any one of Appendices 1 to 9, wherein each processing time is represented by the number of times the corresponding event is observed.

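Appendix 10 allows each processing time to be replaced by an event count, such as the output of a sampling profiler. Assuming a fixed sampling interval, the interval cancels in every ratio, so the counts can be used directly in the formulas. The sample format, the category labels ("par", "wait", "comm"), and the helper name below are hypothetical.

```python
from collections import Counter

# Hypothetical sampler output: one (processor, category) pair per tick,
# taken at a fixed interval. "par" marks the parallel-computed portion.
samples = [(0, "par"), (0, "par"), (0, "wait"),
           (1, "par"), (1, "par"), (1, "par"), (1, "comm")]


def counts_by_processor(samples):
    """Tally how often each category was observed on each processor."""
    per_proc = {}
    for proc, category in samples:
        per_proc.setdefault(proc, Counter())[category] += 1
    return per_proc


per_proc = counts_by_processor(samples)
tau = {proc: sum(c.values()) for proc, c in per_proc.items()}  # counts stand in for tau_i
total = sum(tau.values())
r_b = total / (len(tau) * max(tau.values()))                       # Appendix 6
r_wait = sum(c["wait"] for c in per_proc.values()) / total         # Appendix 8
print(r_b, round(r_wait, 4))
```

A missing category simply counts as zero, since `Counter` returns 0 for absent keys.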
(Appendix 11)
The parallel efficiency calculation method according to any one of Appendices 1 to 10, further comprising a step of calculating an auxiliary index by multiplying the calculated parallel efficiency by the number of processors used in the parallel computer system, and storing it in a storage device.

(Appendix 12)
The parallel efficiency calculation method according to Appendix 7 or 9, wherein the third processing time is calculated as the sum of a first processing time, which corresponds to the total processing time of the parallel-performance-impeding portion of the processing when the processing is performed by a single processor, and a second processing time, which is the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system.

(Appendix 13)
The parallel efficiency calculation method according to Appendix 5 or 12, wherein the first processing time is any one of the following: the sum of the redundant-processing times of the processing performed on the processors included in the parallel computer system, divided by the number of processors; the maximum of those redundant-processing times; the minimum of those redundant-processing times; or the redundant-processing time on the processor for which the sum of the parallel processing time and the parallel performance impediment factor processing time is largest.

(Appendix 14)
The parallel efficiency calculation method according to Appendix 5 or 12, wherein, with a fourth processing time obtained for each processor by subtracting, from that processor's processing time due to parallel performance impediment factors other than redundant processing, the processing time due to parallelization impediment factors that arise only with two or more processors and depend on the number of processors, the first processing time is any one of the following: the sum of the fourth processing times over all processors divided by the number of processors; the maximum of the fourth processing times over all processors; or the minimum of the fourth processing times over all processors.

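Appendices 13 and 14 list alternative ways to obtain the first processing time but leave the choice among them open. The sketch below simply computes every candidate. The function names and sample values are hypothetical, and treating communication as the "p-dependent" impediment in the second function is an illustrative reading, not something the appendix specifies.

```python
def first_time_candidates_red(red, gamma, chi_total):
    """Appendix 13 candidates from per-processor redundant-processing times.

    red[i]: redundant-processing time on processor i.
    gamma[i]: parallel-computed time; chi_total[i]: total impediment time.
    """
    p = len(red)
    # Processor with the largest parallel + impediment time
    critical = max(range(p), key=lambda i: gamma[i] + chi_total[i])
    return {"mean": sum(red) / p, "max": max(red),
            "min": min(red), "critical": red[critical]}


def first_time_candidates_nonred(other, p_dependent):
    """Appendix 14 candidates from the fourth processing time.

    other[i]: impediment time on processor i other than redundant processing.
    p_dependent[i]: the part of that time which only occurs with two or more
    processors and depends on p (for example, communication).
    """
    fourth = [o - d for o, d in zip(other, p_dependent)]
    return {"mean": sum(fourth) / len(fourth),
            "max": max(fourth), "min": min(fourth)}


print(first_time_candidates_red([6, 7, 9, 10], [20, 22, 24, 21], [10, 10, 10, 12]))
print(first_time_candidates_nonred([5, 6, 4, 5], [3, 3, 2, 3]))
```

The mean, max, and min options bracket the single-processor serial time; the "critical" option ties it to the processor that actually determines the elapsed time.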
(Appendix 15)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising:
a step of setting a target parallel efficiency; and
a step of calculating an optimum number of processors by dividing the product of the calculated parallel efficiency and the number of processors by the target parallel efficiency, and storing it in a storage device.

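Appendix 15 reduces to a single division: the achieved speedup E_p × p divided by the desired efficiency. The sketch reuses the example (9) result (E_p × 8 = 5.242) with a hypothetical target of 0.8. The range check on the target and the rounding down to a whole processor count are added assumptions, not part of the appendix.

```python
import math


def optimum_processors(e_p, p, e_target):
    """Appendix 15: p_opt = (E_p * p) / E_target (sketch)."""
    if not 0.0 < e_target <= 1.0:
        raise ValueError("target parallel efficiency must be in (0, 1]")
    return e_p * p / e_target


p_opt = optimum_processors(5.242 / 8, 8, 0.8)
print(round(p_opt, 4), math.floor(p_opt))  # assumed: round down to whole processors
```

The formula implicitly holds the speedup E_p × p fixed as p changes, so the result is best read as a first-order guide.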
(Appendix 16)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising:
a step of setting the additional operating time and the predicted parallel efficiency for a system enhancement; and
a step of calculating the acceleration rate at the time of the system enhancement by dividing the sum of (a) the sum, over all processing, of the products of the total processing time of the processing performed on the processors currently included in the parallel computer system and the calculated parallel efficiency, and (b) the product of the additional operating time and the predicted parallel efficiency, by the sum of the operating times of the processors currently included in the parallel computer system, and storing it in a storage device.

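The Appendix 16 calculation can be sketched as follows. Each logged job is represented as a pair (total processing time summed over its processors, E_p); the additional operating time and predicted efficiency describe the enhancement. The job log, operating times, and function name are hypothetical.

```python
def enhanced_acceleration(jobs, added_time, predicted_e, operating_times):
    """Appendix 16 sketch.

    jobs: (total processing time over processors, E_p) for each job run on
          the current system.
    added_time, predicted_e: operating time added by the enhancement and its
          predicted parallel efficiency.
    operating_times: operating time of each processor currently installed.
    """
    effective = sum(t * e for t, e in jobs)       # product sum over all jobs
    return (effective + added_time * predicted_e) / sum(operating_times)


jobs = [(400.0, 0.5), (200.0, 0.8)]               # hypothetical job log
operating = [200.0, 200.0, 200.0, 200.0]          # four current processors
print(enhanced_acceleration(jobs, 100.0, 0.9, operating))
```

Only efficiency-weighted time counts toward the numerator, so an enhancement with a low predicted efficiency adds little even if it adds many operating hours.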
(Appendix 17)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising:
a step of setting the performance ratio of a new parallel computer system relative to the parallel computer system; and
a step of calculating an estimated parallel efficiency using the performance ratio of the new parallel computer system, and storing it in a storage device.

(Appendix 18)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising a step of calculating a system operating efficiency by dividing the sum, over all processing, of the products of the total processing time of the processing performed on the processors currently included in the parallel computer system and the calculated parallel efficiency by the total operating time of the processors currently included in the parallel computer system, and storing it in a storage device.

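The system operating efficiency of Appendix 18 uses the same efficiency-weighted product sum, divided by the total operating time of the installed processors, so idle time and poorly parallelized jobs both pull it down. The job log and names below are hypothetical.

```python
def system_operating_efficiency(jobs, operating_times):
    """Appendix 18 sketch: sum(total processing time * E_p) / total operating time.

    jobs: (total processing time over processors, E_p) for each job.
    operating_times: operating time of each processor currently installed.
    """
    return sum(t * e for t, e in jobs) / sum(operating_times)


jobs = [(400.0, 0.5), (200.0, 0.8)]       # hypothetical job log
operating = [200.0, 200.0, 200.0, 200.0]  # four processors
print(system_operating_efficiency(jobs, operating))
```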
(Appendix 19)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising:
a step of setting a target processing time;
a step of calculating a target parallel efficiency using the target processing time, and storing it in a storage device; and
a step of checking the validity of the target parallel efficiency.

(Appendix 20)
The parallel efficiency calculation method according to Appendix 19, further comprising:
a step of, when the validity of the target parallel efficiency is confirmed, calculating the parallel efficiency after tuning and storing it in a storage device; and
a step of comparing the parallel efficiency after tuning with the target parallel efficiency.

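Appendices 19 and 20 do not spell out how the target efficiency is derived or checked. One natural reading, assumed here, is to apply the definition of E_p given in Appendix 28 with the target time in place of the longest measured time, and to reject any target above 1, which would demand better than ideal scaling. This connects to the earlier remark that setting targets through E_p avoids unattainable goals. All names and values are hypothetical.

```python
def target_efficiency(t_serial, p, t_target):
    """Assumed reading of Appendix 19: E_target = T(1) / (p * T_target),
    mirroring E_p = T(1) / (p * max tau_i)."""
    return t_serial / (p * t_target)


def is_attainable(e_target):
    """Assumed validity check: an efficiency above 1 is treated as unattainable."""
    return 0.0 < e_target <= 1.0


def met_target(e_after_tuning, e_target):
    """Appendix 20 comparison: did tuning reach the target efficiency?"""
    return e_after_tuning >= e_target


t_serial, p = 98.0, 4                 # hypothetical single-processor time T(1)
for t_target in (30.0, 20.0):
    e_t = target_efficiency(t_serial, p, t_target)
    print(t_target, round(e_t, 3), is_attainable(e_t))
```

Only a target that passes the validity check would proceed to tuning and the post-tuning comparison of Appendix 20.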
(Appendix 21)
The parallel efficiency calculation method according to any one of Appendices 1 to 14, further comprising:
a step of setting a target processing time;
a step of calculating, for each of several different algorithms, an estimate of the required number of processors using the parallel efficiency of that algorithm, and storing it in a storage device; and
a step of extracting the algorithm whose estimated number of processors is smaller than its acceleration rate, which represents the limit on the reduction in processing time achievable by parallelizing the processing of that algorithm in the parallel computer system, and is also the smallest among the estimates calculated for the different algorithms.

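Appendix 21 likewise leaves the processor estimate open. The sketch below assumes a first-order estimate p ≈ T(1) / (E_p × T_target) for each algorithm, keeps only algorithms whose estimate stays below their acceleration rate A_p, and picks the smallest remaining estimate. The algorithm names and figures are hypothetical.

```python
def choose_algorithm(candidates, t_target):
    """Appendix 21 sketch.

    candidates: name -> (t_serial, e_p, a_p), where t_serial is the assumed
    single-processor time T(1), e_p the algorithm's parallel efficiency and
    a_p its acceleration rate. Returns (name, estimated p) or None.
    """
    best = None
    for name, (t_serial, e_p, a_p) in candidates.items():
        p_est = t_serial / (e_p * t_target)   # assumed first-order estimate
        if p_est < a_p and (best is None or p_est < best[1]):
            best = (name, p_est)
    return best


candidates = {
    "alg_A": (1000.0, 0.6, 50.0),
    "alg_B": (1200.0, 0.9, 12.0),
    "alg_C": (800.0, 0.5, 20.0),
}
print(choose_algorithm(candidates, 100.0))
```

In this example alg_B needs the fewest processors on paper (about 13.3) but is rejected because that exceeds its A_p of 12, so alg_C is selected.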
(Appendix 22)
A program for causing a computer to execute the parallel efficiency calculation method according to any one of Appendices 1 to 21.

(Appendix 23)
A parallel efficiency calculation device for calculating the parallel efficiency of a parallel computer system, comprising:
load balance contribution ratio calculating means for calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
virtual parallelization rate calculating means for calculating a virtual parallelization rate, which represents the fraction, in terms of time, of the processing performed in the parallel computer system that was computed in parallel by the processors, and storing it in a storage device;
parallel performance impediment factor contribution ratio calculating means for calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
means for calculating the parallel efficiency using the load balance contribution ratio, the virtual parallelization rate, and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 24)
A parallel efficiency calculation device for calculating the parallel efficiency of a parallel computer system, comprising:
load balance contribution ratio calculating means for calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
acceleration rate calculating means for calculating an acceleration rate, which represents the limit on the reduction in processing time achievable by parallelizing the processing performed in the parallel computer system, and storing it in a storage device;
parallel performance impediment factor contribution ratio calculating means for calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
means for calculating the parallel efficiency using the load balance contribution ratio, the acceleration rate, and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 25)
A parallel efficiency calculation device for calculating the parallel efficiency of a parallel computer system, comprising:
load balance contribution ratio calculating means for calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
parallel performance impediment factor contribution ratio calculating means for calculating a parallel performance impediment factor contribution ratio, which represents the ratio of the processing time of each parallel performance impediment factor portion to the total processing time of all processors included in the parallel computer system, and storing it in a storage device; and
means for calculating the parallel efficiency using the load balance contribution ratio and the parallel performance impediment factor contribution ratio, and storing it in a storage device.

(Appendix 26)
A parallel efficiency calculation device for calculating the parallel efficiency of a parallel computer system, comprising:
load balance contribution ratio calculating means for calculating a load balance contribution ratio, which represents the degree of load balance among the processors included in the parallel computer system, and storing it in a storage device;
virtual parallelization rate calculating means for calculating a virtual parallelization rate, which represents the fraction, in terms of time, of the processing performed in the parallel computer system that is computed in parallel by the processors, and storing it in a storage device; and
means for calculating the parallel efficiency using the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system, the sum of the processing times of the processing performed on those processors, the load balance contribution ratio, and the virtual parallelization rate, and storing it in a storage device.

(Appendix 27)
A parallel efficiency calculation device for calculating the parallel efficiency of a parallel computer system, comprising:
means for calculating a first processing time, which corresponds to the total processing time of the parallel-performance-impeding portion of the processing when the processing is performed by a single processor, and storing it in a storage device;
means for calculating a second processing time, which is the sum of the processing times of the parallel-computed portions of the processing performed on the processors included in the parallel computer system, and storing it in a storage device; and
means for calculating the parallel efficiency using the number of processors used in the parallel computer system, the longest of the processing times of the processing performed on the processors included in the parallel computer system, the first processing time, and the second processing time, and storing it in a storage device.

(Appendix 28)
A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when parallel processing is not performed to the total processing time obtained by assuming that the processing time τ_i(p) of each of the p processors equals the longest processing time when parallel processing is performed by the p processors,
the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) of the parallel calculation portion of the processing (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which the data acquisition unit of a computer having a data acquisition unit, a load balance contribution rate calculation unit, a virtual parallelization rate calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing times γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculating step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), which represents the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a virtual parallelization rate calculating step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate a virtual parallelization rate Rp(p), which represents the proportion (in terms of time) of the processing executed in the parallel computer system that is computed in parallel by the processors, and stores it in the storage device;
a parallel performance impediment factor contribution rate calculating step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), which represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit uses the load balance contribution rate Rb(p), the virtual parallelization rate Rp(p), and the parallel performance impediment factor contribution rates Rj(p) stored in the storage device to calculate the parallel efficiency E_p(p) by
$$E_p(p) = \frac{R_b(p)\left(1 - \sum_j R_j(p)\right)}{R_p(p)}$$
and stores it in the storage device.
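A minimal Python sketch of the Appendix 28 calculation. It assumes that each processor's time decomposes as τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p) and uses the ratio definitions of Appendices 33 to 35 (Rb(p) = β/(p τ(p)), Rp(p) = α/τ(1), Rj(p) = Σ_i χ_{i,j}(p)/β, with α = Σ_i γ_i(p) and β = Σ_i τ_i(p)). Under that assumption Σ_j Rj(p) = 1 − α/β, so the expression reduces to τ(1)/(p τ(p)). The function name, argument layout, and sample values are hypothetical.

```python
def parallel_efficiency_from_indices(gamma, chi, tau1):
    """Return E_p(p) = Rb(p) * (1 - sum_j Rj(p)) / Rp(p).

    gamma[i]:  processing time gamma_i(p) of the parallel calculation part on processor i
    chi[i][j]: processing time chi_{i,j}(p) of impediment factor j on processor i
    tau1:      processing time tau(1) of the same job on one processor
    Assumes tau_i(p) = gamma_i(p) + sum_j chi_{i,j}(p) (hypothetical decomposition).
    """
    p = len(gamma)
    tau = [g + sum(row) for g, row in zip(gamma, chi)]  # tau_i(p) per processor
    tau_p = max(tau)                                    # longest time tau(p)
    alpha = sum(gamma)                                  # sum of parallel parts
    beta = sum(tau)                                     # sum over all processors
    rb = beta / (p * tau_p)                             # load balance contribution rate
    rp = alpha / tau1                                   # virtual parallelization rate
    rj = [sum(row[j] for row in chi) / beta             # contribution rate of factor j
          for j in range(len(chi[0]))]
    return rb * (1.0 - sum(rj)) / rp


# Hypothetical 4-processor run: factor 0 = redundant processing, factor 1 = communication.
gamma = [25.0, 25.0, 25.0, 25.0]
chi = [[10.0, 2.0], [10.0, 4.0], [10.0, 6.0], [10.0, 8.0]]
e = parallel_efficiency_from_indices(gamma, chi, tau1=110.0)
print(round(e, 4))  # prints 0.6395
```

For this data the result agrees with τ(1)/(p τ(p)) = 110/(4 × 43), which is a quick consistency check on the three indices.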

(Appendix 29)
A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when parallel processing is not performed to the total processing time obtained by assuming that the processing time of each of the p processors equals the longest processing time when parallel processing is performed by the p processors,
the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) of the parallel calculation portion of the processing (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which the data acquisition unit of a computer having a data acquisition unit, a load balance contribution rate calculation unit, an auxiliary index calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing times γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculating step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), which represents the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
an acceleration rate calculating step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate an acceleration rate A_p(p), which represents the limit on how far parallelization can shorten the processing time of the processing executed in the parallel computer system, and stores it in the storage device;
a parallel performance impediment factor contribution rate calculating step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), which represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit uses the load balance contribution rate Rb(p), the acceleration rate A_p(p), and the parallel performance impediment factor contribution rates Rj(p) stored in the storage device to calculate the parallel efficiency E_p(p) by
$$E_p(p) = R_b(p)\left(1 - \sum_j R_j(p)\right)\frac{A_p(p)}{A_p(p) - 1}$$
and stores it in the storage device.
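The Appendix 29 step can be sketched in Python as follows. It assumes A_p(p) = 1/(1 − Rp(p)) (as in Appendix 36), so Rp(p) = (A_p(p) − 1)/A_p(p); substituting this into Rb(p)(1 − Σ_j Rj(p))/Rp(p), which follows from the ratio definitions in Appendices 33 to 35 if τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p), gives the expression above. The form requires 0 < Rp(p) < 1. Function names and sample values are hypothetical.

```python
def acceleration_rate(rp):
    """A_p(p) = 1 / (1 - Rp(p)); requires 0 < Rp(p) < 1."""
    if not 0.0 < rp < 1.0:
        raise ValueError("virtual parallelization rate must satisfy 0 < Rp < 1")
    return 1.0 / (1.0 - rp)


def parallel_efficiency_from_acceleration(rb, ap, rj):
    """E_p(p) = Rb(p) * (1 - sum_j Rj(p)) * A_p(p) / (A_p(p) - 1)."""
    return rb * (1.0 - sum(rj)) * ap / (ap - 1.0)


# Hypothetical balanced run: Rb = 1, Rp = 100/110, Rj = [0.25, 0.125].
ap = acceleration_rate(100.0 / 110.0)
e = parallel_efficiency_from_acceleration(rb=1.0, ap=ap, rj=[0.25, 0.125])
print(round(ap, 6), round(e, 6))  # prints 11.0 0.6875
```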

(Appendix 30)
A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when parallel processing is not performed to the total processing time obtained by assuming that the processing time τ_i(p) of each of the p processors equals the longest processing time when parallel processing is performed by the p processors,
the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) of the parallel calculation portion of the processing (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which the data acquisition unit of a computer having a data acquisition unit, a load balance contribution rate calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing times γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculating step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), which represents the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a parallel performance impediment factor contribution rate calculating step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), which represents the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit calculates the parallel efficiency E_p(p) from the load balance contribution rate Rb(p) and the parallel performance impediment factor contribution rates Rj(p) stored in the storage device, using an expression not reproduced in this text, and stores it in the storage device.

(Appendix 31)
A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when parallel processing is not performed to the total processing time obtained by assuming that the processing time τ_i(p) of each of the p processors equals the longest processing time when parallel processing is performed by the p processors,
the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) of the parallel calculation portion of the processing (i denotes the processor number) and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which the data acquisition unit of a computer having a data acquisition unit, a load balance contribution rate calculation unit, a virtual parallelization rate calculation unit, a parallel efficiency calculation unit, an auxiliary index calculation unit, a log data storage unit, and a storage device acquires the processing times γ_i(p) and χ_{i,j}(p) from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculating step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), which represents the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a virtual parallelization rate calculating step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate a virtual parallelization rate Rp(p), which represents the proportion (in terms of time) of the processing executed in the parallel computer system that is computed in parallel by the processors, and stores it in the storage device;
an auxiliary index calculating step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate the sum α of the processing times γ_i(p) of the parallel calculation portion of the processing performed in each processor included in the parallel computer system and the sum β of the processing times of the processing performed in the processors, and stores them in the storage device; and
a step in which the parallel efficiency calculation unit uses α, β, the load balance contribution rate Rb(p), and the virtual parallelization rate Rp(p) stored in the storage device to calculate the parallel efficiency E_p(p) by
$$E_p(p) = \frac{R_b(p)\,\alpha}{\beta\,R_p(p)}$$
and stores it in the storage device.
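A Python sketch of the Appendix 31 step, assuming E_p(p) = τ(1)/(p τ(p)), Rb(p) = β/(p τ(p)) and Rp(p) = α/τ(1). With these definitions Rb(p)·α/(β·Rp(p)) = τ(1)/(p τ(p)) holds exactly, without any assumption about how τ_i(p) splits into components. Function names and sample values are hypothetical.

```python
def auxiliary_indices(gamma, tau):
    """alpha = sum_i gamma_i(p), beta = sum_i tau_i(p)."""
    return sum(gamma), sum(tau)


def parallel_efficiency_from_alpha_beta(gamma, tau, tau1):
    """E_p(p) = Rb(p) * alpha / (beta * Rp(p))."""
    p = len(tau)
    alpha, beta = auxiliary_indices(gamma, tau)
    rb = beta / (p * max(tau))   # load balance contribution rate
    rp = alpha / tau1            # virtual parallelization rate
    return rb * alpha / (beta * rp)


gamma = [25.0, 25.0, 25.0, 25.0]   # parallel calculation part per processor
tau = [37.0, 39.0, 41.0, 43.0]     # total processing time per processor
e = parallel_efficiency_from_alpha_beta(gamma, tau, tau1=110.0)
```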

(Appendix 32)
A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when parallel processing is not performed to the total processing time obtained by assuming that the processing time τ_i(p) of each of the p processors equals the longest processing time τ(p) when parallel processing is performed by the p processors,
the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) of the parallel calculation portion of the processing (i denotes the processor number), the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and, when parallel performance impediment factors other than redundant processing exist, the processing time X_{i,j}(p) due to parallel performance impediment factor j that arises only when p > 1 and depends on p, and storing them in a storage unit of the parallel computer system;
a step in which the data acquisition unit of a computer having a data acquisition unit, an auxiliary index calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires, from the storage unit of the parallel computer system, the processing times γ_i(p) and χ_{i,j}(p) and, when parallel performance impediment factors other than the redundant processing exist, the processing time X_{i,j}(p), and stores them in the log data storage unit;
a step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate a first processing time ρ, corresponding to the total processing time of the parallel-performance-impeding portion of the processing when the processing is performed by one processor, and stores it in the storage device;
a step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate a second processing time α, which is the sum of the processing times γ_i(p) of the parallel calculation portion of the processing performed in each processor included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit uses the number p of processors used in the parallel computer system, the longest processing time τ(p) when parallel processing is performed by the p processors, and the first processing time ρ and second processing time α stored in the storage device to calculate the parallel efficiency E_p(p) by
$$E_p(p) = \frac{\alpha + \rho}{p\,\tau(p)}$$
and stores it in the storage device.
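A minimal Python sketch of the Appendix 32 step. It assumes, consistently with Appendices 37 and 40, that the one-processor time is τ(1) = α + ρ (the parallel calculation part plus the impeding part that remains on one processor), so the efficiency is (α + ρ)/(p τ(p)). The function name and sample values are hypothetical.

```python
def parallel_efficiency_direct(p, tau_p, rho, alpha):
    """E_p(p) = (alpha + rho) / (p * tau(p)).

    p:     number of processors used
    tau_p: longest per-processor processing time tau(p)
    rho:   first processing time, the impeding part when run on one processor
    alpha: second processing time, the sum of gamma_i(p)
    """
    if p < 1 or tau_p <= 0:
        raise ValueError("need p >= 1 and tau(p) > 0")
    return (alpha + rho) / (p * tau_p)


gamma = [25.0, 25.0, 25.0, 25.0]
e = parallel_efficiency_direct(p=4, tau_p=43.0, rho=10.0, alpha=sum(gamma))
```

With p = 1 and τ(1) = α + ρ the function returns exactly 1, which is the expected sanity check for a single-processor run.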

(Appendix 33)
The parallel efficiency calculation method according to any one of Appendices 28 to 31, wherein the load balance contribution rate calculating step comprises:
a step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate the sum β of the processing times of the processing performed in the processors, and stores it in the storage device; and
a step in which the load balance contribution rate calculation unit uses the data stored in the storage device and the log data storage unit to calculate the load balance contribution rate Rb(p) by dividing the sum β by the longest processing time τ(p) when the processing is performed by the p processors and by the number p of processors used in the parallel computer system.
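Appendix 33 amounts to Rb(p) = β/(p τ(p)). It equals 1 when every processor finishes at the same time and drops to 1/p when a single processor does all the work. A Python sketch; the function name is hypothetical.

```python
def load_balance_contribution(tau):
    """Rb(p) = beta / (p * tau(p)) for per-processor processing times tau_i(p)."""
    p = len(tau)
    beta = sum(tau)      # sum of processing times over all processors
    tau_p = max(tau)     # longest processing time tau(p)
    return beta / (p * tau_p)


print(load_balance_contribution([40.0, 40.0, 40.0, 40.0]))  # balanced: prints 1.0
print(load_balance_contribution([100.0, 0.0, 0.0, 0.0]))    # one processor busy: prints 0.25
```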

(Appendix 34)
The parallel efficiency calculation method according to Appendix 28 or 31, wherein the virtual parallelization rate calculating step comprises:
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate the sum of the processing times γ_i(p) of the parallel calculation portion, and stores it in the storage device; and
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the virtual parallelization rate by dividing the sum of the processing times γ_i(p) of the parallel calculation portion by a third processing time τ(1), corresponding to the processing time when the same processing is performed by one processor.
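Appendix 34 amounts to Rp(p) = Σ_i γ_i(p)/τ(1). A Python sketch with hypothetical names and values:

```python
def virtual_parallelization_rate(gamma, tau1):
    """Rp(p) = sum_i gamma_i(p) / tau(1)."""
    if tau1 <= 0:
        raise ValueError("tau(1) must be positive")
    return sum(gamma) / tau1


rp = virtual_parallelization_rate([25.0, 25.0, 25.0, 25.0], tau1=110.0)
```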

(Appendix 35)
The parallel efficiency calculation method according to any one of Appendices 28 to 30, wherein the parallel performance impediment factor contribution rate calculating step comprises:
a step in which the parallel performance impediment factor contribution rate calculation unit calculates the sum of the processing times χ_{i,j}(p) of a specific parallel performance impediment factor j and the sum β of the processing times of the processing performed in the processors, and stores them in the storage device; and
a step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the parallel performance impediment factor contribution rate Rj(p) for the specific factor j by dividing the sum of its processing times χ_{i,j}(p) by the sum β.
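Appendix 35 amounts to Rj(p) = Σ_i χ_{i,j}(p)/β for each factor j. A Python sketch; the function name and the sample layout (factor 0 = redundant processing, factor 1 = communication) are hypothetical.

```python
def impediment_contributions(chi, tau):
    """Rj(p) = sum_i chi_{i,j}(p) / beta for every impediment factor j."""
    beta = sum(tau)                      # sum of processing times of all processors
    n_factors = len(chi[0])
    return [sum(row[j] for row in chi) / beta for j in range(n_factors)]


chi = [[10.0, 2.0], [10.0, 4.0], [10.0, 6.0], [10.0, 8.0]]
tau = [37.0, 39.0, 41.0, 43.0]
print(impediment_contributions(chi, tau))  # prints [0.25, 0.125]
```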

(Appendix 36)
The parallel efficiency calculation method according to Appendix 29, wherein the acceleration rate calculating step comprises:
a step in which the acceleration rate calculation unit uses the data stored in the log data storage unit to calculate the sum of the processing times γ_i(p) of the parallel calculation portion, and stores it in the storage device; and
a step in which the acceleration rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the acceleration rate A_p as the reciprocal of one minus the virtual parallelization rate, the virtual parallelization rate being obtained by dividing the sum of the processing times γ_i(p) of the parallel calculation portion by a third processing time τ(1) corresponding to the processing time when the same processing is performed by one processor.
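Appendix 36 amounts to A_p = 1/(1 − Rp) with Rp = Σ_i γ_i(p)/τ(1). Because 1 − Rp is the fraction of the one-processor time outside the parallel calculation part, A_p has the same form as the asymptotic speedup limit in Amdahl's law. A Python sketch with a hypothetical function name:

```python
def acceleration_rate_from_measurements(gamma, tau1):
    """A_p = 1 / (1 - Rp), with Rp = sum_i gamma_i(p) / tau(1)."""
    rp = sum(gamma) / tau1
    if rp >= 1.0:
        raise ValueError("Rp >= 1: no serial part remains, so A_p is unbounded")
    return 1.0 / (1.0 - rp)


ap = acceleration_rate_from_measurements([25.0, 25.0, 25.0, 25.0], tau1=110.0)
```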

(Appendix 37)
The parallel efficiency calculation method according to Appendix 34 or 36, further comprising a step of, when the only parallel performance impediment factor is redundant processing, using the data stored in the log data storage unit to calculate the processing time τ(1) corresponding to the third processing time as the sum of a parallel impediment processing time R, determined from the processing times R_i of the redundant processing, and the total of the processing times γ_i(p) of the parallel calculation portion, and storing it in the storage device.
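A Python sketch of Appendix 37, τ(1) = R + Σ_i γ_i(p). Taking R as the mean of the redundant times R_i(p) is an assumption chosen from the options listed in Appendix 40; the function name is hypothetical.

```python
from statistics import mean


def tau1_redundant_only(gamma, redundant):
    """tau(1) = R + sum_i gamma_i(p), taking R as the mean of R_i(p)."""
    r = mean(redundant)   # one of the choices for R listed in Appendix 40
    return r + sum(gamma)


tau1 = tau1_redundant_only([25.0] * 4, [10.0] * 4)
```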

(Appendix 38)
The parallel efficiency calculation method according to Appendix 34 or 36, further comprising, when parallel performance impediment factors other than the redundant processing exist:
a step of measuring, in the parallel computer system, the processing time X_{i,j}(p) due to parallel performance impediment factor j that arises only when p > 1 and depends on p, and storing it in the storage unit of the parallel computer system;
a step in which the data acquisition unit acquires the processing time X_{i,j}(p) from the storage unit of the parallel computer system and stores it in the log data storage unit; and
a step of calculating the third processing time τ(1) as the sum of the total, over all parallel performance impediment factors, of the processing time χ_{1,j} of factor j at p = 1 (determined from the time obtained by subtracting X_{i,j}(p) from the processing time χ_{i,j}(p) of each parallel performance impediment factor j other than redundant processing) and the total of the processing times γ_i(p) of the parallel calculation portion, and storing it in the storage device.
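A Python sketch of Appendix 38 for non-redundant factors only. χ_{1,j} is taken as the mean over processors of χ_{i,j}(p) − X_{i,j}(p), which is one of the options in Appendix 41; that choice, the function name, and the sample factors (communication that exists only for p > 1, and an I/O factor with a p-independent part) are hypothetical.

```python
from statistics import mean


def tau1_general(gamma, chi, x):
    """tau(1) = sum_j chi_{1,j} + sum_i gamma_i(p).

    chi[i][j]: chi_{i,j}(p) for non-redundant impediment factor j on processor i
    x[i][j]:   X_{i,j}(p), the part of factor j that exists only for p > 1
    chi_{1,j} is taken as the mean over processors of chi_{i,j}(p) - X_{i,j}(p).
    """
    n_proc, n_factors = len(chi), len(chi[0])
    chi_1 = [mean([chi[i][j] - x[i][j] for i in range(n_proc)])
             for j in range(n_factors)]
    return sum(chi_1) + sum(gamma)


gamma = [25.0, 25.0, 25.0, 25.0]
chi = [[2.0, 3.0], [4.0, 3.0], [6.0, 3.0], [8.0, 3.0]]  # [communication, I/O]
x = [[2.0, 1.0], [4.0, 1.0], [6.0, 1.0], [8.0, 1.0]]    # p-dependent parts
tau1 = tau1_general(gamma, chi, x)
```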

(Appendix 39)
The parallel efficiency calculation method according to any one of Appendices 28 to 38, wherein each processing time is expressed as the number of times the corresponding event is confirmed.
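When processing times are replaced by event counts taken at a fixed sampling interval, each count is roughly the time divided by the interval, and the interval cancels in ratio indices such as Rb(p). A Python sketch assuming a uniform interval; the 10 ms value, the counts, and the function name are hypothetical.

```python
def load_balance_contribution(values):
    """Rb(p) = sum / (p * max); works on times or on sample counts alike."""
    return sum(values) / (len(values) * max(values))


interval = 0.01                          # hypothetical sampling interval in seconds
counts = [3700, 3900, 4100, 4300]        # samples observed per processor
times = [c * interval for c in counts]   # equivalent processing times
rb_counts = load_balance_contribution(counts)
rb_times = load_balance_contribution(times)
```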

(Appendix 40)
The parallel efficiency calculation method according to Appendix 32 or 37, wherein the first processing time ρ or the parallel impediment processing time R is any one of: the sum of the processing times R_i(p) of the redundant processing divided by the number of processors; the maximum of the processing times R_i(p) of the redundant processing; the minimum of the processing times R_i(p) of the redundant processing; or the redundant processing time of the processor, among the processors included in the parallel computer system, for which the total of the processing time γ_i(p) of the parallel calculation portion and the processing times of the parallel performance impediment factors is largest.
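The four choices in Appendix 40 can be sketched in Python as follows. The rule names, the function name, and the sample values are hypothetical; `impediment_totals[i]` is assumed to hold Σ_j χ_{i,j}(p), including the redundant time itself.

```python
from statistics import mean


def redundant_time(redundant, gamma, impediment_totals, rule="mean"):
    """Choose R (or rho) from per-processor redundant times R_i(p).

    rule: "mean", "max", "min", or "busiest" (R_i of the processor whose
    gamma_i(p) + total impediment time is largest).
    """
    if rule == "mean":
        return mean(redundant)          # sum of R_i(p) divided by p
    if rule == "max":
        return max(redundant)
    if rule == "min":
        return min(redundant)
    if rule == "busiest":
        totals = [g + t for g, t in zip(gamma, impediment_totals)]
        return redundant[totals.index(max(totals))]
    raise ValueError(f"unknown rule: {rule}")


redundant = [9.0, 12.0, 8.0, 11.0]
gamma = [25.0, 25.0, 25.0, 25.0]
impediment_totals = [11.0, 14.0, 10.0, 19.0]
r = redundant_time(redundant, gamma, impediment_totals, rule="busiest")
```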

(Appendix 41)
The parallel efficiency calculation method according to Appendix 32 or 38, wherein the first processing time ρ or the processing time χ_{1,j} of parallel performance impediment factor j at p = 1 is any one of: the value obtained by summing a fourth processing time over all processors and dividing by the number of processors, the fourth processing time being obtained by subtracting, from the processing time χ_{i,j}(p) due to a parallel performance impediment factor j other than redundant processing in the processing performed in each processor included in the parallel computer system, the processing time X_{i,j}(p) due to that factor j that arises only when p > 1 and depends on p; the maximum of the fourth processing time over all processors; or the minimum of the fourth processing time over all processors.
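The three choices in Appendix 41 can be sketched in Python as follows; the function name, rule names, and sample values are hypothetical.

```python
from statistics import mean


def chi_at_one_processor(chi_j, x_j, rule="mean"):
    """chi_{1,j} (or rho) from the fourth processing time chi_{i,j}(p) - X_{i,j}(p)."""
    fourth = [c - x for c, x in zip(chi_j, x_j)]  # fourth processing time per processor
    reducers = {"mean": mean, "max": max, "min": min}
    if rule not in reducers:
        raise ValueError(f"unknown rule: {rule}")
    return reducers[rule](fourth)


chi_j = [5.0, 6.0, 7.0, 8.0]   # chi_{i,j}(p) for one non-redundant factor j
x_j = [1.0, 1.0, 1.0, 1.0]     # X_{i,j}(p), the part that exists only for p > 1
value = chi_at_one_processor(chi_j, x_j)
```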

A diagram showing a state in which load balance is maintained among the processors.
A diagram showing an example classification of the processing time in each processor.
A diagram showing a state in which load balance is not maintained among the processors (four processors are allocated, but the processing is performed by only one of them).
A diagram showing an example of modeling the relationship between γ₁ and γ_i(p).
A diagram showing a state in which CPU performance varies among the processors while data-parallel processing is performed.
(a) A diagram showing the processing time when processing with one processor; (b) a diagram showing the processing time when processing with four processors.
(a) A diagram showing the processing time when the parallel processing portion γ and the communication portion χ_C are taken into account; (b) a diagram showing the processing time when the startup time χ_TC is also taken into account.
A diagram showing the change in the parallel performance evaluation indices when a parallel performance impediment factor is added.
A diagram showing an example in which load balance is maintained among the processors but not for each component of the processing time.
A diagram showing the processing time when processing with one processor.
A diagram showing an example in which load balance is not maintained among the processors.
A diagram illustrating parallel performance impediment factors that become apparent under high parallelization.
A diagram showing an example calculation of the parallel performance evaluation indices.
A diagram showing the relationship between operating time and the total processing time.
A diagram showing the relationship among operating time, processing time, and processing time adjusted for parallel efficiency.
A diagram showing an example of the processing time when data parallelism is implemented on a distributed-memory parallel computer system.
A diagram comparing the parallel performance evaluation indices based on the current CPU performance with the estimated indices after replacement by a system with five times the CPU performance.
A diagram showing data for a trial calculation for replacement by a system with five times the CPU performance.
A functional block diagram according to an embodiment of the present invention.
A conceptual diagram showing confirmation and counting of event occurrences by sampling.
A diagram showing an example of sampling results when the program in Table 1 is executed.
A diagram showing an example of the processing flow of the parallel performance analyzer.
A diagram showing an example of processing times measured by time measurement.
A diagram showing an example of processing times measured by sampling.
A diagram showing an example of the first part of the processing flow of the processor count optimization process.
A diagram showing an example of the second part of the processing flow of the processor count optimization process.
A diagram showing an example of the processing flow of the processor expansion estimation process.
A diagram showing an example of the processing flow of the system replacement data process.
A diagram showing a performance guideline for communication when the CPU performance is five times higher and the target parallel efficiency is 0.6.
A diagram showing an example of the processing flow of the system operation efficiency improvement process.
A diagram showing an example of the processing flow of the tuning process.
A diagram showing the change in the parallel performance evaluation indices from before tuning to after the first round of tuning.
A diagram showing the processing time of a parallel program based on an algorithm unsuited to parallel processing.
A diagram showing the processing time of a parallel program based on an algorithm suited to parallel processing.
A diagram for comparing, among other things, the parallel performance indices of an algorithm unsuited to parallel processing and an algorithm suited to it.
アルゴリズム選定処理の処理フローを表す図である。It is a figure showing the processing flow of an algorithm selection process. ある並列処理システムのログデータの一例を示す図である。It is a figure which shows an example of the log data of a certain parallel processing system. 実施例１における処理時間の測定結果を表す図である。It is a figure showing the measurement result of processing time in Example 1. FIG. 実施例１における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 1. FIG. 実施例２における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 2. FIG. 実施例２における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 2. FIG. 実施例３における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 3. FIG. 実施例３における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 3. FIG. 実施例４における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 4. 実施例４における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 4. FIG. 実施例５における処理時間の測定結果を表す図である。It is a figure showing the measurement result of processing time in Example 5. FIG. 実施例５における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 5. FIG. 実施例６（冗長処理を用いたデータパラレル）における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 6 (data parallel using redundant processing). 実施例６（冗長処理を用いたデータパラレル）における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 6 (data parallel using a redundant process). 
実施例６（並列処理できない部分を特定のプロセッサで処理するデータパラレル）における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 6 (data parallel which processes the part which cannot be processed in parallel with a specific processor). 実施例６（並列処理できない部分を特定のプロセッサで処理するデータパラレル）における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 6 (data parallel which processes the part which cannot be processed in parallel with a specific processor). 実施例７における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 7. 実施例７における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 7. 実施例８における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 8. 実施例８における並列性能評価指標の計算結果を表す図である。FIG. 20 is a diagram illustrating a calculation result of a parallel performance evaluation index in Example 8. 実施例９における処理時間の測定結果を表す図である。It is a figure showing the measurement result of the processing time in Example 9. FIG. 実施例９における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 9. 実施例１０における処理時間の測定結果を表す図である。It is a figure showing the measurement result of processing time in Example 10. FIG. 実施例１０における並列性能評価指標の計算結果を表す図である。It is a figure showing the calculation result of the parallel performance evaluation parameter | index in Example 10. FIG.

Explanation of symbols

10 Data acquisition unit
11 Load balance contribution rate calculation unit
12 Virtual parallelization rate calculation unit
13 Parallel performance impediment factor contribution rate calculation unit
14 Parallel efficiency calculation unit
15 Auxiliary index calculation unit
21 Processor count optimization processing unit
22 Processor expansion estimation processing unit
23 System replacement data processing unit
24 Operation efficiency data processing unit
25 Tuning processing unit
26 Algorithm selection processing unit
27 Parallel performance evaluation processing unit
30 Log data storage unit
100 Parallel performance analyzer
110 Output device
200 Parallel computer system
201 Measurement unit

Claims

A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when the processing is not parallelized to the total processing time obtained by assuming that each of the p processors takes the longest of the processing times τ_i(p) of the p processors when the processing is performed in parallel by the p processors, the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) (where i denotes the processor number) of the parallel computation part of the processing and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which a data acquisition unit of a computer having the data acquisition unit, a load balance contribution rate calculation unit, a virtual parallelization rate calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing time γ_i(p) of the parallel computation part and the processing time χ_{i,j}(p) of each parallel performance impediment factor j from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculation step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), representing the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a virtual parallelization rate calculation step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate a virtual parallelization rate Rp(p), representing the proportion of time spent in the part computed in parallel by the processors in the processing executed in the parallel computer system, and stores it in the storage device;
a parallel performance impediment factor contribution rate calculation step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), representing the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit calculates the parallel efficiency E_p(p) using the load balance contribution rate Rb(p), the virtual parallelization rate Rp(p), and the parallel performance impediment factor contribution rate Rj(p) stored in the storage device.
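The definitions in these claims can be combined into a closed form. The following is a hedged sketch, not the formula as filed: it assumes that each processor's processing time splits exactly into its parallel computation part and its impediment parts, τ_i(p) = γ_i(p) + Σ_j χ_{i,j}(p), and it uses α = Σ_i γ_i(p) and β = Σ_i τ_i(p) from claim 4, Rb(p) from claim 5, Rj(p) from claim 6, and Rp(p) from claim 7.

```latex
\begin{aligned}
E_p(p) &= \frac{\tau(1)}{p\,\tau(p)}, \qquad
R_b(p) = \frac{\beta}{p\,\tau(p)}, \qquad
R_p(p) = \frac{\alpha}{\tau(1)}, \qquad
R_j(p) = \frac{\sum_{i} \chi_{i,j}(p)}{\beta},\\
\beta &= \alpha + \sum_{j}\sum_{i}\chi_{i,j}(p)
\;\Longrightarrow\; \frac{\alpha}{\beta} = 1 - \sum_{j} R_j(p),\\
E_p(p) &= \frac{\beta}{p\,\tau(p)}\cdot\frac{\tau(1)}{\alpha}\cdot\frac{\alpha}{\beta}
= \frac{R_b(p)\,\alpha}{\beta\,R_p(p)}
= \frac{R_b(p)\,\bigl(1 - \sum_{j} R_j(p)\bigr)}{R_p(p)}.
\end{aligned}
```

Under this assumption, the middle form uses exactly the inputs listed in claim 4 (α, β, Rb(p), Rp(p)), and the last form uses the inputs listed in this claim (Rb(p), Rp(p), Rj(p)).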

A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when the processing is not parallelized to the total processing time obtained by assuming that each of the p processors takes the longest of the processing times of the p processors when the processing is performed in parallel by the p processors, the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) (where i denotes the processor number) of the parallel computation part of the processing and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which a data acquisition unit of a computer having the data acquisition unit, a load balance contribution rate calculation unit, an auxiliary index calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing time γ_i(p) of the parallel computation part and the processing time χ_{i,j}(p) of each parallel performance impediment factor j from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculation step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), representing the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
an acceleration rate calculation step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate an acceleration rate A_p(p) and stores it in the storage device;
a parallel performance impediment factor contribution rate calculation step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), representing the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit calculates the parallel efficiency E_p(p) using the load balance contribution rate Rb(p), the acceleration rate A_p(p), and the parallel performance impediment factor contribution rate Rj(p) stored in the storage device.

A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when the processing is not parallelized to the total processing time obtained by assuming that each of the p processors takes the longest of the processing times τ_i(p) of the p processors when the processing is performed in parallel by the p processors, the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) (where i denotes the processor number) of the parallel computation part of the processing and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which a data acquisition unit of a computer having the data acquisition unit, a load balance contribution rate calculation unit, a parallel performance impediment factor contribution rate calculation unit, a parallel efficiency calculation unit, a log data storage unit, and a storage device acquires the processing time γ_i(p) of the parallel computation part and the processing time χ_{i,j}(p) of each parallel performance impediment factor j from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculation step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), representing the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a parallel performance impediment factor contribution rate calculation step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit to calculate a parallel performance impediment factor contribution rate Rj(p), representing the ratio of the processing time χ_{i,j}(p) of each parallel performance impediment factor part j to the total processing time of all processors included in the parallel computer system, and stores it in the storage device; and
a step in which the parallel efficiency calculation unit calculates the parallel efficiency E_p(p) using the load balance contribution rate Rb(p) and the parallel performance impediment factor contribution rate Rj(p) stored in the storage device.

A parallel efficiency calculation method for calculating a parallel efficiency E_p(p) of a parallel computer system, wherein
the parallel efficiency E_p(p) is the ratio of the processing time when the processing is not parallelized to the total processing time obtained by assuming that each of the p processors takes the longest of the processing times τ_i(p) of the p processors when the processing is performed in parallel by the p processors, the method comprising:
a step of measuring, in the parallel computer system, the processing time γ_i(p) (where i denotes the processor number) of the parallel computation part of the processing and the processing time χ_{i,j}(p) of each parallel performance impediment factor j, and storing them in a storage unit of the parallel computer system;
a step in which a data acquisition unit of a computer having the data acquisition unit, a load balance contribution rate calculation unit, a virtual parallelization rate calculation unit, a parallel efficiency calculation unit, an auxiliary index calculation unit, a log data storage unit, and a storage device acquires the processing time γ_i(p) of the parallel computation part and the processing time χ_{i,j}(p) of each parallel performance impediment factor j from the storage unit of the parallel computer system and stores them in the log data storage unit;
a load balance contribution rate calculation step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate a load balance contribution rate Rb(p), representing the degree of load balance among the processors included in the parallel computer system, and stores it in the storage device;
a virtual parallelization rate calculation step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate a virtual parallelization rate Rp(p), representing the proportion of time spent in the part computed in parallel by the processors in the processing executed in the parallel computer system, and stores it in the storage device;
an auxiliary index calculation step in which the auxiliary index calculation unit uses the data stored in the log data storage unit to calculate the sum α of the processing times γ_i(p) of the parallel computation part of the processing executed in the processors included in the parallel computer system and the sum β of the processing times of the processing executed in the processors, and stores them in the storage device; and
a step in which the parallel efficiency calculation unit calculates the parallel efficiency E_p(p) using α, β, the load balance contribution rate Rb(p), and the virtual parallelization rate Rp(p) stored in the storage device.
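As a hedged illustration of this claim, the sketch below computes E_p(p) from α, β, Rb(p), and Rp(p) and checks the result against the definition E_p(p) = τ(1)/(p·τ(p)). It assumes Rb(p) = β/(p·τ(p)) (claim 5) and Rp(p) = α/τ(1) (claim 7); the function names and the measurement values are hypothetical, and τ(1) is passed in directly rather than estimated.

```python
import math
from typing import Sequence


def parallel_efficiency_claim4(gamma: Sequence[float], tau: Sequence[float], tau1: float) -> float:
    """Sketch: E_p(p) = Rb(p) * alpha / (beta * Rp(p)).

    gamma[i]: processing time gamma_i(p) of the parallel computation part on processor i
    tau[i]:   total processing time tau_i(p) of processor i
    tau1:     processing time tau(1) of the same processing on one processor (given)
    """
    p = len(tau)
    alpha = sum(gamma)              # auxiliary index alpha
    beta = sum(tau)                 # auxiliary index beta
    rb = beta / (p * max(tau))      # load balance contribution rate Rb(p)
    rp = alpha / tau1               # virtual parallelization rate Rp(p)
    return rb * alpha / (beta * rp)


def parallel_efficiency_definition(tau: Sequence[float], tau1: float) -> float:
    """E_p(p) = tau(1) / (p * tau(p)), where tau(p) is the longest processing time."""
    return tau1 / (len(tau) * max(tau))


if __name__ == "__main__":
    gamma = [6.0, 6.0, 6.0, 6.0]   # hypothetical measurements, p = 4
    tau = [10.0, 8.0, 9.0, 9.0]
    tau1 = 30.0
    e4 = parallel_efficiency_claim4(gamma, tau, tau1)
    assert math.isclose(e4, parallel_efficiency_definition(tau, tau1))
    print(round(e4, 6))  # 0.75
```

The α and β factors cancel against the definitions of Rb(p) and Rp(p), which is why both functions agree.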

The parallel efficiency calculation method according to claim 1, wherein the load balance contribution rate calculation step includes:
a step in which the load balance contribution rate calculation unit uses the data stored in the log data storage unit to calculate the sum β of the processing times of the processing executed in the processors, and stores the sum β in the storage device; and
a step in which the load balance contribution rate calculation unit uses the data stored in the storage device and the log data storage unit to calculate the load balance contribution rate Rb(p) by dividing the sum β of the processing times of the processing executed in the processors by the product of the longest processing time τ(p) among the p processors and the number p of processors used in the parallel computer system.
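The calculation in this claim, Rb(p) = β/(p·τ(p)), can be sketched as follows. The input layout (one processing time τ_i(p) per processor in a list) and the function name are assumptions for illustration only.

```python
from typing import Sequence


def load_balance_contribution_rate(tau: Sequence[float]) -> float:
    """Rb(p) = beta / (p * tau(p)).

    tau[i] is the processing time tau_i(p) of processor i (hypothetical input
    layout); beta is their sum and tau(p) is the longest of them.
    """
    if not tau or max(tau) <= 0:
        raise ValueError("need at least one positive processing time")
    p = len(tau)
    beta = sum(tau)
    return beta / (p * max(tau))


print(load_balance_contribution_rate([10.0, 8.0, 9.0, 9.0]))  # 0.9 (= 36 / (4 * 10))
print(load_balance_contribution_rate([5.0, 5.0, 5.0, 5.0]))   # 1.0
```

Because β can never exceed p·τ(p), Rb(p) lies in (0, 1] and reaches 1 only when every processor finishes at the same time.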

The parallel efficiency calculation method according to claim 1, wherein the parallel performance impediment factor contribution rate calculation step includes:
a step in which the parallel performance impediment factor contribution rate calculation unit calculates the sum of the processing times χ_{i,j}(p) of a specific parallel performance impediment factor part j and the sum β of the processing times of the processing executed in the processors, and stores them in the storage device; and
a step in which the parallel performance impediment factor contribution rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the parallel performance impediment factor contribution rate Rj(p) for the specific parallel performance impediment factor by dividing the sum of the processing times χ_{i,j}(p) of the specific parallel performance impediment factor part j by the sum β of the processing times of the processing executed in the processors.
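A minimal sketch of this step, Rj(p) = Σ_i χ_{i,j}(p)/β, assuming the measurements are held per factor as one list of per-processor times; the factor names used as keys and the function name are hypothetical.

```python
from typing import Dict, Sequence


def impediment_contribution_rates(
    tau: Sequence[float], chi: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Rj(p) = sum_i chi_{i,j}(p) / beta for every impediment factor j.

    tau[i]:    processing time tau_i(p) of processor i; beta = sum(tau)
    chi[j][i]: processing time chi_{i,j}(p) of impediment factor j on processor i
    """
    beta = sum(tau)
    return {j: sum(times) / beta for j, times in chi.items()}


rates = impediment_contribution_rates(
    tau=[10.0, 8.0, 9.0, 9.0],
    chi={"communication": [2.0, 1.0, 1.0, 2.0], "redundant": [1.0, 1.0, 1.0, 1.0]},
)
print(rates)  # communication: 6/36, redundant: 4/36
```

If τ_i(p) is exactly γ_i(p) plus the impediment times (an assumption, not stated in the claim), 1 − Σ_j Rj(p) equals α/β.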

The parallel efficiency calculation method according to claim 4, wherein the virtual parallelization rate calculation step includes:
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate the sum of the processing times γ_i(p) of the parallel computation part, and stores it in the storage device; and
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the virtual parallelization rate by dividing the sum of the processing times γ_i(p) of the parallel computation part by a third processing time τ(1), corresponding to the processing time when the same processing is performed by one processor,
the method further comprising, when the only parallel performance impediment factor is redundant processing, a step of using the data stored in the log data storage unit to calculate the processing time τ(1) corresponding to the third processing time based on a parallel impediment processing time R, determined from the processing times R_i(p) of the redundant processing, and the sum of the processing times γ_i(p) of the parallel computation part, and storing it in the storage device.
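A hedged sketch of the redundant-only case. The claim says only that τ(1) is calculated "based on" R and Σ_i γ_i(p); this sketch assumes the simplest combination, τ(1) = R + Σ_i γ_i(p), on the reasoning that a part executed redundantly on every processor would run only once on a single processor. Function names and input values are hypothetical.

```python
from typing import Sequence


def tau1_redundant_only(gamma: Sequence[float], r: float) -> float:
    """Third processing time tau(1) when redundant processing is the only impediment.

    Assumption: tau(1) = R + sum_i gamma_i(p).
    """
    return r + sum(gamma)


def virtual_parallelization_rate(gamma: Sequence[float], tau1: float) -> float:
    """Rp(p) = sum_i gamma_i(p) / tau(1)."""
    return sum(gamma) / tau1


gamma = [6.0, 6.0, 6.0, 6.0]          # hypothetical gamma_i(p), p = 4
redundant = [1.0, 1.0, 1.0, 1.0]      # hypothetical R_i(p)
r = sum(redundant) / len(redundant)   # mean of R_i(p), one of the choices for R in claim 9
tau1 = tau1_redundant_only(gamma, r)
print(tau1, virtual_parallelization_rate(gamma, tau1))  # 25.0 0.96
```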

The parallel efficiency calculation method according to claim 1 or 4, wherein the virtual parallelization rate calculation step includes:
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit to calculate the sum of the processing times γ_i(p) of the parallel computation part, and stores it in the storage device; and
a step in which the virtual parallelization rate calculation unit uses the data stored in the log data storage unit and the storage device to calculate the virtual parallelization rate by dividing the sum of the processing times γ_i(p) of the parallel computation part by a third processing time τ(1), corresponding to the processing time when the same processing is performed by one processor,
the method further comprising:
a step of, when parallel performance impediment factors other than redundant processing exist, measuring in the parallel computer system the processing time X_{i,j}(p) due to parallel performance impediment factor j, which arises when p > 1 and depends on p, and storing it in the storage unit of the parallel computer system;
a step in which the data acquisition unit acquires the processing time X_{i,j}(p) from the storage unit of the parallel computer system and stores it in the log data storage unit; and
a step of calculating the processing time τ(1) as the sum, over all parallel performance impediment factors j, of the processing time χ_{1,j} of factor j at p = 1, which is determined from the time obtained by subtracting the processing time X_{i,j}(p) from the processing time χ_{i,j}(p) of each parallel performance impediment factor j other than redundant processing, plus the sum of the processing times γ_i(p) of the parallel computation part, and storing it in the storage device.
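The τ(1) estimate in this claim can be sketched as follows: for each impediment factor j other than redundant processing, subtract the p-dependent part X_{i,j}(p) from χ_{i,j}(p), reduce the per-processor differences to one value χ_{1,j}, and add Σ_i γ_i(p). The mean is used as the default reduction here only as an assumption (claim 10 also allows the maximum or minimum); the factor name "io" and all values are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def tau1_with_other_impediments(
    gamma: Sequence[float],
    chi: Dict[str, Sequence[float]],
    x: Dict[str, Sequence[float]],
    reduce: Callable[[Sequence[float]], float] = mean,
) -> Tuple[float, Dict[str, float]]:
    """Sketch: tau(1) = sum_j chi_{1,j} + sum_i gamma_i(p).

    chi[j][i]: chi_{i,j}(p), time of impediment j (other than redundant processing)
    x[j][i]:   X_{i,j}(p), the part that arises when p > 1 and depends on p
    reduce:    turns the per-processor differences chi - X into chi_{1,j}
    """
    chi1 = {}
    for j, times in chi.items():
        fourth = [c - xi for c, xi in zip(times, x[j])]  # per-processor "fourth processing time"
        chi1[j] = reduce(fourth)
    return sum(chi1.values()) + sum(gamma), chi1


tau1, chi1 = tau1_with_other_impediments(
    gamma=[6.0, 6.0, 6.0, 6.0],
    chi={"io": [3.0, 2.0, 2.0, 3.0]},
    x={"io": [2.0, 2.0, 2.0, 2.0]},
)
print(tau1, chi1)  # 24.5 {'io': 0.5}
```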

The parallel efficiency calculation method according to claim 7, wherein the parallel impediment processing time R is any one of: the sum of the processing times R_i(p) of the redundant processing divided by the number of processors; the maximum of the processing times R_i(p) of the redundant processing; the minimum of the processing times R_i(p) of the redundant processing; and the processing time of the redundant processing in the processor, among the processors included in the parallel computer system, for which the sum of the processing time γ_i(p) of the parallel computation part and the processing times of the parallel performance impediment factors is the largest.
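The four alternatives for R listed in this claim can be sketched as one function. The method labels ("mean", "max", "min", "critical") and the dictionary layout of χ_{i,j}(p), with the redundant processing times stored under one key, are hypothetical choices for illustration.

```python
from typing import Dict, Sequence


def parallel_impediment_time_r(
    gamma: Sequence[float],
    chi: Dict[str, Sequence[float]],
    redundant_key: str,
    method: str = "mean",
) -> float:
    """Parallel impediment processing time R, chosen by one of four rules.

    chi[j][i] is chi_{i,j}(p); chi[redundant_key] holds the redundant
    processing times R_i(p).
    """
    r = chi[redundant_key]
    if method == "mean":
        return sum(r) / len(r)
    if method == "max":
        return max(r)
    if method == "min":
        return min(r)
    if method == "critical":
        # processor whose gamma_i(p) plus all impediment times is the largest
        totals = [g + sum(times[i] for times in chi.values()) for i, g in enumerate(gamma)]
        k = max(range(len(totals)), key=totals.__getitem__)
        return r[k]
    raise ValueError(f"unknown method: {method}")


gamma = [6.0, 6.0, 6.0, 6.0]
chi = {"redundant": [1.5, 2.0, 1.0, 1.0], "communication": [3.0, 0.0, 2.0, 2.0]}
for m in ("mean", "max", "min", "critical"):
    print(m, parallel_impediment_time_r(gamma, chi, "redundant", m))
```

In this example the slowest processor (index 0, total 10.5) is not the one with the largest redundant time, so the "critical" rule and the "max" rule give different values of R.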

The parallel efficiency calculation method according to claim 8, wherein the processing time χ_{1,j} of parallel performance impediment factor j at p = 1 is any one of: the value obtained by summing, over all processors, a fourth processing time, obtained by subtracting from the processing time χ_{i,j}(p) due to parallel performance impediment factor j other than redundant processing the processing time X_{i,j}(p) due to factor j that arises when p > 1 and depends on p, and dividing the result by the number of processors; the maximum of the fourth processing time over all processors; and the minimum of the fourth processing time over all processors.
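The three alternatives for χ_{1,j} in this claim reduce the per-processor fourth processing time χ_{i,j}(p) − X_{i,j}(p) by its mean, maximum, or minimum. A minimal sketch, with hypothetical function name, method labels, and values:

```python
from typing import Sequence


def chi1_estimate(chi_j: Sequence[float], x_j: Sequence[float], method: str = "mean") -> float:
    """chi_{1,j}: processing time of impediment factor j at p = 1.

    chi_j[i]: chi_{i,j}(p) on processor i (impediment other than redundant processing)
    x_j[i]:   X_{i,j}(p), the part that arises when p > 1 and depends on p
    method:   "mean", "max" or "min" (labels for the three alternatives)
    """
    if len(chi_j) != len(x_j):
        raise ValueError("chi_j and x_j must have one entry per processor")
    fourth = [c - x for c, x in zip(chi_j, x_j)]  # fourth processing time per processor
    if method == "mean":
        return sum(fourth) / len(fourth)
    if method == "max":
        return max(fourth)
    if method == "min":
        return min(fourth)
    raise ValueError(f"unknown method: {method}")


chi_io = [3.0, 2.0, 2.0, 3.0]   # hypothetical chi_{i,j}(p), p = 4
x_io = [2.0, 2.0, 2.0, 2.0]     # hypothetical X_{i,j}(p)
print([chi1_estimate(chi_io, x_io, m) for m in ("mean", "max", "min")])  # [0.5, 1.0, 0.0]
```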