JP2019040317A

JP2019040317A - Information processing apparatus, compilation method and compilation program

Info

Publication number: JP2019040317A
Application number: JP2017160611A
Authority: JP
Inventors: 由典杉崎; Yoshinori Sugizaki
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-08-23
Filing date: 2017-08-23
Publication date: 2019-03-14
Anticipated expiration: 2037-08-23
Also published as: JP6933052B2

Abstract

To reduce the power consumption of an information processing apparatus while preventing a decrease in performance. SOLUTION: A compilation unit 21 inserts code for outputting power profiler information 6 into the branch-free loops of a program code 4 and compiles it, and a combining unit 22 combines a profiler library with the compilation result to create an execution code 5. A parallel computer 3 executes the execution code 5 and outputs the power profiler information 6. An analysis unit 23 then analyzes the power profiler information 6 to create frequency control information 7, and the compilation unit 21 inserts frequency control code into the program code 4 on the basis of the frequency control information 7 and compiles it to create a new execution code 5. The parallel computer 3 then executes the execution code 5 including the frequency control code. SELECTED DRAWING: Figure 5

Description

The present invention relates to an information processing apparatus, a compilation method, and a compilation program.

In fields such as HPC (High Performance Computing), it is important to reduce power consumption while maintaining processing performance. One technique for doing so is DFS (Dynamic Frequency Scaling) control, which controls power consumption by controlling the operating frequency.

FIG. 24 is a diagram for explaining power saving by DFS control. In FIG. 24, core #0 and core #1 are CPU (Central Processing Unit) cores, and p#1 to p#3 are processing units, each corresponding to one loop. The vertical axis shows the passage of time from top to bottom. FIG. 24(a) shows an example of power saving by applying DFS control to a core, and FIG. 24(b) shows a case in which DFS control of the cores is not effective.

In FIG. 24(a), core #0 performs the processing of p#1 and core #1 performs the processing of p#2; once both p#1 and p#2 are complete, core #1 performs the processing of p#3. Because p#2 finishes earlier than p#1, core #1 has idle time. Therefore, if, for example, halving the operating frequency of core #1 does not make p#2 finish later than p#1, the power consumption of core #1 can be halved. In this way, when a plurality of cores operate in parallel and one of them has idle time, power can be saved by lowering the operating frequency of that core.

There is also a technique that, based on the operating state of at least one of a plurality of components that are supplied with power and perform predetermined operations, controls the proportion of power distributed to those components so that a preset condition is satisfied.

There is also a technique that compares information on the memory performance of a computer with a threshold to determine whether a processing device is in a state constrained by memory performance and, if it is, lowers the computing capability of the processing device. This technique eliminates unnecessary computing capability, reducing the power consumption of the processing device while suppressing performance degradation.

JP 2016-189109 A
International Publication No. 2008/120274

As shown in FIG. 24(b), when the cores have no idle time, DFS control cannot reduce their power consumption: lowering the operating frequency of any core delays the processing.

An object of one aspect of the present invention is to reduce the power consumption of an information processing apparatus while preventing performance degradation.

In one aspect, an information processing apparatus includes a compilation unit, an execution unit, and an analysis unit. The compilation unit generates a first execution code by inserting code that collects profiling data into the branch-free loops of a source program. The execution unit executes the first execution code generated by the compilation unit and outputs power profiler information based on the profiling data. The analysis unit analyzes the power profiler information output by the execution unit and creates frequency control information used to control the frequencies of a plurality of hardware modules included in the execution unit. The compilation unit then generates a second execution code by inserting, into the loops of the source program, frequency control code that controls the frequencies of the hardware modules on the basis of the frequency control information created by the analysis unit. The execution unit then executes the second execution code generated by the compilation unit.

In one aspect, the present invention can reduce the power consumption of an information processing apparatus while preventing performance degradation.

FIG. 1 is a diagram for explaining the conditions of loops subject to DFS control.
FIG. 2 is a diagram for explaining power saving by the information processing system according to the embodiment.
FIG. 3 is a diagram for explaining cases in which power consumption can be reduced without degrading performance.
FIG. 4A is a diagram for explaining a first case in which power consumption can be reduced in a loop.
FIG. 4B is a diagram for explaining a second case in which power consumption can be reduced in a loop.
FIG. 4C is a diagram for explaining a third case in which power consumption can be reduced in a loop.
FIG. 5 is a diagram illustrating the configuration of the information processing system according to the embodiment.
FIG. 6A is a diagram illustrating an operation (data collection) of the information processing system.
FIG. 6B is a diagram illustrating an operation (data analysis) of the information processing system.
FIG. 7 is a diagram illustrating example commands.
FIG. 8 is a flowchart showing the processing flow of translation for profiling.
FIG. 9 is a diagram illustrating an example of translation for profiling.
FIG. 10A is a diagram illustrating the flow of processing for collecting profiling data and an example of power profiler information.
FIG. 10B is a diagram illustrating an example of collecting profiling data.
FIG. 11 is a flowchart showing the flow of processing for analyzing the power profiler information.
FIG. 12 is a flowchart showing the flow of processing for calculating busy rates.
FIG. 13 is a diagram illustrating an example of a busy rate calculation formula.
FIG. 14 is a diagram illustrating an example of busy rate information.
FIG. 15 is a flowchart showing the flow of processing for determining a frequency control method.
FIG. 16A is a diagram illustrating an example of frequency control information.
FIG. 16B is a diagram illustrating another example of frequency control information.
FIG. 17 is a diagram illustrating an example of analysis of profiling data.
FIG. 18 is a flowchart showing the flow of processing for embedding frequency control code.
FIG. 19 is a diagram illustrating an example of a frequency control function inserted as frequency control code.
FIG. 20 is a diagram illustrating an example of embedding frequency control code in program code.
FIG. 21 is a flowchart showing the flow of power saving execution using the retranslated code.
FIG. 22 is a diagram illustrating an example of customization by a user.
FIG. 23 is a diagram illustrating the hardware configuration of a computer that executes the management program according to the embodiment.
FIG. 24 is a diagram for explaining power saving by DFS control.

Embodiments of the information processing apparatus, compilation method, and compilation program disclosed in the present application are described in detail below with reference to the drawings. These embodiments do not limit the disclosed technology.

In the DFS control according to the embodiment, the loop is the unit of DFS control. The conditions that a loop must meet to be subject to DFS control are therefore described first. FIG. 1 is a diagram for explaining these conditions. As shown in FIG. 1, some loops change the busy state of CPU resources and others do not. Here, a CPU resource is a hardware module involved in CPU processing, such as the memory, the L2 cache, the L1 cache, or an arithmetic unit.

As shown in FIG. 1(a), a loop whose busy state changes has a complex control structure containing branches such as if statements. A branch interrupts the flow of instruction execution, so CPU resources are not used uniformly: the instruction sequence is broken and the busy state of the arithmetic units changes. Data access also loses its continuity, so the busy state of the memory and cache system changes as well.

In contrast, as shown in FIG. 1(b), a loop whose busy state does not change has a simple control structure without branches such as if statements. Without branches, processing flows uniformly, so CPU resources are used uniformly and the busy state does not change. The loop in FIG. 1(b) performs several lines of computation, but it essentially loads data, computes, and stores the result, so CPU resource usage stays uniform.

Uniform DFS control cannot be applied to a loop whose busy state changes. The information processing system according to the embodiment therefore applies DFS control only to loops whose busy state does not change.

Next, power saving by the information processing system according to the embodiment is described. FIG. 2 is a diagram for explaining this power saving. In FIG. 2, M represents the memory, L#2 the L2 cache, L#1 the L1 cache, and C an arithmetic unit. p#1 and p#2 are processing units, each corresponding to one loop. The vertical axis shows the passage of time from top to bottom. The hardware modules involved in the processing of core #0 are the memory, the L2 cache, the L1 cache, and the arithmetic unit. The boxes drawn for each hardware module represent the time during which that module is busy.

As shown in FIG. 2, in the normal state core #0 as a whole has no idle time, but individual hardware modules do. The information processing system according to the embodiment therefore performs DFS control per hardware module.

For example, in the processing of p#1 the L2 cache has idle time, so the information processing system according to the embodiment halves the power of the L2 cache. The arithmetic unit also has idle time, so the system reduces its power to 1/4.

Reducing the power of the arithmetic unit to 1/4 quadruples its operation time, but because the duration of p#1 is determined by the memory time, performance is not affected. Likewise, halving the power of the L2 cache does not affect performance.

In the processing of p#2, the memory has idle time, so the information processing system halves the power of the memory. Halving the operating frequency of a hardware module halves its power, and reducing the operating frequency to 1/4 reduces its power to 1/4.
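The reasoning for p#1 can be checked with a simplified model. The sketch below assumes, as the text does, that a module's power is proportional to its operating frequency, and additionally assumes that a module running at frequency ratio r needs busy/r time and that the loop lasts as long as its busiest module. The busy times are hypothetical illustration values, not data read from FIG. 2.

```python
from fractions import Fraction


def evaluate(busy: dict[str, Fraction], ratio: dict[str, Fraction]):
    """Return (loop time at full speed, loop time after scaling, relative power per module).

    Simplified model (assumption): a module at frequency ratio r needs busy / r time,
    and the loop takes as long as its busiest module.
    """
    before = max(busy.values())
    after = max(t / ratio.get(m, Fraction(1)) for m, t in busy.items())
    # Power is assumed proportional to frequency, so relative power equals the ratio.
    power = {m: ratio.get(m, Fraction(1)) for m in busy}
    return before, after, power


# Hypothetical busy times for the memory-bound loop p#1, in arbitrary units.
busy_p1 = {"M": Fraction(8), "L2": Fraction(4), "L1": Fraction(6), "C": Fraction(2)}
ratios_p1 = {"L2": Fraction(1, 2), "C": Fraction(1, 4)}

before, after, power = evaluate(busy_p1, ratios_p1)
print(before, after)             # 8 8
print(power["L2"], power["C"])   # 1/2 1/4
```

The loop time stays at 8 because the scaled L2 cache and arithmetic unit become at most as busy as the memory. Under the same model, halving the arithmetic unit in a loop where it is already as busy as the memory would stretch the loop, which is the situation FIG. 4A(c) describes.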

FIG. 3 is a diagram for explaining cases in which power consumption can be reduced without degrading performance. In FIG. 3, each value represents a time ratio; the larger the value, the slower the processing. As shown in FIG. 3, in case (1) memory access is the bottleneck, so, for example, reducing the operating frequency of the arithmetic unit to 1/4 does not degrade performance. In case (2), memory access is also the bottleneck, so, for example, halving the operating frequency of the L2 cache does not degrade performance.

In case (3), L2 cache access is the bottleneck, so, for example, halving the operating frequency of the memory does not degrade performance. Case (3) differs from cases (1) and (2) in the mechanism that allows the operating frequency to be reduced. In cases (1) and (2), the difference in operating speed between hardware modules allows the operating frequency of the faster module to be lowered. Case (3), by contrast, is one in which the data is resident in the L2 cache (L2 on-cache) and L2 cache access becomes the bottleneck; as long as this on-cache state lasts for a certain period, it does not matter, from the memory's point of view, if the supply of data to the L2 cache is delayed.

FIGS. 4A to 4C are diagrams for explaining cases in which power consumption can be reduced in a loop. FIG. 4A shows a case in which the frequency of arithmetic processing is reduced when memory access is the bottleneck, FIG. 4B a case in which the frequency of cache processing is reduced when memory access is the bottleneck, and FIG. 4C a case in which the frequency of memory access is reduced when cache access is the bottleneck.

In FIGS. 4A to 4C, "−" indicates that a hardware module is busy in the first iteration, "=" that it is busy in the second iteration, and "+" that it is busy in the third iteration. Each "−", "=", or "+" character represents one unit of time. Because arithmetic processing is performed once the memory access or cache access has completed, the arithmetic processing waits for that access to complete.

FIG. 4A(a) shows an example program for the case of FIG. 4A, in which the loop is executed three times. FIG. 4A(b) shows a case in which performance does not degrade. As shown in FIG. 4A(b), when memory access takes 4 units of time and arithmetic processing takes 2 units, halving the frequency of the arithmetic processing adds cost only for the final "++", so performance hardly degrades.

In contrast, as shown in FIG. 4A(c), when memory access and arithmetic processing each take 2 units of time, halving the frequency of the arithmetic processing increases the time for the three iterations from 8 units to 14 units, degrading performance.

As shown in FIG. 4A(d), when the number of iterations is large, such as 1000000, the final "++" shown in FIG. 4A(b) is a negligible fraction of the total processing time, so performance does not degrade.
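The unit-time counts of FIG. 4A can be reproduced with a small two-stage pipeline model. The model is an assumption for illustration: memory accesses for successive iterations are issued back to back, and an iteration's computation starts only when its data has arrived and the previous computation has finished.

```python
def loop_time(iterations: int, mem_time: int, calc_time: int) -> int:
    """Total time of a loop modeled as a memory stage feeding a compute stage (assumed model)."""
    mem_done = 0   # time at which the current iteration's memory access finishes
    calc_done = 0  # time at which the previous iteration's computation finished
    for _ in range(iterations):
        mem_done += mem_time                                # accesses issued back to back
        calc_done = max(mem_done, calc_done) + calc_time    # compute waits for its data
    return calc_done


# FIG. 4A(b): memory 4 units, compute 2 units; halving compute frequency doubles compute time.
print(loop_time(3, 4, 2), loop_time(3, 4, 4))   # 14 16  (only the final "++" is added)
# FIG. 4A(c): memory 2 units, compute 2 units.
print(loop_time(3, 2, 2), loop_time(3, 2, 4))   # 8 14
```

In this model a memory-bound loop of n iterations takes 4n + 2 units at full compute frequency and 4n + 4 units at half frequency, so for n = 1000000 the relative increase is about 5 × 10⁻⁷, consistent with the FIG. 4A(d) argument.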

As shown in FIG. 4B, halving the frequency of cache access adds no cost and does not degrade performance. Halving the frequency of the arithmetic processing as well adds cost only for the final "+", so performance hardly degrades.

As shown in FIG. 4C(a), halving the frequency of memory access adds cost only for the final "+", because of the delay of the first memory access, so performance hardly degrades. In FIG. 4C(a), memory access and cache access operate in parallel, so the memory access is hidden behind the on-cache cache access and does not affect performance. As shown in FIG. 4C(b), when the number of iterations is increased to 1000000, the influence of the final "+" is 1/(1 + 1000000 × 5 + 1), so performance hardly degrades.

Next, the configuration of the information processing system according to the embodiment is described. FIG. 5 is a diagram illustrating this configuration. As illustrated in FIG. 5, the information processing system 1 according to the embodiment includes a management device 2 and a parallel computer 3. The management device 2 manages execution on the parallel computer 3. The parallel computer 3 includes a plurality of computers, and the computers, or the plurality of cores in each computer, perform processing in parallel. The computers are connected in a mesh or torus of two, three, six, or another number of dimensions.

The management device 2 includes a compilation unit 21, a combining unit 22, and an analysis unit 23. The compilation unit 21 translates the program code (source program) 4. When profile collection is specified, the compilation unit 21 performs translation for profiling. When the compilation unit 21 has performed translation for profiling, the combining unit 22 combines the profiler library with the translation result to create the execution code 5.

The execution code 5 is executed by the parallel computer 3. When the compilation unit 21 has performed translation for profiling, the parallel computer 3 executes the execution code 5 and outputs the power profiler information 6. The power profiler information 6 consists of hardware counter information related to the power of the parallel computer 3.

The analysis unit 23 analyzes the power profiler information 6, determines a frequency control method for each hardware module in each loop, and creates the frequency control information 7. The frequency control information 7 defines, for each loop, the frequency ratios among the hardware modules.

The compilation unit 21 inserts frequency control code into the program code 4 on the basis of the frequency control information 7 and creates the execution code 5, which is executed by the parallel computer 3. When the execution code 5 contains frequency control code, the parallel computer 3 executes it while controlling the frequencies of the hardware modules.

The compilation unit 21 includes a first insertion unit 21a and a second insertion unit 21b. When the compilation unit 21 performs translation for profiling, the first insertion unit 21a determines for each loop whether it contains a branch and inserts code that calls the profiler library into each loop that does not.

The second insertion unit 21b inserts frequency control code into the program code 4 on the basis of the frequency control information 7. While inserting the frequency control code, the second insertion unit 21b performs optimization and deletes unnecessary frequency control code.
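The text does not state how the second insertion unit 21b decides which frequency control code is unnecessary. One plausible rule, shown here purely as an assumption, is to drop a control call whose settings equal those already in effect from the previous call. This presumes the loops run back to back with nothing in between that restores default frequencies. The (loop_id, settings) representation is hypothetical.

```python
from typing import Optional

# Each inserted control call is represented as (loop_id, settings), where settings maps a
# hardware module name to its frequency ratio. This representation is hypothetical.
Call = tuple[str, dict[str, float]]


def remove_redundant(calls: list[Call]) -> list[Call]:
    """Drop control calls whose settings equal the settings already in effect (assumed rule)."""
    kept: list[Call] = []
    current: Optional[dict[str, float]] = None
    for loop_id, settings in calls:
        if settings != current:          # only emit a call when the settings actually change
            kept.append((loop_id, settings))
            current = settings
    return kept


calls = [
    ("loop1", {"M": 1.0, "L2": 0.5, "C": 0.25}),
    ("loop2", {"M": 1.0, "L2": 0.5, "C": 0.25}),  # same as loop1, so redundant
    ("loop3", {"M": 0.5, "L2": 1.0, "C": 1.0}),
]
print([loop_id for loop_id, _ in remove_redundant(calls)])  # ['loop1', 'loop3']
```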

The analysis unit 23 includes a calculation unit 23a and a creation unit 23b. The calculation unit 23a calculates the busy rate of each hardware module in each loop from the power profiler information 6 and creates busy rate information. The creation unit 23b determines the frequency control method of each hardware module in each loop on the basis of the busy rate information and creates the frequency control information 7.
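The busy rate formula is given in FIG. 13 and is not reproduced here. Purely as an illustration, the sketch below assumes that a busy rate is the fraction of a loop's execution time during which a module is busy. It then derives per-loop frequency ratios with an assumed rule: the busiest module stays at full speed, and every other module gets the lowest ratio from {1/4, 1/2, 1} that does not make it busier than that bottleneck. The ratio set, the rule, and the busy rates are all hypothetical.

```python
from fractions import Fraction as F

RATIOS = (F(1, 4), F(1, 2), F(1))  # candidate frequency ratios, lowest first (assumed)


def frequency_control_info(busy_rates: dict[str, dict[str, F]]) -> dict[str, dict[str, F]]:
    """Map each loop to a frequency ratio per hardware module (hypothetical rule)."""
    info = {}
    for loop_id, rates in busy_rates.items():
        bottleneck = max(rates.values())
        info[loop_id] = {
            # A module at ratio r is assumed to be busy rate / r of the loop's time;
            # ratio 1 always qualifies, so next() never runs out of candidates.
            module: next(r for r in RATIOS if rate / r <= bottleneck)
            for module, rate in rates.items()
        }
    return info


busy_rates = {
    "p#1": {"M": F(1), "L2": F(1, 2), "L1": F(3, 4), "C": F(1, 4)},     # memory-bound
    "p#2": {"M": F(1, 2), "L2": F(1), "L1": F(3, 4), "C": F(3, 4)},     # L2-bound
}
info = frequency_control_info(busy_rates)
print({m: str(r) for m, r in info["p#1"].items()})  # {'M': '1', 'L2': '1/2', 'L1': '1', 'C': '1/4'}
print({m: str(r) for m, r in info["p#2"].items()})  # {'M': '1/2', 'L2': '1', 'L1': '1', 'C': '1'}
```

With these made-up busy rates, the rule yields the same settings as the FIG. 2 narrative: for p#1 the L2 cache at 1/2 and the arithmetic unit at 1/4, and for p#2 the memory at 1/2.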

Although the management device 2 includes the compilation unit 21, the combining unit 22, and the analysis unit 23 here, a device other than the management device 2 may include them. A device other than the management device 2 may also include either the compilation unit 21 and the combining unit 22, or the analysis unit 23. Alternatively, a first device other than the management device 2 may include the compilation unit 21 and the combining unit 22, and a second device other than the management device 2 and the first device may include the analysis unit 23. The compilation unit 21 may also provide the functions of the combining unit 22 and the analysis unit 23.

Next, the operation of the information processing system 1 is described. FIG. 6A illustrates the data collection operation of the information processing system 1, and FIG. 6B illustrates its data analysis operation. In FIGS. 6A and 6B, CMP denotes compilation, EXE denotes execution of the execution code 5, and ANL denotes power control analysis.

As shown in FIG. 6A, the information processing system 1 takes the program code 4 as input and translates it for profiling (step S1); this translation is performed when translation for profiling has been specified. The information processing system 1 then combines the profiler library with the translation result to generate the execution code 5 (step S2).

The program code 4 does not need to be modified; the information processing system 1 merely translates it for profiling and combines the profiler library. Through the translation for profiling, the information processing system 1 inserts, for each loop, code that collects power-related hardware counter information.

The information processing system 1 then executes fapp (step S3). When the fapp command is executed, the profiler collects power-related hardware counter information for each loop (step S4) and creates the power profiler information 6.

Then, as shown in FIG. 6B, the information processing system 1 executes fapppx (step S5), which performs power control analysis. In the power control analysis, the information processing system 1 calculates the busy rates of the hardware modules of the parallel computer 3, determines the optimal frequency control method from those busy rates, and outputs the frequency control information 7.

The information processing system 1 then embeds frequency control code in accordance with the frequency control information 7 (step S6) and translates the program to create the execution code 5. Because the execution code 5 contains the frequency control code, executing it reduces the power consumption of the information processing system 1.

FIG. 7 is a diagram illustrating example commands. The first translation (1) uses fcc -eco-busy a.c. fcc is the command that compiles a C program. -eco-busy specifies translation that inserts busy state collection code for the power saving mode. Here, busy state collection code is code that collects information for identifying the busy states of the hardware modules. a.c is the program name.

Data collection (2) uses fapp -eco -wVprof.dat a.out. fapp is the command that collects the power profiler information 6. -eco specifies profiling execution in the power saving mode. -wFile specifies the name of the file to which the power profiler information 6 is output. a.out is the name of the execution code.

Data analysis (3) uses fapppx -eco -wVprof.dat -ffreq.dat. fapppx is the command that analyzes the power profiler information 6. -eco specifies profiling data analysis in the power saving mode. -wDir specifies the name of the input file containing the power profiler information 6. -ffreq.dat specifies the name of the file to which the frequency control information 7 is output.

In retranslation (4), “fcc -eco -ffreq.txt a.c” is used. -eco specifies translation in the power-saving mode. -fFile specifies the name of the file containing the frequency control information 7.

In power-saving execution (5), “a.out [-eco | -noeco]” is used. -eco specifies execution in the power-saving mode, and -noeco specifies execution outside the power-saving mode, in which case the frequency control code is ignored. -eco is the default. a.out is the name of the execution code.
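The five steps of FIG. 7 can be collected into argument vectors, as in the minimal sketch below. Only the command names and flags shown in FIG. 7 are used; the helper name build_commands is hypothetical. The text writes the analysis output to freq.dat in step (3) but reads freq.txt in step (4); the sketch assumes both refer to the same file and uses a single name for it. Nothing is executed.

```python
def build_commands(src="a.c", exe="a.out", prof="Vprof.dat",
                   freq="freq.dat", eco=True):
    """Return argv lists for steps (1)-(5) of FIG. 7 (hypothetical helper)."""
    return [
        ["fcc", "-eco-busy", src],                     # (1) translate with busy-state collection code
        ["fapp", "-eco", f"-w{prof}", exe],            # (2) collect power profiler information
        ["fapppx", "-eco", f"-w{prof}", f"-f{freq}"],  # (3) analyze into frequency control information
        ["fcc", "-eco", f"-f{freq}", src],             # (4) retranslate with frequency control code
        [exe, "-eco" if eco else "-noeco"],            # (5) run; -noeco ignores frequency control code
    ]
```

On a system that actually provides these tools, each list could be passed to subprocess.run(argv, check=True), or printed as a shell line with shlex.join(argv); the executable may need a "./" prefix depending on PATH.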

Next, the processing flow of the information processing system 1 and processing examples will be described. FIG. 8 is a flowchart showing the processing flow of translation for profiling. As shown in FIG. 8, the compiling unit 21 reads the program code 4 (step S11) and performs steps S12 to S13 for each loop (I).

That is, the compiling unit 21 determines whether the loop contains branch processing (step S12). If it does, the compiling unit 21 moves on to the next loop. If it does not, the compiling unit 21 inserts profiling code before and after the loop (step S13) and then moves on to the next loop. When all loops have been processed, the compiling unit 21 ends the translation for profiling.

By inserting profiling code only into loops without branch processing, the compiling unit 21 makes it possible to apply frequency control to loops whose use of CPU resources is uniform.

FIG. 9 is a diagram illustrating an example of translation for profiling. FIG. 9(a) shows the program code 4, and FIG. 9(b) shows the code after the busy-state collection code has been inserted. The file name a.f indicates that the program code 4 is written in FORTRAN. As shown in FIG. 9(a), the program code 4 contains three do loops. Because the first and third loops contain no branch processing, “call busy_start()” and “call busy_stop()” are inserted around them as busy-state collection code, as shown in FIG. 9(b). The second loop, by contrast, contains branch processing through an if statement, so no busy-state collection code is inserted for it.
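The per-loop decision of FIG. 8 (steps S12 to S13), applied to FORTRAN source as in FIG. 9, can be sketched as a line-based rewrite. This is a minimal sketch, not the compiling unit 21 itself: it assumes loops are not nested, treats a body line starting with if, select case, or goto as branch processing, and the function name insert_busy_probes is hypothetical. A real compiler would make this decision on its intermediate representation rather than on source text.

```python
import re

DO_START = re.compile(r"^\s*do\b", re.IGNORECASE)
DO_END = re.compile(r"^\s*end\s*do\b", re.IGNORECASE)
# Assumption: these statement kinds count as branch processing inside a loop.
BRANCH = re.compile(r"^\s*(if|select\s+case|goto|go\s+to)\b", re.IGNORECASE)

def insert_busy_probes(lines):
    """Wrap each branch-free (non-nested) do loop in busy_start/busy_stop calls."""
    out, loop = [], None  # loop buffers the lines of the do loop being scanned
    for line in lines:
        if loop is None:
            if DO_START.match(line):
                loop = [line]
            else:
                out.append(line)
            continue
        loop.append(line)
        if DO_END.match(line):
            indent = re.match(r"\s*", loop[0]).group(0)
            if any(BRANCH.match(body) for body in loop[1:-1]):
                out.extend(loop)                       # S12: branch found, leave as is
            else:
                out.append(f"{indent}call busy_start()")  # S13: probe before the loop
                out.extend(loop)
                out.append(f"{indent}call busy_stop()")   # S13: probe after the loop
            loop = None
    if loop:  # unterminated loop: emit it unchanged
        out.extend(loop)
    return out
```

Applied to three loops where only the second contains an if block, the sketch wraps the first and third loops and leaves the second untouched, mirroring FIG. 9(b).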

FIG. 10A is a diagram illustrating the flow of processing for collecting profiling data and an example of the power profiler information 6, and FIG. 10B is a diagram illustrating an example of collecting profiling data.

As shown in FIG. 10A, the parallel computer 3 starts collecting hardware counter information at the head of the loop (step S21). It then collects the hardware counter information needed to obtain the following busy rates (step S22): the memory access utilization, the L2 cache access utilization, the L1 cache access utilization, and the arithmetic processing utilization.

The hardware counter information needed to obtain these busy rates consists of the cycle count, the number of L1 cache misses, the number of L2 cache misses, the number of instruction commits, L1 pipe valid, and L2 write-back. The power profiler information 6 associates these six values, together with the number of calls, with each loop. Because the hardware counters listed above depend on the CPU architecture, the hardware counter information required also differs from one CPU architecture to another.

For example, in FIG. 10A, “loop#1” is associated with a cycle count of “145032”, an L1 cache miss count of “3985”, an L2 cache miss count of “7890”, an instruction commit count of “2298”, an L1 pipe valid count of “4982”, an L2 write-back count of “11208”, and a call count of “100”. In FIG. 10A, the power profiler information 6 contains a plurality of records, each holding the information associated with one loop. A single loop may also be associated with a plurality of records.

The parallel computer 3 then ends collection of the hardware counter information at the end of the loop (step S23), and outputs the collected hardware counter information and call count information for each loop as the power profiler information 6 (step S24).
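Steps S21 to S24 amount to taking a counter snapshot at the head of the loop, another at its end, and accumulating the difference together with a call count. The minimal sketch below assumes a read_counters callable that returns the six raw counter values as a dict; that callable, the class BusyProbe, and the field names are hypothetical stand-ins, since the actual counter interface depends on the CPU architecture.

```python
from dataclasses import dataclass

# Hypothetical field names for the six counters listed for FIG. 10A.
COUNTERS = ("cycles", "l1_miss", "l2_miss", "commits", "l1_pipe_valid", "l2_writeback")

@dataclass
class ProfileRecord:
    section: str
    cycles: int = 0
    l1_miss: int = 0
    l2_miss: int = 0
    commits: int = 0
    l1_pipe_valid: int = 0
    l2_writeback: int = 0
    calls: int = 0

class BusyProbe:
    def __init__(self, read_counters):
        self._read = read_counters   # assumed: returns {counter name: raw value}
        self._start = {}
        self.records = {}

    def busy_start(self, section):
        self._start[section] = self._read()   # S21: snapshot at the loop head

    def busy_stop(self, section):
        end = self._read()                    # S23: snapshot at the loop end
        begin = self._start.pop(section)
        rec = self.records.setdefault(section, ProfileRecord(section))
        for name in COUNTERS:                 # S22: accumulate counter deltas
            setattr(rec, name, getattr(rec, name) + end[name] - begin[name])
        rec.calls += 1                        # call count kept alongside (S24)
```

Accumulating into one record per section is only one choice; since a loop may be associated with several records, emitting one record per busy_stop call would be equally consistent with the text.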

As shown in FIG. 10B, profiling data is collected for the first and third loops, which contain no branch processing. In FIG. 10B, “func1-1” and “func1-2” are section names; each section corresponds to a loop without branch processing.

FIG. 11 is a flowchart showing the flow of processing for analyzing the power profiler information 6. As shown in FIG. 11, the analysis unit 23 reads the power profiler information 6 (step S31) and calculates the busy rate of each loop (step S32). The analysis unit 23 then determines a frequency control method for each hardware module in each loop (step S33) and creates the frequency control information 7.

FIG. 12 is a flowchart illustrating the flow of processing for calculating the busy rate. In FIG. 12, one section has a plurality of collection timings, and each record contains the data from one collection. As shown in FIG. 12, the analysis unit 23 first initializes the busy rates to 0 (step S41).

The analysis unit 23 then takes the information out of the power profiler information 6 one record (I) at a time and performs step S42 for every record. In step S42, the analysis unit 23 adds the cycle count, the number of L1 cache misses, the number of L2 cache misses, the number of instruction commits, L1 pipe valid, and L2 write-back into per-section totals.
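Steps S41 and S42 are a group-by-and-sum over records. The minimal sketch below assumes each record is a dict with a "section" key and one key per counter; the counter key names and the function name sum_by_section are hypothetical. It also counts how many collections fell into each section.

```python
from collections import defaultdict

# Hypothetical labels for the six counters summed in step S42.
COUNTERS = ("cycles", "l1_miss", "l2_miss", "commits", "l1_pipe_valid", "l2_writeback")

def sum_by_section(records):
    """Sum per-collection records into per-section totals (FIG. 12, S41-S42)."""
    # S41: every section's accumulators start at zero.
    totals = defaultdict(lambda: {**dict.fromkeys(COUNTERS, 0), "collections": 0})
    for rec in records:                 # one record (I) at a time
        acc = totals[rec["section"]]
        for name in COUNTERS:           # S42: add each counter into its section
            acc[name] += rec[name]
        acc["collections"] += 1
    return dict(totals)
```

The resulting per-section totals are what the busy rate formulas of step S43 take as input.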

The analysis unit 23 then calculates the busy rate of each section (step S43) and creates busy rate information. FIG. 13 is a diagram illustrating an example of the busy rate calculation formulas. As shown in FIG. 13:

$$\text{memory busy rate} = \frac{\text{memory throughput}}{\text{memory throughput upper limit}}$$

$$\text{L2 busy rate} = \frac{\text{L2 throughput} + \text{memory throughput} \times \text{coefficient}}{\text{L2 throughput upper limit}}$$

$$\text{L1 busy rate} = \frac{\text{L1 pipe valid}}{\text{cycle count}}$$

$$\text{arithmetic busy rate} = \frac{\text{floating-point arithmetic pipeline}}{\text{cycle count}}$$

where

$$\text{memory throughput} = (\text{L2 cache misses} + \text{L2 write-backs}) \times \text{cache line size} \,/\, \text{cycle count} \,/\, \text{clock frequency}$$

$$\text{L2 throughput} = \text{L1 cache misses} \times \text{cache line size} \,/\, \text{cycle count} \,/\, \text{clock frequency}$$

The coefficient used to calculate the L2 busy rate is, for example, 0.3 or 0.6.
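The FIG. 13 formulas can be evaluated per section as in the minimal sketch below. It assumes that "/ cycle count / clock frequency" is meant as dividing by the elapsed time (cycle count ÷ clock frequency), so both throughputs come out in bytes per second; that reading is an interpretation, not something the text states. The upper limits mem_peak and l2_peak, the key fp_pipe for the floating-point arithmetic pipeline counter, and the function name busy_rates are hypothetical and machine-specific; the default coefficient 0.3 is one of the example values given above.

```python
def busy_rates(c, *, line_size, clock_hz, mem_peak, l2_peak, coeff=0.3):
    """Busy rates of one section from its summed counters c (a dict).

    Assumption: elapsed time = cycles / clock_hz, so throughputs and the
    peaks mem_peak / l2_peak are all in bytes per second.
    """
    seconds = c["cycles"] / clock_hz
    mem_tp = (c["l2_miss"] + c["l2_writeback"]) * line_size / seconds
    l2_tp = c["l1_miss"] * line_size / seconds
    return {
        "memory": mem_tp / mem_peak,
        "l2": (l2_tp + mem_tp * coeff) / l2_peak,   # memory traffic also loads L2
        "l1": c["l1_pipe_valid"] / c["cycles"],
        "arith": c["fp_pipe"] / c["cycles"],
    }
```

With cycles = 2×10⁹ at 2 GHz (one second), 10⁶ L2 misses of 256-byte lines against a 512 MB/s memory limit, the memory busy rate is 0.5.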

FIG. 14 is a diagram illustrating an example of busy rate information. As shown in FIG. 14, the busy rate information indicates the memory busy rate, L2 busy rate, L1 busy rate, and arithmetic busy rate of each section, together with the total time and number of calls of each section. For example, for “loop#1” the memory busy rate is “80%”, the L2 busy rate is “20%”, the L1 busy rate is “30%”, the arithmetic busy rate is “20%”, the total time is “13 seconds”, and the number of calls is “100”.

FIG. 15 is a flowchart illustrating the flow of processing for determining a frequency control method. As shown in FIG. 15, the analysis unit 23 performs steps S51 to S57 for each section (I) included in the busy rate information.

That is, the analysis unit 23 stores the hardware module M with the highest busy rate and that busy rate R, and sets the power ratio of M to 1.00 (step S51). The analysis unit 23 then performs steps S52 to S57 for each of the remaining three hardware modules (J).

That is, the analysis unit 23 calculates the power ratio of J as the busy rate of J divided by the busy rate of M (step S52). The analysis unit 23 then uses two thresholds, 0.4 and 0.2, to round the power ratio to 1.00, 0.50, or 0.25. Specifically, the analysis unit 23 determines whether the power ratio is 0.4 or more (step S53) and, if so, sets it to 1.00 (step S57). If the power ratio is less than 0.4, the analysis unit 23 determines whether it is 0.2 or more (step S54); if so, it sets the power ratio to 0.50 (step S55), and otherwise to 0.25 (step S56). The analysis unit 23 then creates the frequency control information 7.

In this way, the analysis unit 23 determines the power ratio of each hardware module as the frequency control method and creates the frequency control information 7, which enables the compiling unit 21 to insert frequency control code based on the frequency control information 7.

FIG. 16A is a diagram illustrating an example of the frequency control information 7. As shown in FIG. 16A, the frequency control information 7 associates a memory frequency ratio, an L2 frequency ratio, an L1 frequency ratio, and an arithmetic frequency ratio with each section. Here, the frequency ratio of each hardware module is the power ratio determined by the processing shown in FIG. 15. The frequency control information 7 also associates the total time and the number of calls with each section. For example, for section “loop#1” the memory frequency ratio is “1.00”, the L2 frequency ratio is “0.25”, the L1 frequency ratio is “0.50”, the arithmetic frequency ratio is “0.25”, the total time is “13 seconds”, and the number of calls is “100”.

FIG. 16B is a diagram showing another example of the frequency control information 7, in which the power ratios are not adjusted using the thresholds. For example, for section “loop#1” the memory frequency ratio is “1.00”, the L2 frequency ratio is “0.20”, the L1 frequency ratio is “0.30”, the arithmetic frequency ratio is “0.20”, the total time is “13 seconds”, and the number of calls is “100”.

FIG. 17 is a diagram illustrating an example of analyzing profiling data. As shown in FIG. 17, busy rate information is created from the power profiler information 6, and the frequency control information 7 is created from the busy rate information.

FIG. 18 is a flowchart showing the flow of processing for embedding the frequency control code. As shown in FIG. 18, the compiling unit 21 reads the frequency control information 7 (step S61) and then performs steps S62 to S63 below for each section (I) of the frequency control information 7.

That is, the compiling unit 21 determines whether the average time of one execution of the section is less than a threshold (step S62). Here, the section average time is the total time divided by the number of calls. If the section average time is less than the threshold, the compiling unit 21 moves on to the next section; if it is equal to or greater than the threshold, the compiling unit 21 inserts frequency control code before and after the loop corresponding to the section (step S63). When steps S62 to S63 have been completed for all sections, the compiling unit 21 ends the processing for embedding frequency control instructions.

In this way, by not embedding frequency control code when the section average time is less than the threshold, the compiling unit 21 avoids incurring frequency control overhead for loops where the power-saving effect is small.
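The filter of step S62 can be sketched as below. The text does not give a threshold value, so threshold_s is a caller-supplied assumption, as is the input shape (section name mapped to total time in seconds and number of calls) and the function name sections_to_instrument.

```python
def sections_to_instrument(freq_info, threshold_s):
    """Return sections whose average time per call meets the threshold (S62).

    freq_info: {section: (total_time_s, calls)}
    """
    selected = []
    for name, (total_s, calls) in freq_info.items():
        avg_s = total_s / calls if calls else 0.0   # section average time
        if avg_s >= threshold_s:                    # S63 applies only to these
            selected.append(name)
    return selected
```

For the loop#1 entry of FIG. 16A (13 seconds over 100 calls, 0.13 s per call), any threshold up to 0.13 s keeps the loop, while a loop averaging 0.5 ms per call would be skipped under a 10 ms threshold.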

FIG. 19 is a diagram illustrating an example of the frequency control function inserted as frequency control code. FIG. 19(a) shows the function specification, and FIG. 19(b) shows an example of compilation output. As shown in FIG. 19(a), the frequency control function freqcntl takes four arguments: arg#1 is the frequency control value for the memory, arg#2 for the L2 cache, arg#3 for the L1 cache, and arg#4 for the arithmetic unit.

As shown in FIG. 19(b), frequency control functions are inserted before and after the four loops denoted loop#1 to loop#4. In FIG. 19(b), the L2 frequency is the frequency of the L2 cache, the L1 frequency is the frequency of the L1 cache, and the arithmetic frequency is the frequency of the arithmetic unit.

The frequency control functions shown struck through are those removed by optimization in the compiling unit 21. When two frequency control functions are consecutive, the earlier one is removed. A frequency control function whose arguments are all the same as those of the preceding one is redundant and is also removed.

Through this optimization, the compiling unit 21 eliminates unnecessary frequency control overhead.
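The insertion of FIG. 19(b) and the two removal rules can be sketched on a toy statement list, where each statement is a tuple and a freqcntl call carries its four arguments (memory, L2, L1, arithmetic). The restoring setting DEFAULT = (1.0, 1.0, 1.0, 1.0), the assumption that the program starts at that setting, and the function names insert_freqcntl and optimize are hypothetical; a compiler would apply the same rules on its intermediate representation.

```python
DEFAULT = (1.0, 1.0, 1.0, 1.0)  # assumed "normal frequency" for mem, L2, L1, arith

def insert_freqcntl(stmts, ratios):
    """Wrap each selected loop: change frequency before it, restore after it."""
    out = []
    for s in stmts:
        if s[0] == "loop" and s[1] in ratios:
            out.append(("freqcntl", tuple(ratios[s[1]])))
            out.append(s)
            out.append(("freqcntl", DEFAULT))
        else:
            out.append(s)
    return out

def optimize(stmts):
    # Rule 1: of two consecutive freqcntl calls, only the later has any effect.
    pruned = []
    for s in stmts:
        if s[0] == "freqcntl" and pruned and pruned[-1][0] == "freqcntl":
            pruned.pop()
        pruned.append(s)
    # Rule 2: drop a call that requests the setting already in effect.
    out, current = [], DEFAULT
    for s in pruned:
        if s[0] == "freqcntl":
            if s[1] == current:
                continue
            current = s[1]
        out.append(s)
    return out
```

For two back-to-back loops with the same ratios, rule 1 removes the restore between them and rule 2 then removes the second, identical change, so a single freqcntl call covers both loops.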

FIG. 20 is a diagram illustrating an example of frequency control code embedded in the program code 4. As shown in FIG. 20, frequency control code that sets the efficiency of the arithmetic unit to 0.5 times is embedded for the first loop. The second loop has no frequency control code. For the third loop, the efficiency remains at 1.0 times, so no frequency control code is embedded.

FIG. 21 is a flowchart showing the flow of power-saving execution using the retranslated code. As shown in FIG. 21, the parallel computer 3 determines whether the -noeco option is specified (step S71). If it is not, the freqcntl function changes the frequency of each hardware module according to its arguments (step S72). If the -noeco option is specified, the parallel computer 3 skips step S72.

Because the parallel computer 3 can run the execution code 5 with the -noeco option, the user can compare that run against power-saving execution.
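The runtime behavior of FIG. 21 can be sketched as a small controller that checks the command line once and then either applies or ignores each freqcntl request. The class FreqController and the set_module_freq callback are hypothetical: the callback stands in for whatever platform interface actually changes a module's frequency, which the text does not specify.

```python
class FreqController:
    """Hypothetical runtime side of freqcntl(mem, l2, l1, arith)."""
    MODULES = ("memory", "l2", "l1", "arith")

    def __init__(self, argv, set_module_freq):
        # S71: -eco is the default; only an explicit -noeco disables control.
        self.eco = "-noeco" not in argv[1:]
        self._set = set_module_freq   # assumed platform hook: (module, ratio) -> None

    def freqcntl(self, mem, l2, l1, arith):
        if not self.eco:              # -noeco: frequency control code is ignored
            return
        for module, ratio in zip(self.MODULES, (mem, l2, l1, arith)):
            self._set(module, ratio)  # S72: change each module's frequency
```

In a program, the controller would typically be built once from sys.argv, so the same binary serves both the power-saving run and the -noeco comparison run.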

FIG. 22 is a diagram illustrating an example of customization by the user, who has customized the frequency control information 7. As shown in FIG. 22, if running the program with an arithmetic frequency ratio of 0.50 does not affect performance, the user runs the program with the arithmetic frequency ratio set to 0.25. Such customization lets the user save power while limiting performance degradation.

As described above, in the embodiment, the compiling unit 21 inserts code that outputs the power profiler information 6 into the branch-free loops of the program code 4 and compiles it, and the combining unit 22 links a profiler library with the compilation result to generate the execution code 5. The parallel computer 3 executes the execution code 5 and outputs the power profiler information 6. The analysis unit 23 analyzes the power profiler information 6 to create the frequency control information 7, and the compiling unit 21 inserts frequency control code into the program code 4 based on the frequency control information 7 and compiles it to generate the execution code 5. The parallel computer 3 then executes the execution code 5 including the frequency control code.

Therefore, the information processing system 1 can reduce the power consumption of the parallel computer 3 while preventing performance degradation. In addition, because frequency control is performed only once per loop, the frequency control overhead is smaller than when the frequency is controlled in stages.

In the embodiment, the analysis unit 23 creates busy rate information based on the power profiler information 6, determines a frequency control method based on the busy rate information, and creates the frequency control information 7. The analysis unit 23 can therefore create accurate frequency control information 7.

In the embodiment, the power profiler information 6 includes the cycle count, the number of L1 cache misses, the number of L2 cache misses, the floating-point arithmetic pipeline count, L1 pipe valid, and L2 write-back, and the hardware modules include the memory, the L2 cache, the L1 cache, and the arithmetic unit. The analysis unit 23 calculates the busy rates of the memory, the L2 cache, the L1 cache, and the arithmetic unit from these counters. The analysis unit 23 can therefore calculate these busy rates accurately, and can obtain all the hardware counter information needed to calculate them in a single collection run.

In the embodiment, when inserting frequency control code into the program code 4 based on the frequency control information 7, the compiling unit 21 determines whether the section average time is less than the threshold and, if so, does not insert the frequency control code. The compiling unit 21 can therefore avoid frequency control overhead for loops with little power-saving effect.

In the embodiment, the compiling unit 21 inserts, before each loop, a frequency control function that changes the frequency and, after each loop, a frequency control function that restores it. When a frequency control function that restores the frequency is immediately followed by one that changes the frequency, the compiling unit 21 removes the restoring function. The compiling unit 21 can therefore eliminate unnecessary frequency control overhead.

Although the embodiment has described the management apparatus 2, a management program having the same functions can be obtained by implementing the configuration of the management apparatus 2 in software. A computer that executes the management program is therefore described below.

FIG. 23 is a diagram illustrating the hardware configuration of a computer that executes the management program according to the embodiment. As shown in FIG. 23, the computer 50 includes a main memory 51, a CPU 52, a LAN (Local Area Network) interface 53, and an HDD (Hard Disk Drive) 54. The computer 50 also includes a super IO (Input Output) 55, a DVI (Digital Visual Interface) 56, and an ODD (Optical Disk Drive) 57.

The main memory 51 is a memory that stores programs and intermediate results of program execution. The CPU 52 is a central processing unit that reads programs from the main memory 51 and executes them. The CPU 52 includes a chipset having a memory controller.

The LAN interface 53 is an interface for connecting the computer 50 to other computers via a LAN. The HDD 54 is a disk device that stores programs and data, and the super IO 55 is an interface for connecting input devices such as a mouse and a keyboard. The DVI 56 is an interface for connecting a liquid crystal display device, and the ODD 57 is a device that reads and writes DVDs.

The LAN interface 53 is connected to the CPU 52 via PCI Express (PCIe), and the HDD 54 and the ODD 57 are connected to the CPU 52 via SATA (Serial Advanced Technology Attachment). The super IO 55 is connected to the CPU 52 via LPC (Low Pin Count).

The management program executed on the computer 50 is stored on a DVD, which is an example of a recording medium readable by the computer 50, and is read from the DVD by the ODD 57 and installed on the computer 50. Alternatively, the management program is stored in a database or the like of another computer system connected via the LAN interface 53, and is read from that database and installed on the computer 50. The installed management program is stored on the HDD 54, read into the main memory 51, and executed by the CPU 52.

Although the embodiment has described the case where the parallel computer 3 executes the execution code 5, any information processing apparatus capable of controlling the frequencies of its hardware modules may execute the execution code 5.

Although the embodiment has described controlling the operating frequencies of the memory, the L2 cache, the L1 cache, and the arithmetic unit, the operating frequencies of other hardware modules may also be controlled. Likewise, hardware counter information other than the cycle count, the number of L1 cache misses, the number of L2 cache misses, the floating-point arithmetic pipeline count, L1 pipe valid, and L2 write-back may be used.

Although the embodiment has described the case where the analysis unit 23 creates the frequency control information 7, the analysis unit 23 may also add the analysis results to the power profiler information 6 and display them on a display device.

The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

(Supplementary Note 1) An information processing apparatus comprising:
a compiling unit that inserts code for collecting profiling data into a branch-free loop of a source program to generate first execution code;
an execution unit that executes the first execution code generated by the compiling unit and outputs power profiler information based on the profiling data; and
an analysis unit that analyzes the power profiler information output by the execution unit and creates frequency control information used for frequency control of a plurality of hardware modules included in the execution unit,
wherein the compiling unit inserts, into the loop of the source program, frequency control code that controls the frequencies of the plurality of hardware modules based on the frequency control information created by the analysis unit, to generate second execution code, and
the execution unit executes the second execution code generated by the compiling unit.

(Supplementary Note 2) The information processing apparatus according to Supplementary Note 1, wherein the analysis unit includes:
a calculation unit that calculates busy rates of the plurality of hardware modules in each loop based on the power profiler information; and
a creation unit that determines a frequency control method for each loop based on the plurality of busy rates calculated by the calculation unit and creates the frequency control information.

(Supplementary Note 3) The information processing apparatus according to Supplementary Note 2, wherein
the power profiler information includes a cycle count, a number of L1 cache misses, a number of L2 cache misses, a floating-point arithmetic pipeline count, L1 pipe valid, and L2 write-back,
the plurality of hardware modules of the execution unit include a memory, an L2 cache, an L1 cache, and an arithmetic device, and
the calculation unit calculates busy rates of the memory, the L2 cache, the L1 cache, and the arithmetic device based on the cycle count, the number of L1 cache misses, the number of L2 cache misses, the floating-point arithmetic pipeline count, L1 pipe valid, and L2 write-back.

(Appendix 4) The information processing apparatus according to Appendix 1, 2, or 3, wherein
the frequency control information includes an execution time of a section corresponding to the loop, and
the compiling unit generates the second execution code by inserting, into the loop, frequency control code for controlling the frequencies of the plurality of hardware modules when the average execution time of the section is equal to or greater than a predetermined threshold.
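The threshold test in Appendix 4 can be illustrated with a short sketch. It assumes the execution times of each loop's section are available as lists of seconds; `loops_to_control` and the threshold value are hypothetical. A plausible motivation, not stated in this section, is that each frequency change carries some overhead, so very short sections are not worth controlling.

```python
from statistics import mean


def loops_to_control(section_times: dict[str, list[float]], threshold_s: float) -> list[str]:
    """Return IDs of loops whose average section execution time is >= threshold_s.

    Only these loops would receive frequency control code in the second pass."""
    selected = []
    for loop_id, times in section_times.items():
        if times and mean(times) >= threshold_s:
            selected.append(loop_id)
    return selected


print(loops_to_control({"loop1": [0.004, 0.006], "loop2": [0.0001, 0.0003]}, threshold_s=0.001))
# ['loop1']
```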

(Appendix 5) The information processing apparatus according to any one of Appendices 1 to 4, wherein, for a plurality of the loops, the compiling unit inserts before each loop a frequency control code that changes the frequency and inserts after each loop a frequency control code that restores the original frequency, and, when a frequency control code that restores the frequency is immediately followed by a frequency control code that changes the frequency, deletes the frequency control code that restores the frequency.
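The insert-then-prune rule of Appendix 5 can be sketched over a toy list of statements. The classes `Loop`, `SetFreq`, and `RestoreFreq` are hypothetical stand-ins for compiler IR, and plain strings stand for any other statement. A restore that is immediately followed by another change is redundant because the next loop overwrites the frequency anyway.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Loop:
    name: str


@dataclass(frozen=True)
class SetFreq:
    """Frequency control code that changes the frequency before a loop."""
    level: str


@dataclass(frozen=True)
class RestoreFreq:
    """Frequency control code that returns the frequency to its original value."""


Stmt = Union[Loop, SetFreq, RestoreFreq, str]


def insert_frequency_control(body: list[Stmt], plan: dict[str, str]) -> list[Stmt]:
    """Wrap each planned loop as SetFreq, Loop, RestoreFreq, then drop every
    RestoreFreq that is immediately followed by a SetFreq."""
    wrapped: list[Stmt] = []
    for stmt in body:
        if isinstance(stmt, Loop) and stmt.name in plan:
            wrapped += [SetFreq(plan[stmt.name]), stmt, RestoreFreq()]
        else:
            wrapped.append(stmt)

    result: list[Stmt] = []
    for i, stmt in enumerate(wrapped):
        nxt = wrapped[i + 1] if i + 1 < len(wrapped) else None
        if isinstance(stmt, RestoreFreq) and isinstance(nxt, SetFreq):
            continue  # redundant: the next loop sets its own frequency
        result.append(stmt)
    return result
```

For example, with `body = [Loop("a"), Loop("b"), "x = 1", Loop("c")]` and all three loops in the plan, the restore after `a` is removed, while the restores after `b` (followed by an ordinary statement) and `c` (end of the body) remain.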

(Appendix 6) A compiling method in which a computer executes a process comprising:
inserting code that collects profiling data into a branch-free loop of a source program to generate first execution code; and
generating second execution code by inserting, into the loop of the source program, frequency control code that controls the frequencies of a plurality of hardware modules included in a processing device, based on frequency control information created for frequency control of the plurality of hardware modules from power profiler information, the power profiler information being created using profiling data collected when the processing device executes the first execution code.
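The first compilation pass selects only branch-free loops for profiling. A minimal sketch, assuming a toy program representation in which a loop body is a list of statement strings and a branch is recognized by a leading keyword; `profile_start` and `profile_stop` are hypothetical names for the inserted profiling code, not calls from a real profiler library.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    loop_id: str
    body: list[str] = field(default_factory=list)


# Hypothetical branch keywords for the toy statement language.
BRANCH_KEYWORDS = ("if ", "goto ", "break", "continue", "return")


def is_branch_free(loop: Loop) -> bool:
    """True if no statement in the loop body starts with a branch keyword."""
    return not any(s.lstrip().startswith(BRANCH_KEYWORDS) for s in loop.body)


def instrument(program: list) -> list:
    """First pass: surround every branch-free loop with profiling calls."""
    out = []
    for item in program:
        if isinstance(item, Loop) and is_branch_free(item):
            out.append(f"profile_start({item.loop_id!r})")
            out.append(item)
            out.append(f"profile_stop({item.loop_id!r})")
        else:
            out.append(item)  # other statements and branching loops stay as-is
    return out
```

Given `["x = 0", Loop("L1", ["a[i] = b[i] * c"]), Loop("L2", ["if a[i] > 0: n += 1"])]`, only `L1` is wrapped; `L2` contains a branch and is left uninstrumented.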

(Appendix 7) A compiling program that causes a computer to execute a process comprising:
inserting code that collects profiling data into a branch-free loop of a source program to generate first execution code; and
generating second execution code by inserting, into the loop of the source program, frequency control code that controls the frequencies of a plurality of hardware modules included in a processing device, based on frequency control information created for frequency control of the plurality of hardware modules from power profiler information, the power profiler information being created using profiling data collected when the processing device executes the first execution code.

(Appendix 8) An analysis program that causes a computer to execute a process comprising:
reading power profiler information created using profiling data collected when a processing device executes execution code in which code that collects the profiling data has been inserted into a branch-free loop of a source program; and
analyzing the read power profiler information to create frequency control information used for frequency control of a plurality of hardware modules included in the processing device.
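The reading step depends on the profiler's output format, which this section does not define. The sketch below assumes a hypothetical CSV layout with one record per execution of a loop section (`SAMPLE` is made-up illustrative input, not measured data) and aggregates the records per loop, including the average execution time used by the threshold check in Appendix 4.

```python
import csv
import io
from collections import defaultdict

# Hypothetical record layout and made-up values, for illustration only.
SAMPLE = """loop_id,elapsed_s,cycle_count,l1_miss,l2_miss
loop1,0.004,8000000,1200,900
loop1,0.006,12000000,1800,1100
loop2,0.0002,400000,10,2
"""


def read_power_profile(text: str) -> dict[str, dict[str, float]]:
    """Aggregate per-execution records into per-loop totals and an average time."""
    totals = defaultdict(lambda: defaultdict(float))
    runs = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        loop_id = row.pop("loop_id")
        runs[loop_id] += 1
        for key, value in row.items():
            totals[loop_id][key] += float(value)

    result = {}
    for loop_id, sums in totals.items():
        entry = dict(sums)
        entry["avg_elapsed_s"] = sums["elapsed_s"] / runs[loop_id]
        result[loop_id] = entry
    return result
```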

(Appendix 9) The analysis program according to Appendix 8, wherein the creating includes:
calculating a busy rate of each of the plurality of hardware modules in each loop based on the power profiler information; and
determining a frequency control method for each loop based on the plurality of calculated busy rates to create the frequency control information.

DESCRIPTION OF SYMBOLS
1 Information processing system
2 Management apparatus
3 Parallel computer
4 Program code
5 Execution code
6 Power profiler information
7 Frequency control information
21 Compiling unit
21a First insertion unit
21b Second insertion unit
22 Combining unit
23 Analysis unit
23a Calculation unit
23b Creation unit
50 Computer
51 Main memory
52 CPU
53 LAN interface
54 HDD
55 Super IO
56 DVI
57 ODD

Claims

1. An information processing apparatus comprising:
a compiling unit that inserts code for collecting profiling data into a branch-free loop of a source program to generate first execution code;
an execution unit that executes the first execution code generated by the compiling unit and outputs power profiler information based on the profiling data; and
an analysis unit that analyzes the power profiler information output by the execution unit and creates frequency control information used for frequency control of a plurality of hardware modules included in the execution unit,
wherein the compiling unit generates second execution code by inserting, into the loop of the source program, frequency control code for controlling the frequencies of the plurality of hardware modules based on the frequency control information created by the analysis unit, and
the execution unit executes the second execution code generated by the compiling unit.

2. The information processing apparatus according to claim 1, wherein the analysis unit includes:
a calculation unit that calculates a busy rate of each of the plurality of hardware modules in each loop based on the power profiler information; and
a creation unit that determines a frequency control method for each loop based on the plurality of busy rates calculated by the calculation unit and creates the frequency control information.

3. The information processing apparatus according to claim 2, wherein:
the power profiler information includes a cycle count, an L1 cache miss count, an L2 cache miss count, and values for the floating-point arithmetic pipeline, L1 pipe valid, and L2 write-back;
the plurality of hardware modules of the execution unit include a memory, an L2 cache, an L1 cache, and an arithmetic unit; and
the calculation unit calculates the busy rates of the memory, the L2 cache, the L1 cache, and the arithmetic unit based on the cycle count, the L1 cache miss count, the L2 cache miss count, and the floating-point arithmetic pipeline, L1 pipe valid, and L2 write-back values.

4. The information processing apparatus according to claim 1, 2, or 3, wherein
the frequency control information includes an execution time of a section corresponding to the loop, and
the compiling unit generates the second execution code by inserting, into the loop, frequency control code for controlling the frequencies of the plurality of hardware modules when the average execution time of the section is equal to or greater than a predetermined threshold.

5. The information processing apparatus according to any one of claims 1 to 4, wherein, for a plurality of the loops, the compiling unit inserts before each loop a frequency control code that changes the frequency and inserts after each loop a frequency control code that restores the original frequency, and, when a frequency control code that restores the frequency is immediately followed by a frequency control code that changes the frequency, deletes the frequency control code that restores the frequency.

6. A compiling method in which a computer executes a process comprising:
inserting code that collects profiling data into a branch-free loop of a source program to generate first execution code; and
generating second execution code by inserting, into the loop of the source program, frequency control code that controls the frequencies of a plurality of hardware modules included in a processing device, based on frequency control information created for frequency control of the plurality of hardware modules from power profiler information, the power profiler information being created using profiling data collected when the processing device executes the first execution code.

7. A compiling program that causes a computer to execute a process comprising:
inserting code that collects profiling data into a branch-free loop of a source program to generate first execution code; and
generating second execution code by inserting, into the loop of the source program, frequency control code that controls the frequencies of a plurality of hardware modules included in a processing device, based on frequency control information created for frequency control of the plurality of hardware modules from power profiler information, the power profiler information being created using profiling data collected when the processing device executes the first execution code.