JP4696282B2

JP4696282B2 - Computer apparatus, program, and recording medium

Info

Publication number: JP4696282B2
Application number: JP2003022792A
Authority: JP
Inventors: 孝洋片桐
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2003-01-30
Filing date: 2003-01-30
Publication date: 2011-06-08
Anticipated expiration: 2023-01-30
Also published as: JP2004234393A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばコンピュータに備えられたライブラリのパラメータの最適化を、コンピュータに実行させるためのプログラム、記録媒体およびコンピュータに関するものである。
【０００２】
【従来の技術】
コンピュータのような計算装置において数値計算ライブラリなどのソフトウェアを用いる際には、ライブラリ中の所望のサブルーチンに対して、ユーザが所望の問題に応じてパラメータを指定する。その後、ユーザの指定したパラメータを用いてサブルーチンが実行され、結果が出力される。
【０００３】
例えば、数値計算ライブラリのサブルーチンとして、行列の固有値を計算する固有値計算サブルーチンを考える。このとき、サブルーチンに対してユーザが指定するパラメータのうちには、所望の行列の実体や、その行列のサイズなどがある。これらのパラメータは、その問題を実際に解く際に必要とされるパラメータである。
【０００４】
一方、パラメータのうちには、どのように設定しても問題の答えとしては同じものが得られるが、適切に設定することによって例えば数値計算に要する時間を短縮できたりするような、いわゆる最適化のためのパラメータがある。
【０００５】
例えば、計算装置が複数のプロセッサを備えた並列計算装置であるとき、行列計算におけるループアンローリング段数（アンローリング段数）は、最適化のためのパラメータである。
【０００６】
ここで、アンローリング段数とは、ループの計算において通常１に設定している、ループごとの増分を意味する。例えば、ベクトルＡ(i)とベクトルＢ(i)の和Ｃ(i)を計算する際に、アンローリング段数を２に設定した場合には、i成分の和Ｃ(i)＝Ａ(i)＋Ｂ(i)とi+1成分の和Ｃ(i+1)＝Ａ(i+1)＋Ｂ(i+1)とがループ内においてそれぞれ計算され、その後iを２だけ増加させる。
【０００７】
問題の性質や計算装置の性能（並列プロセッサの数など）に応じてアンローリング段数を調整することによって、計算装置における計算時間を最も短く（最適化）できる。
【０００８】
このような最適化のためのパラメータを調節して計算時間（性能）などのコストを最適なものとする調節機能を備えた計算装置が知られている。このような調整機能は通常ソフトウェア（自動チューニングソフトウェア）によって実現される。
【０００９】
従来の自動チューニングソフトウェアの構成方式として、ソフトウェアインストール時にパラメータ最適化を行うものがある。例えば、ソフトウェアインストール時にアンローリング段数を最適化する場合には、以下のようにする。
【００１０】
この場合には、解くべき問題の問題サイズなどが決まっていないため、適当にサンプリングした問題サイズごとに最適なアンローリング段数を求める。その後に、サンプリングした問題サイズごとの最適なアンローリング段数を、例えば問題サイズについて適当な補間関数によって補間する。なお、ある問題サイズにおいて最適なアンローリング段数を求める際に、例えばアンローリング段数についてもサンプリングを行い、補間関数を用いて最もコストの小さいアンローリング段数を選択してもよい。
【００１１】
このように、例えば適当な補間関数を用いたモデル化によって、後に選択を行う際に、問題サイズに応じた最適なアンローリング段数の推定値を得ることができる。
【００１２】
または、従来の自動チューニングソフトウェアの構成方式の他の一例として、ライブラリ実行時にパラメータ最適化を行うものがある。例えば、コストを変化させる大きな要因として、行列の実体のような実行時でないと確定しない要素が含まれる問題について、このようなパラメータ最適化を行う。
【００１３】
この場合には、ライブラリコールが行われた時点で、所望の問題サイズ、行列の実体などに対して所望のパラメータを幾つか試行して、最適なものを選択する。
【００１４】
なお、このようなライブラリ実行時に最適化を行う場合には、このチューニングを行う時間についてもコストとしての計算時間に含まれることに注意が必要である。すなわち、チューニングを行う時間とその後の計算時間とが、チューニングせずにパラメータを何らかの値に固定しておく場合の計算時間よりも少なくなる必要がある。
【００１５】
これらの従来の自動チューニングソフトウェアについては、以下の非特許文献１、非特許文献２を参照されたい。
【００１６】
なお、例えば日本国の公開特許公報「特開２０００−２７６４５４号公報（公開日：２０００年１０月６日）」には、並列計算機におけるソフトウェアの実行性能を大きく左右し、かつ、ユーザインタフェースには現れないパラメータを調節してインストールを行う機能を有するソフトウェアの構成方法が記載されている。この場合には、ソフトウェアインストール時にパラメータ最適化が行われる。
【００１７】
【特許文献１】
特開２０００−２７６４５４号公報
【００１８】
【非特許文献１】
片桐孝洋，他４名、「自動チューニング機構が並列数値計算ライブラリに及ぼす効果」、情報処理学会論文誌：ハイパフォーマンスコンピューティング、社団法人情報処理学会、２００１年１１月、第４２巻、第１２号（ＨＰＳ４）、ｐ．６０−７６
【００１９】
【非特許文献２】
直野健、山本有作、「単一メモリ型インタフェースを有する自動チューニング並列ライブラリの構成方法」、情報処理学会研究報告、社団法人情報処理学会、２００１年７月２５日、第７７巻、ｐ．２５−３０
【００２０】
【発明が解決しようとする課題】
しかしながら、従来の自動チューニングソフトウェアの構成方式では、ソフトウェアインストール時にパラメータ最適化を行うもの、またはライブラリ実行時にパラメータ最適化を行うもの、のみが存在していたため、パラメータ調整が不十分となる場合があるという問題を生ずる。
【００２１】
すなわち、従来の、ソフトウェアインストール時にパラメータ最適化を行う構成においては、例えば補間関数を用いたモデルに基づき、最適化パラメータを推定によって決定する。このため、十分な精度が得られない虞れがある。
【００２２】
また、従来の、ライブラリ実行時にパラメータ最適化を行う構成においては、ライブラリ実行時のパラメータチューニングに要する時間もコストとなるため、チューニングに十分な時間を費やすことができずに、精度が不十分なものとなる虞れがある。
【００２３】
また、従来は、汎用的な処理に適用できる自動チューニングソフトウェアがなかったという問題がある。例えば、非特許文献１に記載のＡＴＬＡＳは、数値計算ライブラリの中でも、ＢＬＡＳ(Basic Linear Algebra Subprograms)と呼ばれるライブラリのみ最適化できる。これは、汎用的な処理に適用できるものではない。
【００２４】
すなわち、問題によっては、インストール時にしか最適化できない、または実行時にしか最適化できないものがある。このため、従来のように、ソフトウェアインストール時にパラメータ最適化を行うもの、またはライブラリ実行時にパラメータ最適化を行うもの、のいずれか一方しかない場合には、全ての問題について、それぞれ最適化することはできない。すなわち、汎用的な処理に適用できないという問題がある。
【００２５】
本発明は、上記の問題点に鑑みてなされたものであり、その目的は、精密なパラメータ調整を行うことのできるプログラム、記録媒体およびコンピュータを提供することにある。また、本発明は、汎用的な処理に適用できるプログラム、記録媒体およびコンピュータを提供することも目的とする。
【００２６】
【課題を解決するための手段】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う手順とを含んでいることを特徴としている。
【００２７】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【００２８】
上記プログラムが実行されたコンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった時点を検出する。
【００２９】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【００３０】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【００３１】
すなわち、ライブラリの内容を数式として表したときに、数式中の変数として表現されるパラメータが、基本情報パラメータに相当する。また、数式中に現れず、または数式において単なる媒介変数として現れるパラメータが、性能情報パラメータに相当する。このため、例えば性能情報パラメータを変化させたとしても、数式によって得られる結果（ライブラリの出力）は変わらない。
【００３２】
その後、コンピュータは、ライブラリの実際の実行の前に、基本情報パラメータを用いて性能情報パラメータの最適化を行う。より詳細には、例えば基本情報パラメータを用い、性能情報パラメータのそれぞれの値について試行計算を行って、実行コストを予め実測する。これによって、確実に最適な性能情報パラメータを得ることができる。
【００３３】
ここで、従来の最適化のためのプログラムの一例は、例えばライブラリのインストール時に性能情報パラメータの最適化を行う。この場合、例えば行列のサイズのような基本情報パラメータが定まっていないため、所定の誤差を含んだ、なんらかの推定モデルによって、最適な性能情報パラメータを推測する。
【００３４】
また、従来の最適化のためのプログラムの他の一例は、例えばライブラリの実行時に性能情報パラメータの最適化を行う。この場合には、性能情報パラメータを最適化するための計算時間が、ライブラリの実行コストに計上されてしまう。このため、最適化のために十分な時間を取れずに、最適なパラメータが得られない虞れがある。
【００３５】
そこで、本発明に係る上述のプログラムのように、実際の計算の前に、実行コストを予め実測して、最適な性能情報パラメータを得るようにする。これによって、より精密かつ確実なパラメータ調整が可能となる。また、プログラムの実行前において、計算所要時間を予測できる。
【００３６】
なお、本発明に係るプログラムを、ユーザが知りうる情報が定まった時点でのパラメタ最適化機能を有するソフトウェアである、と表現することもできる。
【００３７】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリのインストール時に上記性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記初期設定手順において設定された上記性能情報パラメータを参照して、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順とを含んでいることを特徴としている。
【００３８】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【００３９】
上記プログラムが実行されたコンピュータは、ライブラリのインストール時に、性能情報パラメータの最適化を行う。この場合、例えば行列のサイズのような基本情報パラメータが定まっていないため、所定の誤差を含んだ、なんらかの推定モデルによって、最適な性能情報パラメータを推測する。
【００４０】
また、コンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった時点を検出する。
【００４１】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【００４２】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【００４３】
その後、コンピュータは、ライブラリの実際の実行の前に、インストール時に設定された性能情報パラメータを参照して、基本情報パラメータを用いて性能情報パラメータの最適化を行う。より詳細には、例えば基本情報パラメータを用い、性能情報パラメータのそれぞれの値について試行計算を行って、実行コストを予め実測する。特に、インストール時に設定された性能情報パラメータの最適値周辺の値のみについて、試行計算を行うようにしてもよい。これによって、試行計算の回数を削減して、最適な性能情報パラメータを得ることができる。このように、より精密かつ確実なパラメータ調整が可能となる。
【００４４】
なお、本発明に係るプログラムを、ソフトウェアのインストール時、およびユーザが知りうる情報が定まった時点でのソフトウェアの実行前、のパラメタ最適化機能を有するソフトウェアである、と表現することもできる。
【００４５】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいることを特徴としている。
【００４６】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【００４７】
上記プログラムが実行されたコンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった時点を検出する。
【００４８】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【００４９】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【００５０】
その後、コンピュータは、ライブラリの実際の実行の前に、基本情報パラメータを用いて性能情報パラメータの最適化を行う。より詳細には、例えば基本情報パラメータを用い、性能情報パラメータのそれぞれの値について試行計算を行って、実行コストを予め実測する。これによって、確実に最適な性能情報パラメータを得ることができる。
【００５１】
また、コンピュータは、ライブラリの実際の実行の際に、既に設定された性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしているか否かを試行により判別する。そして、所望の精度を満たしていないときには、基本情報パラメータを用いて性能情報パラメータの最適化を再度実行する。そして、所望の精度を得ることのできる性能情報パラメータを用いて、ライブラリを実行する。
【００５２】
このように、実際の計算の前に、実行コストを予め実測して、最適な性能情報パラメータを得るようにする。基本情報パラメータの変更がないときには、予め設定した性能情報パラメータを用いてライブラリを実行できる。また、基本情報パラメータの変更があるときでも、所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できる。したがって、実行時におけるパラメータの最適化に要する時間を不要として、ライブラリの実行コスト（計算時間）を増大させない。また、ライブラリの実行の前に精度を確認するので、より精密かつ確実なパラメータ調整が可能となる。
【００５３】
なお、本発明に係るプログラムを、ユーザが知りうる情報が定まった時点でのソフトウェアの実行前、およびソフトウェア実行時、のパラメタ最適化機能を有するソフトウェアである、と表現することもできる。
【００５４】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリのインストール時に上記性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいることを特徴としている。
【００５５】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【００５６】
上記プログラムが実行されたコンピュータは、ライブラリのインストール時に、性能情報パラメータの最適化を行う。この場合、例えば行列のサイズのような基本情報パラメータが定まっていないため、所定の誤差を含んだ、なんらかの推定モデルによって、最適な性能情報パラメータを推測する。
【００５７】
また、コンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった時点を検出する。
【００５８】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【００５９】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【００６０】
また、コンピュータは、ライブラリの実際の実行の際に、既に設定された性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしているか否かを試行により判別する。そして、所望の精度を満たしていないときには、基本情報パラメータを用いて性能情報パラメータの最適化を再度実行する。そして、所望の精度が得られる性能情報パラメータを用いて、ライブラリを実行する。
【００６１】
このように、実際の計算の前に、性能情報パラメータを設定しておく。実際の計算の際に、その性能情報パラメータによって所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できる。したがって、実行時におけるパラメータの最適化に要する時間を不要として、ライブラリの実行コスト（計算時間）を増大させない。また、ライブラリの実行の前に精度を確認するので、より精密かつ確実なパラメータ調整が可能となる。
【００６２】
なお、本発明に係るプログラムを、ソフトウェアのインストール時、およびソフトウェア実行時、のパラメタ最適化機能を有するソフトウェアである、と表現することもできる。
【００６３】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリのインストール時に上記性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいることを特徴としている。
【００６４】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【００６５】
上記プログラムが実行されたコンピュータは、ライブラリのインストール時に、性能情報パラメータの最適化を行う。この場合、例えば行列のサイズのような基本情報パラメータが定まっていないため、所定の誤差を含んだ、なんらかの推定モデルによって、最適な性能情報パラメータを推測する。
【００６６】
また、コンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった時点を検出する。
【００６７】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【００６８】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【００６９】
その後、コンピュータは、ライブラリの実際の実行の前に、インストール時に設定された性能情報パラメータを参照して、基本情報パラメータを用いて性能情報パラメータの最適化を行う。より詳細には、例えば基本情報パラメータを用い、性能情報パラメータのそれぞれの値について試行計算を行って、実行コストを予め実測する。特に、インストール時に設定された性能情報パラメータの最適値周辺の値のみについて、試行計算を行うようにしてもよい。これによって、試行計算の回数を削減して、最適な性能情報パラメータを得ることができる。このように、より精密かつ確実なパラメータ調整が可能となる。
【００７０】
また、コンピュータは、ライブラリの実際の実行の際に、既に設定された性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしているか否かを試行により判別する。そして、所望の精度を満たしていないときには、基本情報パラメータを用いて性能情報パラメータの最適化を再度実行する。そして、所望の精度が得られる性能情報パラメータを用いて、ライブラリを実行する。
【００７１】
このように、実際の計算の前に、実行コストを予め実測して、最適な性能情報パラメータを得るようにする。基本情報パラメータの変更がないときには、予め設定した性能情報パラメータを用いてライブラリを実行できる。また、基本情報パラメータの変更があるときでも、所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できる。したがって、実行時におけるパラメータの最適化に要する時間を不要として、ライブラリの実行コスト（計算時間）を増大させない。また、ライブラリの実行の前に精度を確認するので、より精密かつ確実なパラメータ調整が可能となる。
【００７２】
なお、本発明に係るプログラムを、ソフトウェアのインストール時、ユーザが知りうる情報が定まった時点でのソフトウェアの実行前、およびソフトウェア実行時、の３階層のパラメタ最適化機能を有するソフトウェアである、と表現することもできる。
【００７３】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータについて最適化する機能を上記コンピュータに実現させるためのプログラムにおいて、上記性能情報パラメータの各要素を、上記ライブラリのインストール時に最適化を行うパラメータの第１の集合、上記ライブラリの実行の前に最適化を行うパラメータの第２の集合、または上記ライブラリの実行の際に最適化を行うパラメータの第３の集合のうちの少なくとも一つに含まれるように設定して、第１の集合の要素を最適化する機能と、第２の集合の要素を最適化する機能と、第３の集合の要素を最適化する機能とを上記コンピュータに実現させることを特徴としている。
【００７４】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、並列計算機を用いる場合のループアンローリング段数が、性能情報パラメータに相当する。
【００７５】
上記プログラムが実行されたコンピュータにおいては、性能情報パラメータが、ライブラリのインストール時に最適化を行うパラメータの第１の集合、ライブラリの実行の前に最適化を行うパラメータの第２の集合、またはライブラリの実行の際に最適化を行うパラメータの第３の集合のうちの少なくとも一つに含まれるように設定される。
【００７６】
ここで、性能情報パラメータが何らかの意味で最適化可能であるならば、インストール時、ライブラリ実行前、ライブラリ実行の際のいずれかにおいて最適化することは、常に可能である。また、性能情報パラメータを、上述の第１〜第３のうちから選択された少なくとも一つ以上の集合に含まれるように設定する具体的な構成には、ある程度任意性があるが、その構成はどのように選択してもよい。
【００７７】
そして、コンピュータは、第１〜第３の集合について、それぞれ最適化を行う。したがって、性能情報パラメータの全てが最適化可能となり、汎用な処理に適用できる。すなわち、複数のルーチンを含んだライブラリ全体に対する最適化が可能となる。
【００７８】
一方、従来の最適化法は、ソフトウェアインストール時にパラメータ最適化を行うもの、またはライブラリ実行時にパラメータ最適化を行うもの、のいずれか一方しかなかった。このため、問題によっては、インストール時にしか最適化できない、または実行時にしか最適化できないものがあるので、全ての問題に対して汎用することができなかった。
【００７９】
なお、本発明に係るプログラムを、最適化すべきパラメタに関して、インストール時、実行前、実行時の３種のパラメタに分離し、それぞれのパラメタ最適化を行うソフトウェアである、と表現することもできる。
【００８０】
本発明に係る記録媒体は、上記課題を解決するための、上述のいずれかのプログラムを記録したコンピュータ読み取り可能な記録媒体である。
【００８１】
この記録媒体がコンピュータにて読取られると、上述のいずれかのプログラムがコンピュータにて実行される。したがって、上述のプログラムと同様の効果を得ることができる。
【００８２】
なお、記録媒体の構成としては、ハードディスク、CD ROM(Read Only Memory)などに限るものではなく、どのような記録媒体であってもよい。
【００８３】
また、本発明に係るコンピュータは、上記課題を解決するために、上述の記録媒体を備えている構成である。
【００８４】
このコンピュータにて上述の記録媒体を読み取りすると、上述のいずれかのプログラムがコンピュータにて実行される。したがって、上述のプログラムと同様の効果を得ることができる。
【００８５】
なお、このコンピュータは、コンピュータ内に複数のプロセッサを有する並列計算装置であってもよいし、または、複数のコンピュータがネットワークに接続されて複数のプロセッサを有する計算装置として機能する分散計算装置であってもよい。
【００８６】
また、上述のコンピュータは、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの調整を行う調整方法において、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う手順とを含んでいる調整方法を実行するものである、と表現することもできる。
【００８７】
また、上述のコンピュータは、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの調整を行う調整方法において、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順を含んでいる調整方法を実行するものである、と表現することもできる。
【００８８】
また、上述のコンピュータは、上記調整方法を実行することによって、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を行う調整装置として機能する。また、上述のコンピュータは、上述のプログラムとライブラリとを備えた計算装置として機能する。
【００８９】
なお、上述の構成において、性能情報パラメータの最適化とは、性能情報パラメータの全てを最適化するものではなく、最適化が可能なもののうち、適当なものについて最適化を行うことを意味する。
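[0090]
[Embodiment of the invention]
An embodiment of the present invention is described below with reference to FIGS. 1 to 3.
[0091]
As shown in FIG. 2, the computing device (computer) 1 includes a processor 2, a user library (library) 3, a parameter adjustment layer 4, and a parameter information file 5. The computing device 1 also includes a recording medium (not shown). The computing device 1 calls subroutines in the library 3 using parameters input from outside and performs the computation. The result is output to a display device (not shown).
[0092]
The processor 2 is the processing unit that performs the computation. The processor 2 internally contains nprocs processors (not shown). Using these processors, the computing device 1 functions as a parallel computing device.
[0093]
The library 3 is a numerical calculation library that contains at least one subroutine. The library 3 of this embodiment contains a plurality of subroutines 3a to 3k.
[0094]
The library 3 and the subroutines 3a to 3k are accessed by describing parameters in some way (for example, in a dedicated description language). Some of these parameters are input to the library 3 directly by the user from outside. Others are used inside the library 3. Still others are input to the library 3 via the parameter adjustment layer 4.
[0095]
The library 3 is a numerical calculation library developed by a user, but it is not limited to this and may be, for example, a system library developed by a library developer. Even for libraries prepared in advance by a computing environment such as MPI (Message Passing Interface) or by the OS (Operating System), as long as the software interface is known, the user or the library developer can pass parameter information to the parameter adjustment layer 4 by writing a parameter description.
[0096]
The contents and the number of the subroutines in the library are not particularly limited. The computing device 1 may also include programs other than the library, and those programs may realize other functions.
[0097]
The parameter adjustment layer 4 functions as an adjustment device for the parameters used by the library 3. It adjusts some of the parameters destined for the library 3 and then inputs them to the library 3. The parameter adjustment layer 4 comprises an installation optimization layer (Installation Optimization Layer: IOL) 4a, a before-execution optimization layer (Before Execution-invocation Optimization Layer: BEOL) 4b, and a run-time optimization layer (Run-time Optimization Layer: ROL) 4c. The function of each layer is described later.
[0098]
The parameter information file 5 is a file for storing the parameters adjusted by the parameter adjustment layer 4.
[0099]
In the computing device 1 of this embodiment, the library 3 is a function realized by reading and executing a program recorded on the recording medium (not shown). The parameter adjustment layer 4 is likewise a function realized by reading and executing a program recorded on the recording medium (not shown).
[0100]
The computing device 1 configured as above is described in more detail below.
[0101]
When a user runs the library 3 on the computing device 1, the user sets appropriate parameters for the desired subroutine 3a and then issues an execution instruction.
[0102]
The parameters set for the subroutine 3a include parameters that change only the execution performance of the computing device 1 and do not change the output of the subroutine 3a of the library 3. Such parameters are called performance information parameters (Performance Parameters: PP) below.
[0103]
Among the parameters set for the subroutine 3a, those that change both the execution performance of the computing device 1 and the output of the subroutine 3a of the library 3 are called basic information parameters (Basic Parameters: BP) below.
[0104]
For example, suppose that the subroutine 3a in the numerical calculation library is an eigenvalue calculation subroutine that computes the eigenvalues of a matrix. The desired matrix itself and its size then correspond to basic information parameters BP. The number of loop unrolling stages in the matrix computation on the computing device 1 corresponds to a performance information parameter PP.
[0105]
In the computing device 1, optimizing the performance information parameters PP using the given basic information parameters BP allows the desired result to be obtained in the minimum time. The performance information parameters PP and the basic information parameters BP are input to the library 3 via the parameter adjustment layer 4. Parameters other than PP and BP are either input to the library 3 directly from outside the computing device 1 or used inside the library 3.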
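[0106]
As shown in FIG. 1, the parameter adjustment layer 4 of this embodiment comprises the installation optimization layer 4a, the before-execution optimization layer 4b, and the run-time optimization layer 4c in order to optimize the performance information parameters PP, which are the adjustable parameters. The layers 4a to 4c do not hold parameters themselves; they store them in the parameter information file 5.
[0107]
The installation optimization layer (IOL) 4a performs optimization when the library 3 is installed.
[0108]
For example, as shown in FIG. 3(a), when the library 3 is installed (S1), the installation optimization layer 4a optimizes the installation optimization parameters (IOP), which are a subset of the performance information parameters PP (S2), and outputs the resulting parameters (IOP) to the parameter information file 5.
[0109]
When the library 3 is installed, the basic information parameters BP are normally not yet determined. The installation optimization layer 4a therefore samples, for example, suitable values of the basic information parameters BP and, for each sampled point, determines the parameters that minimize a suitably defined cost definition function. It then interpolates the data between sampled points with a suitable model formula.
[0110]
The before-execution optimization layer (BEOL) 4b performs optimization after the specific parameters specified by the user (for example, the problem size) have been given.
[0111]
In response to the input of the basic information parameters BP, the before-execution optimization layer 4b uses them to optimize the before-execution optimization parameters BEOP, which are a subset of the performance information parameters PP. For example, as shown in FIG. 3(b), in response to the definition (input) of the basic information parameters BP as user-specified parameters (S4), it refers to the parameters (IOP) in the parameter information file 5 (S5), performs optimization (S6), and outputs the resulting optimized parameters (BEOP) to the parameter information file 5.
[0112]
To obtain the optimal parameters, the before-execution optimization layer 4b performs measured trial runs using the basic information parameters BP specified by the user.
[0113]
The run-time optimization layer (ROL) 4c performs optimization after the parameter optimization by at least one of the installation optimization layer 4a and the before-execution optimization layer 4b has finished, and when the target library (or routine) is executed.
[0114]
For example, as shown in FIG. 3(c), when the run-time optimization layer 4c detects an execution instruction for the library 3 (the subroutine 3a of the library 3) (S8), it refers to the performance information parameters PP already set (S9) and, if the computation with those parameters does not satisfy the desired accuracy, performs optimization again (S10). In S10, the computation is repeated until optimal parameters PP that satisfy the desired accuracy are obtained.
[0115]
In this way, the run-time optimization layer 4c refers to the performance information parameters PP already set and, in predetermined cases such as when sufficient accuracy is obtained, performs no computation for optimization.
[0116]
As described above, in the parameter adjustment layer 4 of this embodiment, the parameter information IOP optimized by the installation optimization layer 4a is stored in the parameter information file 5 and can be referred to by the before-execution optimization layer 4b and the run-time optimization layer 4c. Likewise, the parameter information BEOP optimized by the before-execution optimization layer 4b is stored in the parameter information file 5 and can be referred to by the run-time optimization layer 4c.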
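The way the three layers share results through the parameter information file 5 can be illustrated with a minimal sketch. The JSON layout, the class name `ParameterInfoFile`, and the lookup order (a later layer's value overrides an earlier one) are assumptions made for illustration; the description does not specify a file format.

```python
import json
import os


class ParameterInfoFile:
    """Hypothetical JSON-backed stand-in for parameter information file 5."""

    # Assumed lookup order: the most recently refined value wins.
    LOOKUP_ORDER = ("ROL", "BEOL", "IOL")

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return the stored parameters, or an empty dict if nothing is stored."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def store(self, layer, params):
        """Merge the parameters tuned by one layer ("IOL", "BEOL", "ROL")."""
        data = self.load()
        data.setdefault(layer, {}).update(params)
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)

    def lookup(self, name):
        """Return the value from the latest layer that tuned `name`, else None."""
        data = self.load()
        for layer in self.LOOKUP_ORDER:
            if name in data.get(layer, {}):
                return data[layer][name]
        return None
```

In this sketch the before-execution layer would call `lookup` to read the installation-time estimate and `store("BEOL", ...)` to record its measured optimum, which the run-time layer can then read in the same way.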
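[0117]
Here, each element of the performance information parameters PP belongs to at least one of the sets of parameters (IOP), (BEOP), and (ROP). That is, for the layers 4a to 4c of the parameter adjustment layer 4, the elements of the performance information parameters PP are decomposed, with overlap allowed, into three subsets (IOP, BEOP, ROP). Expressed as a formula:
PP = IOP ∪ BEOP ∪ ROP … (Equation 1)
Therefore, the computing device 1 of this embodiment can use the parameter adjustment layer 4 to optimize every element of the performance information parameters PP at one of the timings described above.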
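Equation 1 can be checked mechanically with set operations. The sketch below uses the parameter names that appear in the examples of this description; assigning `nproc` to the before-execution set is an assumption made for illustration (Example 3 tunes the processor count before execution), and the helper names are hypothetical.

```python
# Performance information parameters named in the examples of this description.
PP = {"imv", "iud", "ihit", "kbi", "kort", "nproc"}

# Illustrative assignment of parameters to the three tuning stages.
IOP = {"imv", "iud", "kbi", "ihit"}                     # tuned at installation
BEOP = {"imv", "iud", "ihit", "kbi", "kort", "nproc"}   # assumption: nproc tuned here
ROP = {"kort"}                                          # tuned at run time


def uncovered(pp, iop, beop, rop):
    """Return the parameters that violate PP = IOP ∪ BEOP ∪ ROP."""
    return pp - (iop | beop | rop)


def stages_for(param, iop, beop, rop):
    """List every stage at which `param` is tuned (overlap is allowed)."""
    stages = []
    if param in iop:
        stages.append("installation")
    if param in beop:
        stages.append("before execution")
    if param in rop:
        stages.append("run time")
    return stages
```

An empty result from `uncovered` means every performance information parameter is tuned at some stage, which is the property Equation 1 expresses.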
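[0118]
In particular, the computing device 1 of this embodiment includes the before-execution optimization layer 4b, which performs optimization before the actual computation is executed, once a basic information parameter BP that depends on the problem, such as the matrix size (n), has been determined. This enables more accurate optimization than in conventional computing devices.
[0119]
[Example 1]
The computing device 1 of this embodiment is described below using a concrete example of a numerical calculation library. Example 1 describes the optimization of the subroutine 3a by the installation optimization layer 4a.
[0120]
As an example, the user library 3 is an eigenvalue calculation library and the subroutine 3a is the subroutine PEigVecCal.
[0121]
The subroutine PEigVecCal of the eigenvalue calculation library is based on the Householder bisection and inverse iteration method, which is often used to obtain the eigenvalues and eigenvectors of real symmetric matrices. Because this example optimizes with respect to execution time, the cost definition function to be optimized is the execution time.
[0122]
The subroutine PEigVecCal is called as follows.
call PEigVecCal(A, x, lambda, n, nproc, myid, iDistInd, imv, iud, ihit, icomm, kbi, kort, MAXITER, deps, …)
Each argument of the subroutine PEigVecCal is a software parameter exposed under the software configuration method based on parameter presentation.
[0123]
The parameters are as follows.
(i) The parameters A, x, lambda, and n are basic information parameters BP that represent matrix information and the like. More specifically, A is the matrix whose eigenvalues are to be obtained, x holds the eigenvectors, lambda holds the eigenvalues, and n is the size of the matrix. When the subroutine PEigVecCal is called with A and n specified, the results for x and lambda are returned as required.
(ii) The parameters nproc, myid, and iDistInd are parameters for parallel control. For example, nproc is the number of independent processors (not shown) contained in the processor 2.
(iii) imv, iud, ihit, icomm, kbi, and kort are parameters, such as the number of unrolling stages, that relate to the processing procedure and affect performance.
(iv) MAXITER, deps, … are parameters that affect the algorithm and are called solver information parameters below.
Of these, some of the parameters in (ii) and (iii) correspond to the performance information parameters (PP).
[0124]
More specifically, in the Householder bisection and inverse iteration method, the main software parameters are the basic information parameters BP = {n} and the performance information parameters PP = {imv, iud, ihit, kbi, kort, nproc}.
[0125]
In more detail, the subroutine PEigVecCal is assumed to consist of the following four routines and their performance information parameters PP.
・Householder tridiagonalization routine: PP = {imv, iud}
・Bisection routine: PP = {kbi}
・Inverse iteration routine: PP = {kort}
・Householder inverse transformation routine: PP = {ihit}
[0126]
The performance information parameters PP listed above are described in detail here, together with the domain of each.
・imv = {1, 2, …, 16}
imv specifies the number of unrolling stages of the outermost loop of the matrix-vector product (a doubly nested loop) required in Householder tridiagonalization.
・iud = {1, 2, …, 16}
iud specifies the number of unrolling stages of the outermost loop of the matrix update (a doubly nested loop) required in Householder tridiagonalization.
・kbi = {vec, non-vec}
kbi specifies whether the processing required in the bisection method uses a vector-oriented implementation or not.
・kort = {CG-S, MG-S, IRCG-S, NoOrt}
kort specifies the implementation of the re-orthogonalization required when computing eigenvectors for clustered eigenvalues in the inverse iteration method. CG-S means that the eigenvectors are re-orthogonalized by the classical Gram-Schmidt method. MG-S means that they are re-orthogonalized by the modified Gram-Schmidt method. IRCG-S means that they are re-orthogonalized by the iteratively refined classical Gram-Schmidt method. NoOrt means that no re-orthogonalization is performed at all.
・ihit = {1, 2, …, 16}
ihit specifies the number of unrolling stages of the outermost loop of the processing (a doubly nested loop) required in the Householder inverse transformation.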
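To give a feel for the size of the tuning problem, the domains listed above can be written down as a small search-space table. The sketch below counts the candidate settings; the per-routine count assumes that the four routines can be tuned independently of one another, which is an assumption made here for illustration rather than something the description states. The dictionary and function names are hypothetical.

```python
from math import prod

# Domains of the performance information parameters listed above.
DOMAINS = {
    "imv": list(range(1, 17)),
    "iud": list(range(1, 17)),
    "kbi": ["vec", "non-vec"],
    "kort": ["CG-S", "MG-S", "IRCG-S", "NoOrt"],
    "ihit": list(range(1, 17)),
}

# Parameters owned by each of the four routines of PEigVecCal.
ROUTINES = {
    "Householder tridiagonalization": ["imv", "iud"],
    "bisection": ["kbi"],
    "inverse iteration": ["kort"],
    "Householder inverse transformation": ["ihit"],
}


def joint_size(domains):
    """Number of settings if all parameters are searched jointly."""
    return prod(len(values) for values in domains.values())


def per_routine_size(domains, routines):
    """Number of settings if each routine is tuned on its own (assumption)."""
    return sum(
        prod(len(domains[name]) for name in names)
        for names in routines.values()
    )
```

Under these assumptions a joint search covers 16 × 16 × 2 × 4 × 16 settings while a per-routine search covers 16 × 16 + 2 + 4 + 16, which suggests why sampling and staged tuning are attractive.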
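[0127]
The optimization of the eigenvalue calculation subroutine PEigVecCal by the installation optimization layer 4a is described below.
[0128]
In this example, the installation optimization parameters IOP optimized by the installation optimization layer 4a are IOP = {imv, iud, kbi, ihit}. These parameters are determined by information about the computing environment (number of registers, cache size, vector hardware, and so on) once the target machine architecture and compiler are known. It is therefore preferable to optimize them at installation time.
[0129]
The optimization of the parameters IOP is described using the parameter iud as an example. The other parameters are optimized in the same way, so their description is omitted.
[0130]
First, at installation time the size n, which is a basic information parameter, has not been determined. Suitable sampling points {200, 400, 800, 2000, 4000, 8000} are therefore chosen for the problem size n. The number of processors nproc is assumed to be 8.
[0131]
Sampling points {1, 2, 3, 4, 8, 16} are also chosen for the parameter iud. This avoids computing over the whole domain and reduces the amount of computation. If enough computation time is available, the whole domain may be covered.
[0132]
At each sampling point, trial runs are performed and the calculation time, which is the execution cost, is measured. A suitable cost definition function that interpolates the values at the sampling points is then determined. This makes it possible to estimate the cost over the entire domain.
[0133]
In this example, the execution time to be optimized for the parameter iud (that is, the cost definition function) is approximated by a polynomial in iud. The approximating function used for the interpolation is not limited to a polynomial; other functions may be used.
[0134]
Table 1 below shows the execution times (unit: seconds) measured at each sampling point on an example parallel computer.
[0135]
[Table 1]

[0136]
For these measurements, the basic information parameter n is held fixed and a function f_n(iud) of the parameter iud is estimated.
[0137]
Here, f_n(iud) is assumed to be the fifth-degree polynomial f_n(iud) = a1×iud⁵ + a2×iud⁴ + a3×iud³ + a4×iud² + a5×iud + a6. The coefficients a1, …, a6 can be determined by any suitable method. Here they were determined by the least-squares method, but the method is not limited to this.
[0138]
Table 2 below shows the coefficients determined by the least-squares method using the data of Table 1 as sample points.
[0139]
[Table 2]

[0140]
From Table 2, the value of iud that gives the minimum over the domain {1, 2, …, 16} can be determined for each sampled problem size.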
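The fit-then-minimize step can be sketched as follows. Because the fifth-degree polynomial has six coefficients and there are six distinct iud sample points, the least-squares fit passes exactly through the samples, so this sketch evaluates that polynomial in Lagrange form rather than solving for a1, …, a6. The timings are hypothetical placeholders (the measured values of Table 1 are not reproduced here), and the function and variable names are illustrative.

```python
def fit_through_samples(xs, ys):
    """Return f(x), the unique polynomial of degree len(xs) - 1 through the samples."""
    def f(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            term = yj
            for k, xk in enumerate(xs):
                if k != j:
                    term *= (x - xk) / (xj - xk)
            total += term
        return total
    return f


IUD_SAMPLES = [1, 2, 3, 4, 8, 16]

# Hypothetical execution times in seconds for one fixed problem size n.
HYPOTHETICAL_SECONDS = [0.90, 0.72, 0.66, 0.61, 0.58, 0.70]

f_n = fit_through_samples(IUD_SAMPLES, HYPOTHETICAL_SECONDS)

# Evaluate the model over the whole domain {1, ..., 16} and keep the cheapest.
best_iud = min(range(1, 17), key=f_n)
```

One caveat of this kind of model: a high-degree polynomial can swing widely between distant samples (here between 8 and 16), so the predicted minimum may lie at a value that was never measured. That is one motivation for re-measuring once the problem size is known.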
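[0141]
The problem size n is interpolated as follows.
[0142]
First, Table 2 gives an estimated cost for every value of iud. Therefore, for every value in the domain {1, 2, …, 16}, the estimated cost can be computed with iud held fixed and the problem size n varied over the sample points {200, 400, 800, 2000, 4000, 8000}.
[0143]
These estimates are then treated as new sample points, and a function f_iud(n) is estimated by the least-squares method. For f_iud(n), the fifth-degree polynomial f_iud(n) = a'1×n⁵ + a'2×n⁴ + a'3×n³ + a'4×n² + a'5×n + a'6 is assumed, and it is fitted to the estimates obtained at the sample points {200, 400, 800, 2000, 4000, 8000} for n. This yields a function f_iud(n) of n for every iud in the domain {1, 2, …, 16}, so substituting the value of n specified at run time determines the iud that gives the minimum.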
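The second interpolation, over the problem size, can be sketched in the same style: for each candidate iud, a polynomial through the per-size estimates is evaluated at the requested n, and the iud with the lowest predicted cost is chosen. The table layout `{n: {iud: seconds}}` and the function names are assumptions for illustration; with as many samples as coefficients, the polynomial through the points coincides with the least-squares fit.

```python
def interpolate(xs, ys, x):
    """Evaluate at x the polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for k, xk in enumerate(xs):
            if k != j:
                term *= (x - xk) / (xj - xk)
        total += term
    return total


def best_iud_for_size(table, n):
    """Pick the iud with the lowest predicted cost at problem size n.

    table maps each sampled problem size to {iud: estimated seconds}.
    """
    sizes = sorted(table)
    candidates = table[sizes[0]].keys()
    predicted = {
        iud: interpolate(sizes, [table[s][iud] for s in sizes], n)
        for iud in candidates
    }
    return min(predicted, key=predicted.get)
```

A tiny hypothetical table such as `{200: {1: 2.0, 2: 1.6}, 400: {1: 4.0, 2: 2.0}}` is enough to see the choice change with n: the model prefers iud = 1 at n = 100 and iud = 2 at n = 300.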
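[0144]
Because there is no guarantee that the problem size n specified at run time will coincide with one of the sampled values, the optimal parameters are estimated as described above. The estimated parameters are stored in the parameter information file 5. The information used for the estimation, such as the coefficients in this case, is also stored in the parameter information file 5. Later optimizations can then obtain this information by referring to the parameter information file 5.
[0145]
[Example 2]
Next, the optimization by the before-execution optimization layer 4b of the subroutine PEigVecCal described in Example 1 is explained.
[0146]
In this example, the problem size n, which is a basic information parameter, is assumed to have been fixed at n = 8192. The number of processors nprocs is assumed to be 4.
[0147]
The before-execution optimization parameters (BEOP) optimized by the before-execution optimization layer 4b are BEOP = {imv, iud, ihit, kbi}.
[0148]
In this example, BEOP is therefore the same as IOP in Example 1. The difference is that the optimization is performed after the basic information parameter n has been determined, so the interpolation described above is unnecessary and a reliable, actually measured optimum can be obtained for BEOP.
[0149]
That is, in the optimization at installation time, the parameters of the cost definition function were determined by estimation, such as interpolation, for every size n other than the sampled points. In addition, an assumed value was used for the number of processors nproc.
[0150]
A conventional configuration that optimizes only at installation time has no before-execution optimization layer, so it can only determine parameters from estimates.
[0151]
In the before-execution optimization of the present invention, by contrast, the parameters are determined by actual measurement for the desired size n. Before-execution optimization can therefore determine the parameters more accurately than optimization at installation time. Consequently, the optimization by the before-execution optimization layer 4b can be used even when an error in the parameter estimate would be critical. It is also possible, for example, to refer to the parameter information file 5 and use the result of the installation-time optimization to reduce the calculation time.
[0152]
This before-execution optimization is carried out for the desired size n that will actually be used, for example by measuring as in Table 1 of Example 1, obtaining coefficients as in Table 2, and finding the optimal iud. The procedure is the same as in Example 1, so its details are omitted.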
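A minimal sketch of before-execution tuning with a known problem size follows. It measures candidate values directly instead of interpolating and, when an installation-time estimate is available, tries only the values near it to cut down the number of trial runs. The `measure` callback (which would wrap a timed trial run of the routine at the actual n), the `radius` parameter, and the function name are assumptions for illustration.

```python
def tune_before_execution(measure, domain, iol_estimate=None, radius=2):
    """Return (best value, {value: measured cost}) for one parameter.

    measure(value) runs a trial at the actual problem size and returns its cost.
    If iol_estimate is given, only values within `radius` of it are tried.
    """
    candidates = list(domain)
    if iol_estimate is not None:
        nearby = [v for v in candidates if abs(v - iol_estimate) <= radius]
        if nearby:  # fall back to the full domain if the estimate is off-range
            candidates = nearby
    timings = {value: measure(value) for value in candidates}
    best = min(timings, key=timings.get)
    return best, timings
```

For example, with a stand-in cost function `lambda d: (d - 5) ** 2`, an installation-time estimate of 4, and a radius of 2, only the values 2 to 6 are measured and 5 is selected, compared with 16 measurements for the full domain.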
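[0153]
As the above example shows, implementing the present invention provides a more advanced parameter adjustment mechanism than before.
[0154]
The differences between conventional installation-time optimization and the function of the before-execution optimization layer 4b of the present invention are as follows.
[0155]
[Table 3]

[0156]
[Example 3]
Next, another example of optimization by the before-execution optimization layer 4b of the subroutine PEigVecCal described in Examples 1 and 2 is explained.
[0157]
Here, the user is assumed to know that the coefficient matrix does not change at the time of the library call. That is, the problem size n is assumed to be fixed.
[0158]
The before-execution optimization parameters BEOP optimized by the before-execution optimization layer 4b are then BEOP = {imv, iud, ihit, kbi, kort}. In other words, for this problem the orthogonalization in the inverse iteration method (the parameter kort) of the eigenvalue problem can also be optimized. In this example, the number of processors nprocs is optimized as well.
[0159]
The conventional methods have no before-execution optimization layer, so parameter optimization cannot be applied in the situation of this example.
[0160]
Suppose that the size of the matrix (a Frank matrix) is given as n = 10,000. Table 4 below shows the execution times (unit: seconds) measured on an example parallel computer for the parameter kort and the number of processors nprocs. The symbol > indicates that the run did not converge within the predetermined time limit and did not finish.
[0161]
[Table 4]

[0162]
Table 5 below shows the orthogonality accuracy of the eigenvectors for each orthogonalization method in the inverse iteration method. The values are Frobenius norms, and the maximum residual vector for MG-S on 8 PEs is max_i( |(A x)_i - lambda_i x_i|² ) = 1.61E-7.
[0163]
[Table 5]

[0164]
Tables 4 and 5 show that in this example the execution time differs with the orthogonalization method but, as long as the orthogonality accuracy is 1.5E-12 or less, the CG-S method is preferable in terms of both speed and accuracy. Therefore, if the user passes an upper bound on the orthogonality accuracy to the system, for example, the BEOL can fix the parameter kort to CG-S. The optimal number of processors nprocs can also be determined.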
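The selection rule described above (take the fastest orthogonalization method whose measured orthogonality error stays within the user's bound) can be sketched as follows. The measurement values are hypothetical placeholders, not the figures of Tables 4 and 5, and the function name and data layout are assumptions; a time of `None` stands for a run that did not finish within the time limit.

```python
def select_kort(measurements, max_orth_error):
    """Return the fastest method meeting the accuracy bound, or None.

    measurements maps method name -> (seconds or None, orthogonality error).
    """
    feasible = {
        method: seconds
        for method, (seconds, error) in measurements.items()
        if seconds is not None and error <= max_orth_error
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical measurements for illustration only.
HYPOTHETICAL_RUNS = {
    "CG-S": (10.0, 1.0e-12),
    "MG-S": (None, 1.0e-14),    # did not finish within the time limit
    "IRCG-S": (25.0, 1.0e-14),
    "NoOrt": (5.0, 1.0),
}
```

With these placeholder numbers a bound of 1.5E-12 selects CG-S, a tighter bound of 1E-13 selects IRCG-S, and a bound that no finished run satisfies yields `None`, in which case the caller would have to relax the bound or extend the time limit.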
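[0165]
As the above example shows, applying the present invention provides a more advanced parameter adjustment mechanism than before.
[0166]
In the computation of this example, the information stored in the parameter information file 5 may be referred to in order to reduce the amount of computation.
[0167]
[Example 4]
Next, an example of optimization by the run-time optimization layer 4c of the subroutine PEigVecCal described in Examples 1 to 3 is explained.
[0168]
In Example 4, the matrix A itself and the matrix size n are assumed to be subject to change. That is, the matrix size and the matrix data are not fixed at run time. In such a case, the optimal orthogonalization method actually depends on the conditions given by the user and on properties of the matrix that are not fixed until run time.
[0169]
The run-time optimization parameters ROP optimized by the run-time optimization layer 4c are ROP = {kort}.
[0170]
Suppose also that, through the optimization by the before-execution optimization layer 4b described in Example 3, a parameter kort that matched the user's accuracy requirement in a past orthogonalization run has been stored in the parameter information file 5 as the optimal parameter.
[0171]
The run-time optimization layer 4c therefore first refers to the parameter kort stored in the parameter information file 5 and selects the orthogonalization method that appears most promising.
[0172]
Next, the run-time optimization layer 4c determines whether the accuracy of the computation satisfies the criterion specified by the user. If the specified accuracy is not satisfied, it measures each value of the parameter kort as described in Example 3 and readjusts the parameter. It repeats the check and the computation until an optimal parameter kort that gives the specified accuracy can be selected. This allows the system to guarantee the accuracy specified by the user. The details of the computation are the same as in Example 3 and are omitted here.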
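The run-time flow (try the stored choice first, check the accuracy, retune only if the check fails) can be sketched as follows. The `solve` callback, which is assumed to return a result together with a measured error, and the rule of accepting the first remaining candidate that meets the tolerance are simplifications made for illustration; the description itself only requires that measurement and checking be repeated until a parameter giving the specified accuracy is found.

```python
def run_with_rol(solve, stored_kort, candidates, tolerance):
    """Return (kort used, result, retuned flag).

    solve(kort) -> (result, error) runs the routine with the given method.
    stored_kort is the value read from the parameter information file.
    """
    result, error = solve(stored_kort)
    if error <= tolerance:
        # The stored parameter is good enough: no tuning cost at run time.
        return stored_kort, result, False
    for kort in candidates:
        if kort == stored_kort:
            continue
        result, error = solve(kort)
        if error <= tolerance:
            return kort, result, True
    raise RuntimeError("no candidate met the requested accuracy")
```

In this sketch the common case costs exactly one run of the routine, and the extra trial runs are paid only when the stored parameter fails the user's accuracy check.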
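[0173]
In the conventional methods, if the matrix size or the matrix data is not fixed at run time, the optimal parameters cannot be determined, for algorithmic reasons. In general, conventional methods often force the selection of MG-S regardless of cost in order to guarantee accuracy. In that case, as shown in Table 4 above, the cost is very unfavorable.
[0174]
In the present invention, by contrast, passing the information given by the user (such as the orthogonality accuracy) to the system makes parameter adjustment applicable even in the case above. The cost can also be optimized.
The examples above cannot be realized unless the before-execution optimization layer 4b and the run-time optimization layer 4c, which are specific to the present invention, exist as part of the software configuration method. The present invention therefore has a wider range of application for parameter adjustment than the conventional methods.
[0175]
Although a configuration that refers to the before-execution optimization results stored in the parameter information file 5 has been described here, a configuration that refers to the installation-time optimization results stored in the parameter information file 5 may be used instead where possible. In this way, the invention can also be realized by combining the installation optimization layer 4a and the run-time optimization layer 4c.
[0176]
As described in the examples above, the program according to this embodiment is configured so that, once the basic information parameters BP are determined, the performance information parameters are optimized accordingly before the library is actually executed.
[0177]
Accordingly, parameters optimized at installation time can be readjusted more precisely, or the calculation time spent on run-time optimization can be reduced so that sufficient optimization time is secured. This enables more precise and reliable parameter adjustment.
[0178]
Each element of the performance information parameters PP is optimized at installation time, before library execution, or at library execution time. That is, because optimization is performed before library execution in addition to installation time and library execution time, a highly general parameter adjustment function that can optimize any problem can be provided.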
【０１７９】
なお、上述の実施の形態においては、並列計算装置としての計算装置１について説明をしたが、本発明はこれに限るものではなく、プロセッサ２がネットワークにて接続された複数の計算装置に備えられたものである分散計算装置であってもよい。
【０１８０】
また、上述の実施の形態においては、ライブラリが固有値計算ライブラリであり、サブルーチンがサブルーチンPEigVecCalである場合についてのみ説明を行ったが、これに限るものではなく、他のライブラリ、サブルーチンについても適用できるのはもちろんである。
【０１８１】
また、以上のように、この発明は、性能やコスト等に関するソフトウェア上のパラメータを自動調整するソフトウェア（自動チューニングソフトウェア）において、性能等の諸コストを考慮しパラメータ調整を行う機構があり、かつその調整機構の適用に関し範囲が広いソフトウェア構成方式に関するものである。また、本発明は、インストール時、実行前、実行時の最適化層を有するソフトウェア構成方式に関するものである。また、本発明では上述の式１のように、パラメータを３種類に分離して用いる。
【０１８２】
ここで、従来の自動チューニングソフトウェアの構成方式では、例えば図４（ａ）に示すようにソフトウェアインストール時にパラメータ最適化を行うもの、または例えば図４（ｂ）に示すようにライブラリ実行時にパラメータ最適化を行うもの、のみ存在していた。これらのソフトウェア構成方式では、汎用的な処理に適用できない、パラメータ調整が不十分となる場合がある、という問題がある。また図４（ａ）（ｂ）から分かるように、従来の自動チューニングではパラメータは１種類であった。
【０１８３】
そこで本発明においては、より汎用的な処理においてパラメータ調整が適用でき、かつ従来よりも高度なパラメータ調整機構を有するソフトウェア構成方式によって課題の解決をねらうものである。
【０１８４】
特に、本実施形態の計算装置１は、問題に応じた例えば行列サイズ（ｎ）のような基本情報パラメータＢＰが定まると、実際の計算の実行前の時点で最適化を行う実行前最適化層４ｂを備えている。これによって、従来の計算装置のような、ＩＯＬ、またはＲＯＬ単独の場合よりもより正確な最適化が可能となる。
【０１８５】
なお、従来の技術における非特許文献１（Ｐ６２）には、自動チューニングとして『(i)実行時自動チューニング(ii)実行前自動チューニング』の二つがある点が記載されている。しかしながら、この非特許文献１における『実行前自動チューニング』は、本発明における上述の『実行前自動チューニング』とは異なるものであり、本発明においては『インストール時最適化』に相当するものである。
【０１８６】
本発明は上述した実施形態、実施例に限定されるものではなく、請求項に示した範囲で種々の変更が可能であり、異なる実施例にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施形態についても、本発明の技術的範囲に含まれる。
【０１８７】
上述の具体的な実施形態または実施例は、あくまでも、本発明の技術内容を明らかにするものであって、本発明はそのような具体例にのみ限定して狭義に解釈されるべきものではなく、特許請求の範囲に示した範囲で種々の変更が可能であり、変更した形態も本発明の技術的範囲に含まれる。
【０１８８】
【発明の効果】
本発明に係るプログラムは、以上のように、ライブラリのパラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する手順と、上記基本情報パラメータを用いて性能情報パラメータの最適化を行う手順とを含んでいる構成である。
【０１８９】
それゆえ、実際の計算の前に実行コストを予め実測して最適な性能情報パラメータを得るようにして、より精密かつ確実なパラメータ調整が可能となるという効果を奏する。
【０１９０】
本発明に係るプログラムは、以上のように、ライブラリのインストール時に性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリのパラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記初期設定手順において設定された上記性能情報パラメータを参照して、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順とを含んでいる構成である。
【０１９１】
それゆえ、ライブラリの実際の実行の前に、インストール時に設定された性能情報パラメータを参照して、基本情報パラメータを用いて性能情報パラメータの最適化を行うので、試行計算の回数を削減して最適な性能情報パラメータを得ることができるという効果を奏する。
【０１９２】
本発明に係るプログラムは、以上のように、ライブラリのパラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記基本情報パラメータを用いて性能情報パラメータの最適化を行う前調整手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいる構成である。
【０１９３】
それゆえ、再調整手順にて精度を確認するので、所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できるという効果を奏する。
【０１９４】
本発明に係るプログラムは、以上のように、ライブラリのインストール時に性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリのパラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいる構成である。
【０１９５】
それゆえ、ライブラリの実際の実行の際に、既に設定された性能情報パラメータによって所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できるという効果を奏する。
【０１９６】
本発明に係るプログラムは、以上のように、ライブラリのインストール時に性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリのパラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった時点を検出する検出手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいる構成である。
【０１９７】
それゆえ、ライブラリの実際の実行の前に、最適な性能情報パラメータを得ることができるという効果を奏する。また、ライブラリの実際の実行の際に、既に設定された性能情報パラメータによって所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できるという効果を奏する。
【０１９８】
本発明に係るプログラムは、以上のように、性能情報パラメータの各要素を、ライブラリのインストール時に最適化を行うパラメータの第１の集合、上記ライブラリの実行の前に最適化を行うパラメータの第２の集合、または上記ライブラリの実行の際に最適化を行うパラメータの第３の集合のうちの少なくとも一つに含まれるように設定して、第１の集合の要素を最適化する機能と、第２の集合の要素を最適化する機能と、第３の集合の要素を最適化する機能とを上記コンピュータに実現させる構成である。
【０１９９】
それゆえ、性能情報パラメータを、インストール時、ライブラリ実行前、ライブラリ実行の際のいずれかにおいて最適化するので、性能情報パラメータの全てが最適化可能となり、汎用な処理に適用できるという効果を奏する。
【０２００】
本発明に係る記録媒体は、以上のように、上述のいずれかのプログラムを記録したコンピュータ読み取り可能な記録媒体である。
【０２０１】
それゆえ、上述のプログラムと同様の効果を奏する。
【０２０２】
また、本発明に係るコンピュータは、以上のように、上述の記録媒体を備えている構成である。
【０２０３】
それゆえ、上述のプログラムと同様の効果を奏する。
【図面の簡単な説明】
【図１】本発明に係るコンピュータの一実施形態の一部を示すブロック図である。
【図２】上記コンピュータを示すブロック図である。
【図３】（ａ）はインストール時最適化の手順を示すフローチャートであり、（ｂ）はライブラリ実行前最適化の手順を示すフローチャートであり、（ｃ）はライブラリ実行時最適化の手順を示すフローチャートである。
【図４】（ａ）は従来のコンピュータの一例の一部を示すブロック図であり、（ｂ）は従来のコンピュータの他の一例の一部を示すブロック図である。
【符号の説明】
１計算装置（コンピュータ）
２プロセッサ
３ユーザライブラリ（ライブラリ）
４パラメータ調整層
４ａインストール時最適化層
４ｂ実行前最適化層
４ｃ実行時最適化層
５パラメータ情報ファイル
ＩＯＰインストール時最適化パラメータ（性能情報パラメータ）
ＢＥＯＰ実行前最適化パラメータ（性能情報パラメータ）
ＲＯＰ実行時最適化パラメータ（性能情報パラメータ）
[0001]
[Technical field of the invention]
The present invention relates to a program, a recording medium, and a computer for causing a computer to optimize, for example, the parameters of a library installed on that computer.
[0002]
[Prior art]
When software such as a numerical calculation library is used in a computing device such as a computer, a user designates parameters for a desired subroutine in the library according to a desired problem. Thereafter, the subroutine is executed using the parameters designated by the user, and the result is output.
[0003]
For example, consider an eigenvalue calculation subroutine, which computes the eigenvalues of a matrix, as a subroutine of a numerical calculation library. The parameters that the user specifies for this subroutine include the desired matrix itself and the size of that matrix. These parameters are required to actually solve the problem.
[0004]
On the other hand, some parameters yield the same answer to the problem however they are set but, when set appropriately, can for example shorten the time required for the numerical calculation. These are the so-called parameters for optimization.
[0005]
For example, when the computing device is a parallel computing device including a plurality of processors, the number of loop unrolling stages (number of unrolling stages) in matrix calculation is a parameter for optimization.
[0006]
Here, the number of unrolling stages means the increment of the loop index per iteration, which is normally set to 1. For example, when calculating the sum C(i) of the vector A(i) and the vector B(i) with the number of unrolling stages set to 2, the sum for component i, C(i) = A(i) + B(i), and the sum for component i+1, C(i+1) = A(i+1) + B(i+1), are both calculated within one pass through the loop body, and i is then increased by 2.
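The vector-sum example above can be written out as a short sketch. Python is used here only to show the loop structure (the performance benefit of unrolling arises in compiled code); the function names are illustrative, and the handling of a leftover element for odd lengths is an addition that the description does not discuss.

```python
def add_plain(a, b):
    """Reference loop: the index advances by 1 per iteration."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def add_unrolled2(a, b):
    """Same sum with two unrolling stages: the index advances by 2."""
    n = len(a)
    c = [0.0] * n
    i = 0
    while i + 1 < n:
        c[i] = a[i] + b[i]              # component i
        c[i + 1] = a[i + 1] + b[i + 1]  # component i + 1
        i += 2
    if i < n:
        # Leftover element when n is odd (not covered in the description).
        c[i] = a[i] + b[i]
    return c
```

Both functions return the same result for any input, which is the defining property of a parameter for optimization: the number of unrolling stages changes how the work is done, not the answer.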
[0007]
By adjusting the number of unrolling stages according to the nature of the problem and the performance of the computing device (such as the number of parallel processors), the computation time on the computing device can be minimized (optimized).
[0008]
There is known a calculation apparatus having an adjustment function that adjusts parameters for optimization to optimize the cost such as calculation time (performance). Such an adjustment function is usually realized by software (automatic tuning software).
[0009]
As a conventional configuration method of automatic tuning software, there is one that performs parameter optimization at the time of software installation. For example, when optimizing the number of unrolling stages at the time of software installation, the following is performed.
[0010]
In this case, since the problem size of the problem to be solved has not been determined, the optimum number of unrolling stages is obtained for each appropriately sampled problem size. Thereafter, the optimum number of unrolling stages for each sampled problem size is interpolated by an appropriate interpolation function for the problem size, for example. Note that when determining the optimum number of unrolling stages for a certain problem size, for example, sampling may be performed on the number of unrolling stages, and the number of unrolling stages having the lowest cost may be selected using an interpolation function.
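The install-time procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of piecewise-linear interpolation as the "appropriate interpolation function", and the cost function `toy_cost` are all hypothetical and are not taken from the original text.

```python
from bisect import bisect_left


def build_table(sample_sizes, depths, measure):
    """For each sampled problem size, record the depth with the lowest measured cost."""
    table = []
    for n in sorted(sample_sizes):
        best = min(depths, key=lambda d: measure(n, d))
        table.append((n, best))
    return table


def estimate_depth(table, n):
    """Estimate the best depth for an unsampled size n by linear interpolation."""
    sizes = [s for s, _ in table]
    if n <= sizes[0]:
        return table[0][1]
    if n >= sizes[-1]:
        return table[-1][1]
    hi = bisect_left(sizes, n)          # first sampled size >= n
    (n0, d0), (n1, d1) = table[hi - 1], table[hi]
    t = (n - n0) / (n1 - n0)
    return round(d0 + t * (d1 - d0))    # unrolling depths are integers


def toy_cost(n, d):
    """Hypothetical cost model standing in for a real timing measurement."""
    return abs(d - n / 100)


table = build_table([100, 300], range(1, 9), toy_cost)
print(table, estimate_depth(table, 200))
```

Because the value for an unsampled size comes from the interpolation model rather than from a measurement, it is an estimate and may differ from the true optimum.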
[0011]
In this manner, for example, by modeling with an appropriate interpolation function, an estimate of the optimum number of unrolling stages for a given problem size can be obtained when the selection is made later.
[0012]
Another conventional configuration method of automatic tuning software performs parameter optimization during library execution. Such optimization is used, for example, for problems in which an element that is not determined until execution time, such as the matrix entity, is a major factor in the cost.
[0013]
In this case, when the library is called, candidate parameter values are tried for the given problem size, matrix entity, and so on, and the optimum one is selected.
[0014]
When optimization is performed during library execution in this way, it should be noted that the time spent on tuning is itself included in the calculation time as a cost. That is, the tuning time plus the subsequent calculation time must be shorter than the calculation time obtained when the parameter is fixed to some value without tuning.
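The condition stated above can be written as a single inequality. The symbols are assumptions introduced here for illustration: $T_{\mathrm{tune}}$ is the tuning time, $T_{\mathrm{exec}}(p)$ is the calculation time with parameter value $p$, $p^{*}$ is the value found by tuning, and $p_{0}$ is the value used when the parameter is fixed without tuning.

```latex
T_{\mathrm{tune}} + T_{\mathrm{exec}}(p^{*}) < T_{\mathrm{exec}}(p_{0})
```

Run-time tuning pays off only when this inequality holds, which limits how much time can be spent on tuning.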
[0015]
For these conventional automatic tuning software, see Non-Patent Document 1 and Non-Patent Document 2 below.
[0016]
For example, Japanese Patent Laid-Open Publication No. 2000-276454 (published October 6, 2000) describes a method for configuring software that has a function of installing the software while adjusting parameters that greatly influence its execution performance on a parallel computer but do not appear explicitly. In this case, parameter optimization is performed at the time of software installation.
[0017]
[Patent Document 1]
JP 2000-276454 A
[0018]
[Non-Patent Document 1]
Takahiro Katagiri, 4 others, “Effect of automatic tuning mechanism on parallel numerical computation library”, Journal of Information Processing Society of Japan: High Performance Computing, Information Processing Society of Japan, November 2001, Vol. 42, No. 12 (HPS4), p. 60-76
[0019]
[Non-Patent Document 2]
Takeshi Naono, Yusaku Yamamoto, “Method of configuring an auto-tuning parallel library having a single memory interface”, Information Processing Society of Japan Research Report, Information Processing Society of Japan, July 25, 2001, Vol. 77, p. 25-30
[0020]
[Problems to be solved by the invention]
However, conventional automatic tuning software performs parameter optimization either only at the time of software installation or only at the time of library execution, so parameter adjustment may be insufficient.
[0021]
That is, in the conventional configuration in which parameter optimization is performed at the time of software installation, an optimization parameter is determined by estimation based on, for example, a model using an interpolation function. For this reason, there is a possibility that sufficient accuracy may not be obtained.
[0022]
Moreover, in the conventional configuration in which parameter optimization is performed at the time of library execution, the time required for parameter tuning is itself a cost, so sufficient time cannot be spent on tuning and the accuracy may be insufficient.
[0023]
Further, there has been no automatic tuning software that can be applied to general-purpose processing. For example, ATLAS described in Non-Patent Document 1 can optimize only the library called BLAS (Basic Linear Algebra Subprograms) among numerical calculation libraries, and is not applicable to general-purpose processing.
[0024]
That is, some problems can be optimized only at installation, and others only at execution. Consequently, if only one of the two is available as in the past, namely parameter optimization at software installation or parameter optimization at library execution, not every problem can be optimized individually. In other words, the conventional software cannot be applied to general-purpose processing.
[0025]
The present invention has been made in view of the above problems, and an object of the present invention is to provide a program, a recording medium, and a computer that can perform precise parameter adjustment. Another object of the present invention is to provide a program, a recording medium, and a computer that can be applied to general-purpose processing.
[0026]
[Means for Solving the Problems]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter, which is included in the parameters of a library provided in the computer and which changes only the execution performance without changing the output of the library. The program causes the computer to execute a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, and a procedure for optimizing the performance information parameter using the basic information parameters.
[0027]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0028]
Before the actual execution of the library, the computer on which the above program is executed detects the time at which the basic information parameters have been determined, for example by detecting the input of the basic information parameters from the user.
[0029]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0030]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0031]
That is, when the contents of the library are expressed as mathematical expressions, parameters expressed as variables in the mathematical expressions correspond to basic information parameters. A parameter that does not appear in the mathematical expression or appears as a simple parameter in the mathematical expression corresponds to the performance information parameter. For this reason, for example, even if the performance information parameter is changed, the result obtained by the mathematical expression (output of the library) does not change.
[0032]
Thereafter, the computer optimizes the performance information parameter using the basic information parameter before the actual execution of the library. More specifically, for example, a basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. As a result, an optimal performance information parameter can be obtained reliably.
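The trial calculation described above might look like the following minimal Python sketch. The names `tune_before_execution` and `measure_seconds`, the dictionary used for the basic information parameters, and the choice of wall-clock time as the execution cost are assumptions for illustration only.

```python
import time


def measure_seconds(run, basic, value):
    """Execution cost of one trial calculation, taken here as elapsed wall-clock time."""
    start = time.perf_counter()
    run(basic, value)
    return time.perf_counter() - start


def tune_before_execution(run, basic, candidates, measure=measure_seconds):
    """Try every candidate value with the now-known basic parameters and keep the cheapest."""
    costs = {value: measure(run, basic, value) for value in candidates}
    best = min(costs, key=costs.get)
    return best, costs
```

Since this runs before the actual execution of the library, the time spent in `tune_before_execution` is not added to the execution cost of the library call itself.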
[0033]
Here, one example of a conventional optimization program optimizes performance information parameters when a library is installed. In this case, since the basic information parameters such as the size of the matrix are not yet determined, the optimum performance information parameter is estimated by some estimation model that includes a certain error.
[0034]
Another example of a conventional optimization program optimizes performance information parameters when a library is executed. In this case, the calculation time for optimizing the performance information parameter is included in the execution cost of the library. For this reason, sufficient time cannot be taken for optimization, and an optimal parameter may not be obtained.
[0035]
Therefore, as in the above-described program according to the present invention, the execution cost is measured in advance before the actual calculation to obtain the optimum performance information parameter. As a result, more accurate and reliable parameter adjustment is possible. In addition, the calculation time can be predicted before the program is executed.
[0036]
The program according to the present invention can also be expressed as software having a function of optimizing parameters at the time when the information that the user can know has been determined.
[0037]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter, which is included in the parameters of a library provided in the computer and which changes only the execution performance without changing the output of the library. The program is characterized by causing the computer to execute an initial setting procedure for optimizing the performance information parameter at the time of installing the library, a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, and a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameters with reference to the performance information parameter set in the initial setting procedure.
[0038]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0039]
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0040]
In addition, before the actual execution of the library, the computer detects the time at which the basic information parameters have been determined, for example by detecting the input of the basic information parameters from the user.
[0041]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0042]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0043]
Thereafter, before the actual execution of the library, the computer refers to the performance information parameter set at the time of installation and optimizes the performance information parameter using the basic information parameter. More specifically, for example, a basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. In particular, trial calculation may be performed only for values around the optimum value of the performance information parameter set at the time of installation. As a result, the number of trial calculations can be reduced and an optimum performance information parameter can be obtained. In this way, more accurate and reliable parameter adjustment is possible.
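The idea of limiting the trial calculations to values around the install-time optimum can be sketched as follows. The function names, the search radius, and the integer bounds are hypothetical choices made for this illustration; the original text does not specify how the neighborhood is defined.

```python
def neighbours(center, radius, lower, upper):
    """Integer candidates within `radius` of the install-time value, clipped to [lower, upper]."""
    return [v for v in range(center - radius, center + radius + 1) if lower <= v <= upper]


def refine(installed_value, cost_of, radius=1, lower=1, upper=16):
    """Re-tune before execution by measuring only the neighbourhood of the install-time value."""
    return min(neighbours(installed_value, radius, lower, upper), key=cost_of)
```

With `radius=1` only three values are measured instead of the full range from `lower` to `upper`, which is the reduction in the number of trial calculations mentioned above.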
[0044]
The program according to the present invention can also be expressed as software having parameter optimization functions at the time of software installation and, before software execution, at the time when the information that the user can know has been determined.
[0045]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter, which is included in the parameters of a library provided in the computer and which changes only the execution performance without changing the output of the library. The program causes the computer to execute a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameters, and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and, if the calculation based on that performance information parameter does not satisfy the desired accuracy, re-optimizes the performance information parameter using the basic information parameters.
[0046]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0047]
Before the actual execution of the library, the computer on which the above program is executed detects the time at which the basic information parameters have been determined, for example by detecting the input of the basic information parameters from the user.
[0048]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0049]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0050]
Thereafter, the computer optimizes the performance information parameter using the basic information parameter before the actual execution of the library. More specifically, for example, a basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. As a result, an optimal performance information parameter can be obtained reliably.
[0051]
Further, when the library is actually executed, the computer refers to the performance information parameter that has already been set, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy. When the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that can obtain the desired accuracy.
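The run-time readjustment described above can be sketched as follows. The text does not state here how the desired accuracy is judged, so the check is passed in as a caller-supplied predicate `is_accurate`; that predicate, `reoptimize`, and the function name `run_with_readjustment` are all hypothetical.

```python
def run_with_readjustment(library, basic, current, is_accurate, reoptimize):
    """Execute `library`, re-optimizing the stored parameter first only when needed.

    Returns the library output, the parameter value actually used, and a flag
    telling whether re-optimization took place.
    """
    retuned = False
    if not is_accurate(basic, current):
        # Desired accuracy not satisfied: optimize again with the basic parameters.
        current = reoptimize(basic)
        retuned = True
    return library(basic, current), current, retuned
```

When the stored value already satisfies the check, the library runs immediately and no tuning time is added to the execution cost.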
[0052]
As described above, before the actual calculation, the execution cost is measured in advance to obtain the optimum performance information parameter. When the basic information parameter is not changed, the library can be executed using the preset performance information parameter. In addition, even when there is a change in the basic information parameter, if a desired accuracy can be obtained, the library can be executed without performing calculation for parameter optimization. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0053]
The program according to the present invention can also be expressed as software having parameter optimization functions before software execution, at the time when the information that the user can know has been determined, and at the time of software execution.
[0054]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter, which is included in the parameters of a library provided in the computer and which changes only the execution performance without changing the output of the library. The program causes the computer to execute an initial setting procedure for optimizing the performance information parameter at the time of installing the library, a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and, if the calculation based on that performance information parameter does not satisfy the desired accuracy, re-optimizes the performance information parameter using the basic information parameters.
[0055]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0056]
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0057]
In addition, before the actual execution of the library, the computer detects the time at which the basic information parameters have been determined, for example by detecting the input of the basic information parameters from the user.
[0058]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0059]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0060]
Further, when the library is actually executed, the computer refers to the performance information parameter that has already been set, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy. When the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that provides the desired accuracy.
[0061]
Thus, the performance information parameter is set before the actual calculation. In the actual calculation, if a desired accuracy can be obtained by the performance information parameter, the library can be executed without performing the calculation for optimizing the parameter. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0062]
The program according to the present invention can also be expressed as software having a parameter optimization function at the time of software installation and software execution.
[0063]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter, which is included in the parameters of a library provided in the computer and which changes only the execution performance without changing the output of the library. The program causes the computer to execute an initial setting procedure for optimizing the performance information parameter at the time of installing the library, a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameters, and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and, if the calculation based on that performance information parameter does not satisfy the desired accuracy, re-optimizes the performance information parameter using the basic information parameters.
[0064]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0065]
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0066]
In addition, before the actual execution of the library, the computer detects the time at which the basic information parameters have been determined, for example by detecting the input of the basic information parameters from the user.
[0067]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0068]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0069]
Thereafter, before the actual execution of the library, the computer refers to the performance information parameter set at the time of installation and optimizes the performance information parameter using the basic information parameter. More specifically, for example, a basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. In particular, trial calculation may be performed only for values around the optimum value of the performance information parameter set at the time of installation. As a result, the number of trial calculations can be reduced and an optimum performance information parameter can be obtained. In this way, more accurate and reliable parameter adjustment is possible.
[0070]
Further, when the library is actually executed, the computer refers to the performance information parameter that has already been set, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy. When the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that provides the desired accuracy.
[0071]
As described above, before the actual calculation, the execution cost is measured in advance to obtain the optimum performance information parameter. When the basic information parameter is not changed, the library can be executed using the preset performance information parameter. In addition, even when there is a change in the basic information parameter, if a desired accuracy can be obtained, the library can be executed without performing calculation for parameter optimization. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0072]
The program according to the present invention can also be expressed as software having a three-level parameter optimization function: at the time of installation, before software execution at the time when the information that the user can know has been determined, and at the time of software execution.
[0073]
In order to solve the above problems, the program according to the present invention is a program for causing a computer to realize a function of optimizing performance information parameters, which are included in the parameters of a library provided in the computer and which change only the execution performance without changing the output of the library. The program is characterized by causing the computer to realize a function of setting each element of the performance information parameters so that it is included in at least one of a first set of parameters to be optimized when the library is installed, a second set of parameters to be optimized before the library is executed, and a third set of parameters to be optimized when the library is executed; a function of optimizing the elements of the first set; a function of optimizing the elements of the second set; and a function of optimizing the elements of the third set.
[0074]
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. The execution cost is, for example, the computational resources and calculation time required for execution. Among the parameters of the library, the program adjusts the value of the performance information parameter, which changes only the execution performance and not the output of the library, so that the execution cost of the library is optimized. For example, in the matrix eigenvalue calculation library of the numerical calculation library, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0075]
In the computer on which the program is executed, each element of the performance information parameters is set so as to be included in at least one of a first set of parameters to be optimized when the library is installed, a second set of parameters to be optimized before the library is executed, and a third set of parameters to be optimized when the library is executed.
[0076]
Here, any performance information parameter that can be optimized in some sense can always be optimized at installation, before library execution, or during library execution. The specific way in which each performance information parameter is assigned to at least one of the first to third sets is to some extent arbitrary, and any method may be chosen.
[0077]
Then, the computer optimizes each of the first to third sets. Therefore, all the performance information parameters can be optimized, and can be applied to general-purpose processing. That is, the entire library including a plurality of routines can be optimized.
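The assignment of performance information parameters to the three sets, and the optimization of each set in turn, can be sketched as follows. The labels IOP, BEOP, and ROP follow the reference signs of this document; the `Stage` enum, the function names, and the parameter names used in any example call (such as `"unroll"`) are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    INSTALL = "IOP"            # first set: optimized when the library is installed
    BEFORE_EXECUTION = "BEOP"  # second set: optimized before the library is executed
    RUN_TIME = "ROP"           # third set: optimized when the library is executed


def partition(assignment):
    """Group parameter names by stage; every parameter must belong to at least one set."""
    sets = {stage: set() for stage in Stage}
    for name, stages in assignment.items():
        if not stages:
            raise ValueError(f"{name} must belong to at least one set")
        for stage in stages:
            sets[stage].add(name)
    return sets


def optimize_stage(sets, stage, optimizers, values):
    """Optimize every parameter assigned to `stage`, updating `values` in place."""
    for name in sorted(sets[stage]):
        values[name] = optimizers[name](stage, values.get(name))
    return values
```

A parameter may appear in more than one set, in which case a later stage can start from the value left by an earlier one.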
[0078]
On the other hand, the conventional optimization method performs only one of parameter optimization at the time of software installation and parameter optimization at the time of library execution. Because some problems can be optimized only at installation and others only at execution, the conventional method cannot handle all problems.
[0079]
Note that the program according to the present invention can be expressed as software that optimizes each parameter by separating the parameter to be optimized into three types of parameters at the time of installation, before execution, and at the time of execution.
[0080]
A recording medium according to the present invention is a computer-readable recording medium on which any of the above-described programs is recorded in order to solve the above problems.
[0081]
When this recording medium is read by a computer, one of the above-described programs is executed by the computer. Therefore, the same effect as the above-described program can be obtained.
[0082]
The recording medium is not limited to a hard disk, a CD-ROM (Read Only Memory), or the like; any recording medium may be used.
[0083]
In addition, a computer according to the present invention is configured to include the above-described recording medium in order to solve the above problems.
[0084]
When this computer reads the above-described recording medium, any of the above-described programs is executed by the computer. Therefore, the same effect as the above-described program can be obtained.
[0085]
The computer may be a parallel computing device having a plurality of processors within the computer, or may be a distributed computing device that functions as a computing device having a plurality of processors connected via a network.
[0086]
The above can also be expressed as follows: in an adjustment method for adjusting a performance information parameter that is included in the parameters of a library provided in a computer and that changes only the execution performance without changing the output of the library, the computer executes an adjustment method including a procedure for detecting the time at which basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, have been determined, and a procedure for optimizing the performance information parameter using the basic information parameters.
[0087]
It can also be expressed as follows. In an adjustment method for adjusting a performance information parameter that is included in the parameters of a library provided in a computer and that changes only the execution performance without changing the output of the library, the computer executes an adjustment method including a readjustment procedure that, when the library is executed, refers to the performance information parameter that has already been set and, if the calculation based on that performance information parameter does not satisfy the desired accuracy, optimizes the performance information parameter again using the basic information parameters.
[0088]
In addition, by executing the adjustment method, the computer described above functions as an adjustment device that optimizes the performance information parameters, which are included in the parameters of the library provided in the computer and change only the execution performance without changing the output of the library. Further, the above-described computer functions as a computing device including the above-described program and library.
[0089]
In the configuration described above, optimizing the performance information parameter does not mean optimizing all of the performance information parameters; it means optimizing an appropriate subset of those that can be optimized.
[0090]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below with reference to FIGS.
[0091]
As shown in FIG. 2, the computing device (computer) 1 includes a processor 2, a user library (library) 3, a parameter adjustment layer 4, and a parameter information file 5. The computing device 1 also includes a recording medium (not shown). The computing device 1 performs a calculation by calling a subroutine in the library 3 using parameters input from the outside. The calculation result is output to a display device (not shown).
[0092]
The processor 2 is a calculation processing unit for performing calculations. The processor 2 includes nprocs processors (not shown). The computing device 1 functions as a parallel computing device using a plurality of processors 2.
[0093]
The library 3 is a numerical calculation library. The library 3 includes at least one subroutine. The library 3 of this embodiment includes a plurality of subroutines 3a to 3k.
[0094]
The library 3 and the subroutines 3a to 3k are accessed by describing parameters using some method (dedicated description language or the like). Some of these parameters are directly input to the library 3 by the user from the outside. The other part of the parameters is used in the library 3. Still another part of the parameters is input to the library 3 through the parameter adjustment layer 4.
[0095]
The library 3 is a numerical calculation library developed by a user, but is not limited thereto, and may be, for example, a system library developed by a library developer. Even for a library whose software interface is well known, such as MPI (Message Passing Interface) for a given computer environment or a library prepared in advance in an OS (Operating System), the parameter information can be delivered to the parameter adjustment layer 4 if the user or the library developer describes the parameters.
[0096]
Note that the contents and number of subroutines provided in the library are not particularly limited. The computing device 1 may be provided with a program other than the library, and other functions may be realized by the program.
[0097]
The parameter adjustment layer 4 functions as an adjustment device that adjusts parameters used by the library 3. The parameter adjustment layer 4 adjusts a part of the parameters input to the library 3 and then passes them to the library 3. The parameter adjustment layer 4 includes an installation optimization layer (IOL) 4a, a pre-execution optimization layer (Before Execute-time Optimization Layer: BEOL) 4b, and a runtime optimization layer (ROL) 4c. The functions of these layers will be described later.
[0098]
The parameter information file 5 is a file for storing parameters adjusted in the parameter adjustment layer 4.
[0099]
In the computing device 1 of the present embodiment, the library 3 is a function realized by reading and executing a program recorded on a recording medium (not shown). The parameter adjustment layer 4 is also a function realized by reading and executing a program recorded on a recording medium (not shown).
[0100]
The computing device 1 having the above configuration will be described in more detail below.
[0101]
When the user executes the library 3 using the computing device 1, an execution instruction is given after setting appropriate parameters for the desired subroutine 3a.
[0102]
Here, the parameters set for the subroutine 3a include parameters that change only the execution performance of the computing device 1 and do not change the output of the subroutine 3a of the library 3. Hereinafter, such a parameter is referred to as a performance information parameter (Performance Parameter: PP).
[0103]
Of the parameters set for the subroutine 3a, parameters that change both the execution performance of the computing device 1 and the output of the subroutine 3a of the library 3 are hereinafter referred to as basic information parameters (Basic Parameters: BP).
[0104]
For example, it is assumed that the subroutine 3a included in the numerical calculation library is an eigenvalue calculation subroutine for calculating eigenvalues of a matrix. At this time, the substance of the desired matrix, the size of the matrix, and the like correspond to the basic information parameter BP. Further, the number of loop unrolling stages in the matrix calculation of the calculation apparatus 1 corresponds to the performance information parameter PP.
[0105]
In the computing device 1, a desired result can be obtained in a minimum time by optimizing the performance information parameter PP using the given basic information parameter BP. The performance information parameter PP and the basic information parameter BP are input to the library 3 via the parameter adjustment layer 4. Parameters other than the performance information parameter PP and the basic information parameter BP are either input directly to the library 3 from outside the computing device 1 or used inside the library 3.
[0106]
As shown in FIG. 1, the parameter adjustment layer 4 of the present embodiment includes three layers for optimizing the performance information parameter PP, which is an adjustable parameter: the installation optimization layer 4a, the pre-execution optimization layer 4b, and the runtime optimization layer 4c. The layers 4a to 4c do not hold the parameters themselves, but store them in the parameter information file 5.
[0107]
The installation optimization layer (IOL) 4 a performs optimization when the library 3 is installed.
[0108]
For example, as shown in FIG. 3A, when the library 3 is installed (S1), the installation optimization layer 4a optimizes an installation optimization parameter (IOP), which is a part of the performance information parameter PP (S2), and outputs the obtained parameter (IOP) to the parameter information file 5.
[0109]
Note that, at the time of installing the library 3, the basic information parameter BP is not normally determined. For this reason, the installation optimization layer 4a appropriately samples, for example, the value of the basic information parameter BP, and determines a parameter that minimizes an appropriately defined cost definition function for each sampled extraction point. Then, the data between the sampled extraction points is interpolated by an appropriate model formula.
[0110]
The pre-execution optimization layer (BEOL) 4b performs optimization after designating a specific parameter (for example, problem size) designated by the user.
[0111]
In response to the input of the basic information parameter BP, the pre-execution optimization layer 4b optimizes the pre-execution optimization parameter BEOP, which is a part of the performance information parameter PP. For example, as shown in FIG. 3B, when the basic information parameter BP is defined (input) as a user-specified parameter (S4), the parameter (IOP) in the parameter information file 5 is referred to (S5), optimization is performed (S6), and the obtained optimization parameter (BEOP) is output to the parameter information file 5.
[0112]
It should be noted that the pre-execution optimization layer 4b performs a trial by actual measurement in order to obtain an optimum parameter using the basic information parameter BP designated by the user.
[0113]
The runtime optimization layer (ROL) 4c performs optimization when the target library (or routine) is executed, after the parameter optimization by at least one of the installation optimization layer 4a and the pre-execution optimization layer 4b has been completed.
[0114]
For example, as shown in FIG. 3C, when the runtime optimization layer 4c detects an execution instruction for the library 3 (the subroutine 3a of the library 3) (S8), it refers to the performance information parameter PP that has already been set (S9). When the calculation based on that performance information parameter PP does not satisfy the desired accuracy, optimization is performed again (S10). In S10, the calculation is repeated until a performance information parameter PP with which the calculation satisfies the desired accuracy is obtained.
[0115]
In this way, the runtime optimization layer 4c refers to the performance information parameter PP that has already been set, and does not perform optimization calculations in a predetermined case where, for example, sufficient accuracy is obtained.
[0116]
As described above, in the parameter adjustment layer 4 of the present embodiment, the parameter information IOP optimized by the installation optimization layer 4a is stored in the parameter information file 5 and can be referred to by the pre-execution optimization layer 4b and the runtime optimization layer 4c. The parameter information BEOP optimized by the pre-execution optimization layer 4b is also stored in the parameter information file 5 and can be referred to by the runtime optimization layer 4c.
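The sharing of results through the parameter information file 5 can be sketched as follows. This is a minimal illustration only: the JSON layout, the file name, the function names `store_layer` and `load_layer`, and the parameter values are all assumptions introduced here, not part of the described embodiment.

```python
import json
import tempfile
from pathlib import Path


def store_layer(path, layer, params):
    """Merge one layer's optimized parameters into the parameter information file."""
    file = Path(path)
    data = json.loads(file.read_text()) if file.exists() else {}
    data[layer] = params
    file.write_text(json.dumps(data, indent=2))


def load_layer(path, layer):
    """Return the parameters stored for a layer, or an empty dict if none exist yet."""
    file = Path(path)
    if not file.exists():
        return {}
    return json.loads(file.read_text()).get(layer, {})


with tempfile.TemporaryDirectory() as tmp:
    info_file = Path(tmp) / "parameter_info.json"  # hypothetical file name

    # Installation optimization layer writes its result (hypothetical values).
    store_layer(info_file, "IOP", {"imv": 8, "iud": 4})

    # Pre-execution layer reads the IOP, then stores its own result.
    iop_seen_by_beol = load_layer(info_file, "IOP")
    store_layer(info_file, "BEOP", {"iud": 6})

    # Runtime layer can read both earlier results.
    iop_seen_by_rol = load_layer(info_file, "IOP")
    beop_seen_by_rol = load_layer(info_file, "BEOP")
    rop_seen = load_layer(info_file, "ROP")  # nothing stored yet
```

Keeping the layers stateless and routing everything through one file mirrors the statement that the layers do not hold the parameters themselves.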
[0117]
Here, each element of the performance information parameter PP is included in at least one of the parameter sets IOP, BEOP, and ROP (runtime optimization parameters). That is, the elements of the performance information parameter PP are divided, with overlap allowed, into three subsets (IOP, BEOP, ROP), one for each of the layers 4a to 4c of the parameter adjustment layer 4. This can be expressed as follows:
PP = IOP ∪ BEOP ∪ ROP (Equation 1)
Therefore, the computing device 1 according to the present embodiment can optimize all the elements included in the performance information parameter PP at any one of the timings described above using the parameter adjustment layer 4.
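Equation 1 can be checked with ordinary sets. The sketch below uses the parameter names that appear in the examples of this description; placing nproc in BEOP is an assumption made here, based only on the statement in Example 3 that the number of processors is also optimized before execution.

```python
# Subsets as given in the examples (nproc in BEOP is an assumption, see above).
IOP = {"imv", "iud", "kbi", "ihit"}
BEOP = {"imv", "iud", "ihit", "kbi", "kort", "nproc"}
ROP = {"kort"}

# Equation 1: PP is the union of the three subsets.
PP = IOP | BEOP | ROP

# Overlap is allowed: the same parameter may be tuned by more than one layer.
tuned_at_install_and_before_execution = IOP & BEOP
tuned_before_execution_and_at_runtime = BEOP & ROP
```

Because the subsets may overlap, a parameter such as iud can be estimated at installation and then refined before execution.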
[0118]
In particular, when the basic information parameter BP, such as the matrix size (n) corresponding to the problem, has been determined, the computing device 1 according to the present embodiment performs optimization with the pre-execution optimization layer 4b before the actual calculation is executed. This allows for more accurate optimization than conventional computing devices.
[0119]
[Example 1]
The computing device 1 of this embodiment is described below using a specific example of a numerical calculation library. In the first example, optimization of the subroutine 3a by the installation optimization layer 4a will be described.
[0120]
Here, as an example, a case where the user library 3 is an eigenvalue calculation library and the subroutine 3a is a subroutine PEigVecCal will be described.
[0121]
The subroutine PEigVecCal of the eigenvalue calculation library means a subroutine based on the Householder bisection / inverse iteration method, which is often used when obtaining eigenvalues and eigenvectors in a real symmetric matrix. In this embodiment, since the execution time is optimized, the cost definition function to be optimized is the execution time.
[0122]
The subroutine PEigVecCal has the following configuration.
call PEigVecCal (A, x, lambda, n, nproc, myid, iDistInd, imv, iud, ihit, icomm, kbi, kort, MAXITER, deps,…)
Each argument in the subroutine PEigVecCal is a software parameter presented by a software configuration method by parameter presentation.
[0123]
Each parameter will be described as follows.
(i) The parameters A, x, lambda, and n correspond to basic information parameters BP representing matrix information and the like. More specifically, A corresponds to the entity of the matrix for which eigenvalues are to be obtained. x corresponds to the eigenvectors. lambda corresponds to the eigenvalues. n corresponds to the size of the matrix. When performing a calculation, specifying A and n and calling the subroutine PEigVecCal yields the results x and lambda as needed.
(ii) The parameters nproc, myid, and iDistInd are parameters for parallel control. For example, nproc represents the number of independent processors (not shown) included in the processor 2.
(iii) imv, iud, ihit, icomm, kbi, and kort are parameters that affect performance in relation to the processing procedure, such as the number of unrolling stages.
(iv) MAXITER, deps, ... are parameters that affect the algorithm, and are hereinafter referred to as solution information parameters.
Among these, a part of (ii) and (iii) corresponds to the performance information parameter (PP).
[0124]
More specifically, in the Householder bisection / inverse iteration method, the main software parameters are as follows: basic information parameter BP = {n}, performance information parameter PP = {imv, iud, ihit, kbi, kort, nproc}.
[0125]
More specifically, the subroutine PEigVecCal is assumed to be composed of the following four types of subroutines and performance information parameters PP.
・ Householder tridiagonalization routine: PP = {imv, iud},
・ Bisection routine: PP = {kbi},
・ Inverse iteration routine: PP = {kort},
・ Householder inverse transformation routine: PP = {ihit}.
[0126]
Here, the details of the performance information parameter PP described above will be described. The definition area of each performance information parameter PP is also shown.
・ imv = {1, 2, …, 16}
imv specifies the number of unrolling stages of the outermost loop in the matrix-vector product (double loop) required for Householder tridiagonalization.
・ iud = {1, 2, …, 16}
iud specifies the number of unrolling stages of the outermost loop in the matrix update process (double loop) required for Householder tridiagonalization.
・ kbi = {vec, non-vec}
kbi specifies whether the processing required for the bisection method is implemented in a vector-oriented manner.
・ kort = {CG-S, MG-S, IRCG-S, NoOrt}
kort specifies the implementation of the re-orthogonalization processing required when computing eigenvectors for clustered eigenvalues in the inverse iteration method. More specifically, CG-S means re-orthogonalizing the eigenvectors with the classical Gram-Schmidt method. MG-S means re-orthogonalizing the eigenvectors with the modified Gram-Schmidt method. IRCG-S means re-orthogonalizing the eigenvectors with the iterative refinement classical Gram-Schmidt method. NoOrt means that no re-orthogonalization is performed.
・ ihit = {1, 2, …, 16}
ihit specifies the number of unrolling stages of the outermost loop in the processing (double loop) required for Householder inverse transformation.
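The definition domains listed above can be written down as a small table, which also makes the size of the search space explicit. This is an illustrative sketch; the names `DOMAINS` and `is_valid` are introduced here and are not part of the library.

```python
from math import prod

# Definition domain of each performance information parameter, as listed above.
DOMAINS = {
    "imv": list(range(1, 17)),
    "iud": list(range(1, 17)),
    "kbi": ["vec", "non-vec"],
    "kort": ["CG-S", "MG-S", "IRCG-S", "NoOrt"],
    "ihit": list(range(1, 17)),
}


def is_valid(setting):
    """Check that every parameter in a (possibly partial) setting lies in its domain."""
    return all(
        name in DOMAINS and value in DOMAINS[name]
        for name, value in setting.items()
    )


# Number of combinations if every parameter were searched jointly and exhaustively.
search_space_size = prod(len(values) for values in DOMAINS.values())
```

A joint exhaustive search would already cover tens of thousands of combinations per problem size, which is one motivation for tuning parameters separately and for sampling.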
[0127]
The optimization of the eigenvalue calculation subroutine PEigVecCal by the installation optimization layer 4a is described below.
[0128]
In this embodiment, the installation optimization parameter IOP optimized in the installation optimization layer 4a is IOP = {imv, iud, kbi, ihit}. These parameters {imv, iud, kbi, ihit} are determined from information (number of registers, cache size, vector mechanism, etc.) that becomes fixed once the computer environment, such as the architecture of the installation machine and the compiler, is determined. For this reason, it is preferable to optimize them at the time of installation.
[0129]
The optimization of the parameter IOP will be described using the parameter iud as an example. The optimization for the other parameters is the same, and a description thereof will be omitted.
[0130]
First, at the time of installation, the problem size n, which is a basic information parameter, is not fixed. Therefore, appropriate sampling points {200, 400, 800, 2000, 4000, 8000} are chosen for the problem size n. It is assumed that the number of processors nproc is 8.
[0131]
For the parameter iud, the following sampling points {1, 2, 3, 4, 8, 16} are determined. This is to reduce the amount of calculation by avoiding calculations in the entire domain. If sufficient calculation time can be taken, the calculation may be performed over the entire domain.
[0132]
At each sampling point, an appropriate trial is performed to measure the calculation time, which is the execution cost. Thereafter, an appropriate cost definition function is determined that interpolates the values at each sampling point. As a result, the cost over the entire domain can be estimated.
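The trial measurements over the sampling points can be sketched as a nested loop that fills a table like Table 1. The function names and the synthetic cost function below are assumptions for illustration; in a real installation the cost would come from timing an actual run of the routine, for example through the `timed` wrapper.

```python
import time

N_SAMPLES = (200, 400, 800, 2000, 4000, 8000)  # sampling points for n
IUD_SAMPLES = (1, 2, 3, 4, 8, 16)              # sampling points for iud


def timed(trial):
    """Turn a trial function into a cost function returning elapsed seconds."""
    def cost(n, iud):
        start = time.perf_counter()
        trial(n, iud)
        return time.perf_counter() - start
    return cost


def measure_grid(cost, sizes=N_SAMPLES, candidates=IUD_SAMPLES):
    """Measure the execution cost at every (n, iud) sampling point."""
    return {n: {iud: cost(n, iud) for iud in candidates} for n in sizes}


def best_sampled(table):
    """For each sampled n, return the sampled iud with the smallest cost."""
    return {n: min(row, key=row.get) for n, row in table.items()}


def synthetic_cost(n, iud):
    """Stand-in for a real measurement (not data from Table 1)."""
    return n * (1.0 + abs(iud - 4) / 10.0)


table = measure_grid(synthetic_cost)
best = best_sampled(table)
```

With a real routine, `measure_grid(timed(run_routine))` would produce the measured table, where `run_routine` is a hypothetical function that executes the tridiagonalization for a given n and iud.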
[0133]
In this embodiment, the optimization execution time (= cost definition function) of the parameter iud is approximated by a polynomial with respect to the parameter iud. The approximate function used for interpolation is not limited to polynomial approximation, and other functions may be used.
[0134]
Table 1 below shows the measurement results (unit: seconds) of the execution time at each sampling point using an example of a parallel computer.
[0135]
[Table 1]

[0136]
For this measurement result, the basic information parameter n is fixed and the function $f_n(\mathrm{iud})$ of the parameter iud is estimated.
[0137]
Here, $f_n(\mathrm{iud})$ is assumed to be the fifth-order polynomial $f_n(\mathrm{iud}) = a_1\,\mathrm{iud}^5 + a_2\,\mathrm{iud}^4 + a_3\,\mathrm{iud}^3 + a_4\,\mathrm{iud}^2 + a_5\,\mathrm{iud} + a_6$. The coefficients $a_1, \ldots, a_6$ can be determined using an appropriate method. Here, each coefficient was determined using the least squares method. The method for determining each coefficient is not limited to this.
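The fit for a fixed n can be sketched as follows. With six sampling points and six coefficients, the least squares solution coincides with the polynomial that passes exactly through the samples, so this sketch solves the square system directly, using exact rational arithmetic to avoid round-off. The measured values are synthetic placeholders generated from a known curve, not the values of Table 1, and the function names are introduced here for illustration.

```python
from fractions import Fraction

IUD_SAMPLES = [1, 2, 3, 4, 8, 16]
IUD_DOMAIN = range(1, 17)


def fit_polynomial(xs, ys):
    """Return coefficients (highest degree first) of the polynomial through the points."""
    size = len(xs)
    # Augmented Vandermonde rows: [x^(size-1), ..., x, 1 | y]
    rows = [
        [Fraction(x) ** p for p in range(size - 1, -1, -1)] + [Fraction(y)]
        for x, y in zip(xs, ys)
    ]
    # Gauss-Jordan elimination with row swaps.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inverse = 1 / rows[col][col]
        rows[col] = [value * inverse for value in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result


def best_in_domain(coeffs, domain=IUD_DOMAIN):
    """Return the iud in the full domain with the smallest estimated cost."""
    return min(domain, key=lambda iud: evaluate(coeffs, iud))


# Synthetic "measurements" (placeholder data): cost = (iud - 5)^2 + 10.
measured = [(iud - 5) ** 2 + 10 for iud in IUD_SAMPLES]
coeffs = fit_polynomial(IUD_SAMPLES, measured)
best_iud = best_in_domain(coeffs)
```

In this synthetic case the estimated optimum is iud = 5, a value that was never sampled, which is the point of interpolating over the full domain. With more sampling points than coefficients, a genuine least squares solver would be needed instead.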
[0138]
Table 2 below shows the results of determining the coefficients using the least square method using the data in Table 1 as sample points.
[0139]
[Table 2]

[0140]
From Table 2, the value of iud that is minimum in the domain of iud {1, 2, ..., 16} can be determined for each sampled problem size.
[0141]
For the problem size n, interpolation is performed as follows.
[0142]
First, from Table 2, evaluation values can be obtained over the entire domain of iud. Therefore, for every iud in the domain {1, 2, ..., 16}, it is possible to calculate evaluation values in which iud is fixed and the problem size n is varied over the sample points {200, 400, 800, 2000, 4000, 8000}.
[0143]
Therefore, regarding these evaluation values as new sample points, the function $f_{\mathrm{iud}}(n)$ is estimated by the method of least squares. $f_{\mathrm{iud}}(n)$ is assumed to be the fifth-order polynomial $f_{\mathrm{iud}}(n) = a'_1 n^5 + a'_2 n^4 + a'_3 n^3 + a'_4 n^2 + a'_5 n + a'_6$. The fit uses the evaluation values at the sample points {200, 400, 800, 2000, 4000, 8000} for n. As a result, the function $f_{\mathrm{iud}}(n)$ of n is determined for every iud in the domain {1, 2, ..., 16}, so the iud that minimizes the cost can be determined by substituting the n specified at the time of execution.
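The final selection step, substituting the n given at execution time into each stored cost model and taking the minimum, can be sketched as follows. The coefficient values are hypothetical and cover only three iud values for brevity; they are not the fitted values of this embodiment.

```python
def evaluate(coeffs, n):
    """Evaluate a polynomial (coefficients highest degree first) at n by Horner's rule."""
    value = 0.0
    for c in coeffs:
        value = value * n + c
    return value


def select_iud(models, n):
    """Return the iud whose estimated cost f_iud(n) is smallest for the given n."""
    return min(models, key=lambda iud: evaluate(models[iud], n))


# Hypothetical coefficients [a'1, ..., a'6] per iud, as they might be read
# from the parameter information file.
MODELS = {
    1: [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],
    2: [0.0, 0.0, 0.0, 0.0, 1.0, 1000.0],
    4: [0.0, 0.0, 0.0, 0.0, 0.5, 4000.0],
}

choice_small = select_iud(MODELS, 500)
choice_medium = select_iud(MODELS, 4000)
choice_large = select_iud(MODELS, 8000)
```

Storing the coefficients rather than a single best value lets the selection adapt to any n, including sizes that were never sampled.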
[0144]
Since there is no guarantee that the problem size specified at the time of execution matches one of the sampled problem sizes n, the optimum parameter is estimated as described above. The estimated parameters are stored in the parameter information file 5. In addition, the information used to estimate the parameters, for example each coefficient, is also stored in the parameter information file 5. This information can then be obtained by referring to the parameter information file 5 in later optimization.
[0145]
[Example 2]
Next, optimization by the pre-execution optimization layer 4b for the subroutine PEigVecCal described in the first embodiment will be described.
[0146]
In this embodiment, it is assumed that the problem size n, which is a basic information parameter, is determined as n = 8192. It is assumed that the number of processors nprocs is 4.
[0147]
Here, BEOP = {imv, iud, ihit, kbi} is set as a pre-execution optimization parameter (BEOP) to be optimized by the pre-execution optimization layer 4b.
[0148]
Thus, in this embodiment, BEOP is the same as the IOP in the first example. However, in this embodiment, the optimization is performed after the basic information parameter n has been determined, so the interpolation described above is unnecessary, and a reliable, measured optimum value can be obtained for BEOP.
[0149]
That is, in the optimization at the time of installation, except at the sampling points for the size n, the parameters of the cost definition function are determined by estimation, such as interpolation. In addition, an assumed number of processors nproc was used.
[0150]
As described above, for example, in the conventional configuration in which optimization is performed only at the time of installation, there is no pre-execution optimization layer, and therefore, the parameter can only be determined from the estimated value.
[0151]
On the other hand, in the optimization before execution according to the present invention, parameters are determined by actual measurement for a desired size n. For this reason, the pre-execution optimization can improve the accuracy of the parameters more than the optimization at the time of installation. Therefore, the optimization by the pre-execution optimization layer 4b can be used even when the parameter estimation has a fatal error. Further, for example, by referring to the parameter information file 5, the calculation time can be reduced by using the result of the optimization at the time of installation.
[0152]
In this optimization before execution, measurements are made for the desired size n that will actually be used, for example in the same manner as in Table 1 of Example 1 described above, and the optimum iud is obtained by determining the coefficients as in Table 2. Since the details of the procedure are the same as those in the first example, a description thereof will be omitted.
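Pre-execution optimization for a fixed n can be sketched as a direct measurement over candidate values. Narrowing the candidates to a neighbourhood of the installation-time estimate is an assumption made here to illustrate how the stored result might reduce the measurement effort; the `radius` parameter, the function name, and the synthetic cost function are hypothetical.

```python
def tune_before_execution(cost, n, domain, iop_estimate=None, radius=2):
    """Measure the cost at the actual n and return (best value, measurements).

    If an installation-time estimate is available, only values within
    `radius` of it are measured (an illustrative heuristic).
    """
    candidates = list(domain)
    if iop_estimate is not None:
        nearby = [v for v in candidates if abs(v - iop_estimate) <= radius]
        if nearby:
            candidates = nearby
    measured = {value: cost(n, value) for value in candidates}
    best = min(measured, key=measured.get)
    return best, measured


def synthetic_cost(n, iud):
    """Stand-in for a real timing run (placeholder, not measured data)."""
    return n * (1.0 + abs(iud - 6) / 10.0)


best_full, measured_full = tune_before_execution(synthetic_cost, 8192, range(1, 17))
best_near, measured_near = tune_before_execution(
    synthetic_cost, 8192, range(1, 17), iop_estimate=5
)
```

Both calls find the same optimum in this synthetic case, but the second measures 5 candidates instead of 16.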
[0153]
From the above embodiments, by implementing the present invention, a more advanced parameter adjustment mechanism than the conventional one is provided.
[0154]
The difference in function between the conventional installation optimization and the pre-execution optimization layer 4b in the present invention is as follows.
[0155]
[Table 3]

[0156]
[Example 3]
Next, another example of optimization by the pre-execution optimization layer 4b for the subroutine PEigVecCal described in the first and second embodiments will be described.
[0157]
Here, suppose that the user knows, at the time of the library call, information about the coefficient matrix that does not change. That is, it is assumed that the problem size n is fixed.
[0158]
At this time, BEOP = {imv, iud, ihit, kbi, kort} is set as the pre-execution optimization parameter BEOP optimized by the pre-execution optimization layer 4b. In other words, for this problem, the optimization can extend to the orthogonalization process (parameter kort) of the inverse iteration method used for the eigenvalue problem. In this embodiment, the number of processors nprocs is also optimized.
[0159]
In the conventional method, since there is no pre-execution optimization layer, parameter optimization cannot be applied in this embodiment.
[0160]
Here, it is assumed that the size of the matrix (a Frank matrix) is given as n = 10,000. Table 4 below shows the execution times (unit: seconds) measured for the parameter kort and the number of processors nprocs using an example of a parallel computer. Note that the symbol ">" indicates that convergence was not reached within the predetermined time limit and the execution did not finish.
[0161]
[Table 4]

[0162]
Table 5 below shows the orthogonality accuracy of the eigenvectors obtained by each orthogonalization method in the inverse iteration method. The unit is the Frobenius norm, and the maximum residual vector for MG-S with 8 PEs is $\max_i \left( \lvert (A x)_i - \lambda_i x_i \rvert^2 \right)$ = 1.61E-7.
[0163]
[Table 5]

[0164]
From Table 4 and Table 5, it can be seen that in this embodiment the execution time differs depending on the orthogonalization method, but that if an orthogonality accuracy of 1.5E-12 or less is acceptable, the CG-S method is good from the viewpoint of both speed and accuracy. Therefore, for example, if the user passes the upper bound of the orthogonality accuracy to the system, the parameter kort can be fixed to CG-S by the BEOL. In addition, the optimal number of processors nprocs can be determined.
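The selection rule described here, taking the fastest method among those that meet the user's accuracy bound, can be sketched as follows. The timing and accuracy figures are placeholders chosen for illustration and are not the values of Table 4 or Table 5; the function name is also introduced here.

```python
def choose_kort(times, errors, accuracy_bound):
    """Return the fastest orthogonalization method whose error meets the bound.

    Returns None if no method is accurate enough.
    """
    admissible = [method for method in times if errors[method] <= accuracy_bound]
    if not admissible:
        return None
    return min(admissible, key=times.get)


# Placeholder measurements: seconds and orthogonality error per method.
TIMES = {"CG-S": 30.0, "MG-S": 120.0, "IRCG-S": 60.0, "NoOrt": 25.0}
ERRORS = {"CG-S": 1e-12, "MG-S": 1e-13, "IRCG-S": 1e-13, "NoOrt": 1e-3}

loose = choose_kort(TIMES, ERRORS, accuracy_bound=1.5e-12)
strict = choose_kort(TIMES, ERRORS, accuracy_bound=5e-13)
impossible = choose_kort(TIMES, ERRORS, accuracy_bound=1e-15)
```

A run that did not converge within the time limit (the ">" entries) could be represented by an infinite time so that it is never selected.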
[0165]
From the above embodiments, by applying the present invention, a more advanced parameter adjustment mechanism than the conventional one is provided.
[0166]
In the calculation in this embodiment, the calculation amount may be reduced by referring to the information stored in the parameter information file 5.
[0167]
[Example 4]
Next, an example of optimization by the runtime optimization layer 4c for the subroutine PEigVecCal described in the first to third embodiments will be described.
[0168]
Here, in the fourth embodiment, it is assumed that the matrix entity A and the matrix size n can be changed. That is, it is assumed that the matrix size and matrix data are not fixed at the time of execution. In such a case, the optimal orthogonalization method actually depends on the conditions given by the user and the characteristics of the matrix that are not fixed until run time.
[0169]
Here, the runtime optimization parameter ROP optimized by the runtime optimization layer 4c is ROP = {kort}.
[0170]
In addition, it is assumed that, as a result of the optimization by the pre-execution optimization layer 4b described in the third example, a parameter kort that matched the user's accuracy requirement in a past orthogonalization run is stored in the parameter information file 5 as the optimum parameter.
[0171]
In view of this, in the runtime optimization layer 4c, first, the most appropriate orthogonalization method is selected with reference to the parameter kort stored in the parameter information file 5.
[0172]
Next, the runtime optimization layer 4c determines whether or not the calculation accuracy satisfies a user-specified criterion. When the specified accuracy is not satisfied, as described in the third embodiment, each value of the parameter kort is actually measured, and the parameter is readjusted. Then, the determination and calculation are repeated until the optimum parameter kort that provides the specified accuracy can be selected. As a result, the accuracy specified by the user can be guaranteed on the system side. Note that the details of the calculation are the same as in the third embodiment described above, and are omitted here.
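The runtime behaviour described above, first trying the stored kort and readjusting only when the accuracy criterion is not met, can be sketched as follows. The `run` callback, the function name, and the numbers are hypothetical; `run(kort)` is assumed to execute the calculation and return the elapsed time and the achieved orthogonality error.

```python
def select_at_runtime(run, stored_kort, candidates, accuracy_bound):
    """Return (kort, readjusted) for the current problem.

    The stored kort is tried first; only if its accuracy is insufficient
    are all candidates measured and the fastest admissible one chosen.
    """
    _, error = run(stored_kort)
    if error <= accuracy_bound:
        return stored_kort, False

    results = {kort: run(kort) for kort in candidates}
    admissible = {
        kort: elapsed
        for kort, (elapsed, err) in results.items()
        if err <= accuracy_bound
    }
    if not admissible:
        raise RuntimeError("no orthogonalization method meets the accuracy bound")
    return min(admissible, key=admissible.get), True


# Placeholder behaviour of each method on the current matrix: (seconds, error).
FAKE_RESULTS = {
    "CG-S": (30.0, 1e-6),
    "MG-S": (120.0, 1e-13),
    "IRCG-S": (60.0, 1e-13),
    "NoOrt": (25.0, 1e-2),
}


def fake_run(kort):
    return FAKE_RESULTS[kort]


kept = select_at_runtime(fake_run, "CG-S", list(FAKE_RESULTS), accuracy_bound=1e-5)
retuned = select_at_runtime(fake_run, "CG-S", list(FAKE_RESULTS), accuracy_bound=1e-10)
```

When the stored parameter is already good enough, no further measurement is made, which keeps the runtime overhead small.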
[0173]
In the conventional method, if the matrix size and matrix data are not fixed at the time of execution, the optimum parameter cannot be determined for algorithmic reasons. In general, in the conventional method, in order to guarantee accuracy, MG-S is often forcibly selected regardless of cost. In this case, as shown in Table 4 above, the cost is very disadvantageous.
[0174]
On the other hand, in the present invention, parameter adjustment can be applied even in the above-described case by passing information (such as orthogonal accuracy) given from the user to the system. In addition, the cost can be optimized.
In the above embodiment, the pre-execution optimization layer 4b and the runtime optimization layer 4c, which are unique to the present invention, can only be realized as software configuration methods. Therefore, the present invention allows parameter adjustment to be applied more widely than the conventional method.
[0175]
Here, the configuration that refers to the information optimized before execution and stored in the parameter information file 5 has been described. Where possible, however, the configuration may instead refer to the information optimized at the time of installation and stored in the parameter information file 5. In this way, the invention can also be realized by a combination of the installation optimization layer 4a and the runtime optimization layer 4c.
[0176]
As described in the above embodiments, the program according to the present embodiment is configured such that, when the basic information parameter BP is determined, the performance information parameter is optimized accordingly before the actual library is executed.
[0177]
Therefore, the parameters optimized at the time of installation can be readjusted more precisely, or the calculation time for optimization at the time of execution can be reduced to ensure a sufficient optimization time. As a result, more accurate and reliable parameter adjustment is possible.
[0178]
Each element of the performance information parameter PP is optimized either at the time of installation, before library execution, or at the time of library execution. That is, since optimization is performed before execution of the library in addition to the time of installation and library execution, it is possible to provide a highly versatile parameter adjustment function that can optimize all problems.
[0179]
In the above-described embodiment, the computing device 1 as a parallel computing device has been described. However, the present invention is not limited to this, and the processor 2 is provided in a plurality of computing devices connected via a network. It may be a distributed computing device.
[0180]
In the above-described embodiment, the description has been given only for the case where the library is the eigenvalue calculation library and the subroutine is the subroutine PEigVecCal. Needless to say, the present invention is not limited to this case.
[0181]
As described above, the present invention has a mechanism for adjusting parameters in consideration of various costs such as performance in software (automatic tuning software) that automatically adjusts parameters on software related to performance, cost, and the like. The present invention relates to a software configuration method having a wide range regarding application of the adjustment mechanism. The present invention also relates to a software configuration method having an optimization layer at the time of installation, before execution, and at execution. In the present invention, the parameters are separated into three types and used as in the above-described equation 1.
[0182]
Here, the conventional automatic tuning software configuration methods only perform parameter optimization at the time of software installation, as shown in FIG. 4A, or only at the time of library execution, as shown in FIG. 4B. These software configuration methods have the problems that they cannot be applied to general-purpose processing and that parameter adjustment may be insufficient. As can be seen from FIGS. 4A and 4B, conventional automatic tuning has only one type of parameter.
[0183]
Therefore, in the present invention, the problem can be solved by a software configuration method that can apply parameter adjustment in more general-purpose processing and has a more advanced parameter adjustment mechanism than the conventional one.
[0184]
In particular, when the basic information parameter BP, such as the matrix size (n) corresponding to the problem, has been determined, the computing device 1 according to the present embodiment performs optimization with the pre-execution optimization layer 4b before the actual calculation is executed. This enables more accurate optimization than with the IOL or the ROL alone, as in a conventional computing device.
[0185]
Note that Non-Patent Document 1 (P62), cited as prior art, describes two types of automatic tuning: “(i) automatic tuning at execution (ii) automatic tuning before execution”. However, the “automatic tuning before execution” of Non-Patent Document 1 differs from the optimization before execution in the present invention and corresponds to the “optimization at installation” in the present invention.
[0186]
The present invention is not limited to the above-described embodiments and examples, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments or examples are also included in the technical scope of the present invention.
[0187]
The specific embodiments or examples described above are merely to clarify the technical contents of the present invention, and the present invention is not limited to such specific examples and should not be interpreted in a narrow sense. Various modifications can be made within the scope of the claims, and the modified embodiments are also included in the technical scope of the present invention.
[0188]
[Effects of the Invention]
As described above, the program according to the present invention includes a procedure for optimizing the performance information parameter using the basic information parameter at the time when the basic information parameter, which is included in the library parameters and changes both the execution performance and the output of the library, is determined.
[0189]
Therefore, because the execution cost is measured before the actual calculation to obtain the optimum performance information parameter, more accurate and reliable parameter adjustment can be achieved.
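As an illustration of this pre-execution optimization, the following is a minimal sketch in Python, not the implementation of the embodiment. It assumes that a library subroutine can be called as `routine(n, pp)`, where `n` is the basic information parameter and `pp` is a performance information parameter; the names `optimize_before_execution`, `blocked_sum`, and the injectable `timer` argument are hypothetical and were introduced only for this example.

```python
import time
from typing import Callable, Iterable, Tuple


def optimize_before_execution(
    routine: Callable[[int, int], object],
    n: int,
    candidates: Iterable[int],
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Return (best_pp, best_cost) for a fixed basic information parameter n.

    routine(n, pp) is a hypothetical stand-in for a library subroutine; pp is
    assumed to change only the running time, never the result.
    """
    best_pp, best_cost = None, float("inf")
    for pp in candidates:
        start = timer()
        routine(n, pp)             # trial run with the actual problem size
        cost = timer() - start     # measured execution cost of this trial
        if cost < best_cost:
            best_pp, best_cost = pp, cost
    if best_pp is None:
        raise ValueError("candidates must not be empty")
    return best_pp, best_cost


def blocked_sum(n: int, block: int) -> int:
    """Toy routine: sums 0..n-1 in chunks; the result is independent of block."""
    total = 0
    for start in range(0, n, block):
        total += sum(range(start, min(start + block, n)))
    return total
```

Because the trials use the value of `n` that the user has actually specified, the selected value does not depend on interpolation across sampled problem sizes. The sketch simply takes the cheapest measured candidate; estimating the minimum with a fitted function would be an alternative.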
[0190]
As described above, the program according to the present invention includes an initial setting procedure for optimizing the performance information parameter at the time of installing the library, and a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameter, with reference to the performance information parameter set in the initial setting procedure, at the time when the basic information parameter, which is included in the library parameters and changes both the execution performance and the output of the library, is determined.
[0191]
Therefore, before the actual execution of the library, the performance information parameter is optimized using the basic information parameter with reference to the performance information parameter set at the time of installation, so that an optimized performance information parameter can be obtained with a reduced number of trial calculations.
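One way in which referring to the installation-time value could reduce the number of trial calculations is to try only the candidates close to that value. The following Python sketch shows this idea under stated assumptions: the neighbourhood rule, the `radius` argument, and the names `candidates_near` and `pre_adjust` are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, List


def candidates_near(initial_pp: int, all_candidates: List[int], radius: int = 1) -> List[int]:
    """Keep only candidates within `radius` positions of the install-time value."""
    ordered = sorted(all_candidates)
    # Index of the candidate closest to the install-time estimate.
    center = min(range(len(ordered)), key=lambda i: abs(ordered[i] - initial_pp))
    lo = max(0, center - radius)
    hi = min(len(ordered), center + radius + 1)
    return ordered[lo:hi]


def pre_adjust(
    cost_of: Callable[[int], float],
    initial_pp: int,
    all_candidates: List[int],
    radius: int = 1,
) -> int:
    """Return the cheapest candidate among those near the install-time value.

    cost_of(pp) is assumed to run one trial calculation with the user's basic
    information parameter and return its measured cost.
    """
    trial = candidates_near(initial_pp, all_candidates, radius)
    return min(trial, key=cost_of)
```

With five candidates and `radius=1`, only three trial calculations are run instead of five. The trade-off is that a minimum lying far from the installation-time estimate would be missed.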
[0192]
As described above, the program according to the present invention includes a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameter at the time when the basic information parameter, which is included in the library parameters and changes both the execution performance and the output of the library, is determined, and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and re-optimizes the performance information parameter using the basic information parameter when the calculation based on that performance information parameter does not satisfy the desired accuracy.
[0193]
Therefore, since the accuracy is checked in the readjustment procedure, the library can be executed without performing any calculation for parameter optimization when the desired accuracy is obtained.
[0194]
As described above, the program according to the present invention includes an initial setting procedure for optimizing the performance information parameter at the time of installing the library, and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and, when the calculation based on that performance information parameter does not satisfy the desired accuracy, re-optimizes the performance information parameter using the basic information parameter, which is included in the library parameters and changes both the execution performance and the output of the library, once that basic information parameter has been determined.
[0195]
Therefore, in the actual execution of the library, if the desired accuracy is obtained with the performance information parameter that has already been set, the library can be executed without performing any calculation for parameter optimization.
[0196]
As described above, the program according to the present invention includes an initial setting procedure for optimizing the performance information parameter at the time of installing the library; a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameter at the time when the basic information parameter, which is included in the library parameters and changes both the execution performance and the output of the library, is determined; and a readjustment procedure that, when the library is executed, refers to the performance information parameter already set and re-optimizes the performance information parameter using the basic information parameter when the calculation based on that performance information parameter does not satisfy the desired accuracy.
[0197]
Therefore, an optimum performance information parameter can be obtained before the actual execution of the library. In addition, when the library is actually executed, if the desired accuracy is obtained with the performance information parameter that has already been set, the library can be executed without performing any calculation for parameter optimization.
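The readjustment procedure can be sketched as follows in Python. This section does not state how the "desired accuracy" is checked, so the sketch assumes that it means the measured cost stays within a relative tolerance of the cost predicted when the parameter was set. That criterion, the `tolerance` argument, and the name `run_with_readjustment` are assumptions made only for this illustration.

```python
from typing import Callable, List


def run_with_readjustment(
    measure_cost: Callable[[int], float],
    stored_pp: int,
    predicted_cost: float,
    candidates: List[int],
    tolerance: float = 0.2,
) -> int:
    """Return the performance information parameter to use from now on.

    measure_cost(pp) is assumed to execute the library with the current basic
    information parameter and return the observed cost.
    """
    actual = measure_cost(stored_pp)
    # Assumed accuracy test: relative gap between measured and predicted cost.
    if abs(actual - predicted_cost) <= tolerance * predicted_cost:
        return stored_pp  # accurate enough: no optimization calculation is run
    # Otherwise re-optimize by trying every candidate again.
    return min(candidates, key=measure_cost)
```

When the stored parameter behaves as predicted, the only cost incurred is the library execution itself, which corresponds to the effect described above.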
[0198]
As described above, in the program according to the present invention, each element of the performance information parameter is set so as to be included in at least one of a first set of parameters optimized when the library is installed, a second set of parameters optimized before the library is executed, and a third set of parameters optimized when the library is executed, and the program causes a computer to realize a function of optimizing the elements of the first set, a function of optimizing the elements of the second set, and a function of optimizing the elements of the third set.
[0199]
Therefore, since the performance information parameters are optimized at the time of installation, before library execution, or at the time of library execution, all of the performance information parameters can be optimized, and the present invention can be applied to general-purpose processing.
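The assignment of each performance information parameter to at least one of the three sets can be pictured as a simple mapping. The Python sketch below is only an illustration: the parameter names `unroll_depth`, `block_size`, and `comm_method` are hypothetical examples and are not taken from the embodiment, while the labels IOP, BEOP, and ROP follow the explanation of symbols.

```python
from enum import Enum
from typing import Dict, List, Set


class Stage(Enum):
    INSTALL = "IOP"            # optimized when the library is installed
    BEFORE_EXECUTION = "BEOP"  # optimized before the library is executed
    RUNTIME = "ROP"            # optimized when the library is executed


def parameters_for(stage: Stage, assignment: Dict[str, Set[Stage]]) -> List[str]:
    """Return the parameters that the given stage is responsible for."""
    return sorted(name for name, stages in assignment.items() if stage in stages)


def unassigned(assignment: Dict[str, Set[Stage]]) -> List[str]:
    """Return parameters that belong to no set and would never be optimized."""
    return sorted(name for name, stages in assignment.items() if not stages)


# Hypothetical assignment; a parameter may belong to more than one set.
ASSIGNMENT: Dict[str, Set[Stage]] = {
    "unroll_depth": {Stage.INSTALL, Stage.BEFORE_EXECUTION},
    "block_size": {Stage.BEFORE_EXECUTION},
    "comm_method": {Stage.RUNTIME},
}
```

If `unassigned(ASSIGNMENT)` returns an empty list, every performance information parameter is covered by at least one optimization stage, which is the condition stated above.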
[0200]
As described above, the recording medium according to the present invention is a computer-readable recording medium in which any one of the above programs is recorded.
[0201]
Therefore, the same effect as the above-described program is achieved.
[0202]
As described above, the computer according to the present invention includes the above-described recording medium.
[0203]
Therefore, the same effect as the above-described program is achieved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a part of an embodiment of a computer according to the present invention.
FIG. 2 is a block diagram showing the computer.
FIG. 3A is a flowchart showing a procedure for optimization at the time of installation, FIG. 3B is a flowchart showing a procedure for optimization before library execution, and FIG. 3C is a flowchart showing a procedure for optimization at library execution time.
FIG. 4A is a block diagram showing a part of an example of a conventional computer, and FIG. 4B is a block diagram showing a part of another example of a conventional computer.
[Explanation of symbols]
1 Computing device (computer)
2 Processor
3 User library (library)
4 Parameter adjustment layer
4a Installation optimization layer
4b Optimization layer before execution
4c Runtime optimization layer
5 Parameter information file
IOP installation optimization parameters (performance information parameters)
BEOP pre-execution optimization parameters (performance information parameters)
ROP runtime optimization parameters (performance information parameters)

Claims

A computing device comprising eigenvalue calculation means for executing an eigenvalue calculation with reference to a performance information parameter, which changes the operation performance without changing the operation result, and a basic information parameter, which indicates the size of the matrix to be operated on, the computing device further comprising optimization means for optimizing the value of the performance information parameter before the eigenvalue calculation means executes the eigenvalue calculation,
wherein, at the time when the value of the basic information parameter is designated by a user, the optimization means causes the eigenvalue calculation means to try the eigenvalue calculation using each of a plurality of pre-sampled sampling values as the performance information parameter and the designated value designated by the user as the basic information parameter, estimates, from the calculation times that the eigenvalue calculation means required to try the eigenvalue calculation with each of the plurality of sampling values as the performance information parameter, the value of the performance information parameter that minimizes the calculation time required for the eigenvalue calculation means to execute the eigenvalue calculation with the designated value as the basic information parameter, and stores the estimated value of the performance information parameter in a parameter information file, and
wherein the eigenvalue calculation means executes the eigenvalue calculation with reference to the value of the performance information parameter that is stored in the parameter information file at the time when execution of the eigenvalue calculation is instructed by the user and that was estimated by the optimization means.

A program for causing a computer to operate as the computing device according to claim 1, the program causing the computer to function as the eigenvalue calculation means and the optimization means.

A computer-readable recording medium on which the program according to claim 2 is recorded.