JP4565201B2

JP4565201B2 - Calculation device, calculation method, program, and recording medium

Info

Publication number: JP4565201B2
Application number: JP2003149701A
Authority: JP
Inventors: 孝洋片桐
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2003-03-28
Filing date: 2003-05-27
Publication date: 2010-10-20
Anticipated expiration: 2023-05-27
Also published as: JP2004355144A

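Description

【０００１】
【Technical Field of the Invention】
The present invention relates to a calculation device, a calculation method, a program, and a recording medium for optimizing parameters contained in a program.
【０００２】
【Prior Art】
Conventionally, when a software program to be executed on a calculation device such as a computer is optimized, the user specifies the parameters to be optimized and then manually instructs each step of the optimization processing for those parameters in turn.
【０００３】
For example, the user manually registers the parameters of the program that should be tuned (optimized). Furthermore, to actually carry out the optimization, the user must instruct the computer separately about, for example, the preprocessing for tuning, the actual tuning method, and the processing for using the tuned parameters.
【０００４】
As one example of a configuration for such tuning, Japanese Unexamined Patent Publication No. 2000-276454 (published October 6, 2000) describes a method of constructing software that has a function of adjusting parameters and performing installation.
【０００５】
【Patent Document 1】
Japanese Unexamined Patent Publication No. 2000-276454
【０００６】
【Problems to Be Solved by the Invention】
However, with the conventional configuration described above, optimizing parameters leads to problems such as increased development time and cost, low functional extensibility, and a high likelihood of introducing bugs.
【０００７】
That is, with the conventional configuration, various settings are still required after the parameters are registered in order to actually achieve their optimization. These manual settings are what cause the increased development time and cost, the low functional extensibility, and the high likelihood of bugs.
【０００８】
In addition, for example, in the optimization-problem solving process for estimating optimal parameters, settings are made by hand, so usually only an estimation function based on a single cost definition function is implemented. This results in poor parameter estimation capability.
【０００９】
The present invention has been made in view of the above problems, and its object is to provide a calculation device, a calculation method, a program, and a recording medium with which parameters can be optimized easily.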
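【００１０】
【Means for Solving the Problems】
To solve the above problems, the calculation device according to the present invention is a calculation device for optimizing parameters contained in an input program, and is characterized by comprising a program generation unit that, when the program is input containing a specifier that designates a region in the program to be optimized and a parameter to be optimized, generates a program for executing optimization by actual measurement for the region and the parameter designated by the specifier.
【００１１】
This calculation device optimizes an input program.
More specifically, it includes a program generation unit that, upon detecting that the input program contains a predetermined specifier, generates a program that actually carries out the optimization of that program. The program generation unit generates a new program that optimizes the region designated by the specifier with respect to the parameter designated by the specifier. The designation of a parameter may include not only the kind of parameter, that is, which variable is designated, but also the range of the parameter. Generating a program is also taken to include rewriting a program.
【００１２】
For example, the program generation unit creates a sub-program that contains the region designated by the specifier. If the specifier designates optimization of the loop unrolling depth as the parameter and the designated region is a loop, the unit creates a sub-program that executes the loop according to the unrolling depth passed as an argument. In other words, a sub-program is, for example, a program that takes the parameter to be adjusted as an argument and executes the processing of the designated region according to that parameter. The unit also creates a measurement routine that calls this sub-program for each parameter value and measures the actual elapsed time, and an estimation routine that estimates the optimal parameter from the elapsed times measured by the measurement routine.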
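The three generated pieces described above (sub-program, measurement routine, estimation routine) can be illustrated with a minimal Python sketch. The names `sub_a`, `measure`, and `estimate` are hypothetical, the inner `for` loop only stands in for a genuinely unrolled loop body, and the estimation step here simply picks the fastest measured candidate rather than fitting a cost function.

```python
import time

def sub_a(unroll, data):
    """Hypothetical sub-program: sums `data`, handling `unroll` items per outer step."""
    total = 0.0
    n = len(data)
    i = 0
    while i + unroll <= n:
        for k in range(unroll):      # stands in for an unrolled loop body
            total += data[i + k]
        i += unroll
    for k in range(i, n):            # remainder that does not fill a full step
        total += data[k]
    return total

def measure(sub, candidates, data, timer=time.perf_counter):
    """Measurement routine: call the sub-program once per candidate value and time it."""
    timings = {}
    for p in candidates:
        start = timer()
        sub(p, data)
        timings[p] = timer() - start
    return timings

def estimate(timings):
    """Estimation routine (simplified): the candidate with the smallest measured time."""
    return min(timings, key=timings.get)

data = [1.0] * 10000
timings = measure(sub_a, range(1, 9), data)
best = estimate(timings)
```

The parameter changes only how the work is carried out, not the result: `sub_a` returns the same sum for every `unroll` value.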
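【００１３】
Therefore, by inputting a program in which the predetermined specifier is written into this calculation device, a program can be obtained that optimizes the designated region of the input program with respect to the designated parameter.
【００１４】
The calculation device may include a compiler that translates the program generated by the program generation unit into executable form. It may also include a processor that runs the executable, in which case the executable produced by the compiler may be run on the processor to actually perform the optimization. Alternatively, the calculation device may send the program generated by the program generation unit to another, external calculation device, which translates it into executable form and performs the actual optimization. In any case, it suffices that the calculation device according to the present invention includes a program generation unit that, given a program in the predetermined format, generates a new program from it.
【００１５】
For example, if the sub-program, measurement routine, and estimation routine generated by the program generation unit are translated into executable form on the calculation device and the measurement and estimation routines are run, the optimal parameter can be obtained.
【００１６】
In a conventional calculation device, by contrast, optimizing an input program required predetermined instructions from the user at each stage of processing. The settings therefore took time, increasing development time and cost. User settings were also subject to various constraints and were inconvenient, so functional extensibility was low. Furthermore, mistakes in the user's settings could introduce bugs.
【００１７】
The above calculation device can also be described as a programming language processing device comprising generation means that generates a program to which an automatic tuning function has been added.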
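【００１８】
To solve the above problems, the calculation device according to the present invention is characterized in that, in the above configuration, the program generation unit includes: specifier analysis means that extracts, from the input program, the region and the parameter designated by the specifier; and program creation means that generates a sub-program containing the region extracted by the specifier analysis means and generates a main program that calls the sub-program to execute optimization of the parameter by actual measurement.
【００１９】
This configuration realizes the calculation device described above. For example, the main program may contain a measurement routine that calls the sub-program for each parameter value and measures the actual elapsed time, and an estimation routine that estimates the optimal parameter from the measured elapsed times.
【００２０】
The main program referred to here is not limited to a so-called main routine; it may also be a subroutine called from the main routine. As noted above, the sub-program is generated by extracting the region to be optimized and, for example, executes the processing of that region with the parameter to be optimized as an argument, so the main program need only be something that calls the sub-program. A subroutine called from the main routine may therefore serve as the main program that calls the sub-program.
【００２１】
The program creation means may create the main program and the sub-program in any order: for example, the sub-program first and then the main program, or the main program first and then the sub-program.
【００２２】
To solve the above problems, the calculation device according to the present invention is characterized in that, in the above configuration, the program creation means creates a measurement routine and an estimation routine that are to be called from, or included in, the main program, the measurement routine calling the sub-program for each parameter value and measuring the elapsed time, and the estimation routine estimating the optimal parameter using the elapsed times measured by the measurement routine.
【００２３】
With this configuration, the measurement and estimation routines are called from or included in the main program, so the optimal parameter can be obtained simply by translating the main program into executable form and running it. The specifier may include a designation of the approximation function that the estimation routine uses to estimate the optimal parameter by approximation.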
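【００２４】
To solve the above problems, the calculation device according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the main program is executed, the program creation means generates, when the sub-program is called inside a loop of the main program, a main program that calls the measurement routine and the estimation routine outside the loop and before the loop.
【００２５】
As described above, the measurement routine calls the sub-program for each parameter value and measures the elapsed time, so the measurement itself takes time.
【００２６】
Therefore, as in the above configuration, the main program is created so that it calls the measurement routine outside and before the loop. This reduces the number of times the measurement routine is called during optimization, and hence the time spent on measurement. That is, because the measurement routine is not run on every iteration of the loop, the time required for optimization is shortened accordingly.
【００２７】
Likewise, if the measurement routine is called outside the loop, the estimation routine need not be called inside the loop either, so it can also be called outside the loop.
【００２８】
To this end, the program creation means may, for example, generate a main program that calls the measurement routine and the estimation routine at its beginning. This configuration reliably realizes the above calculation device.
【００２９】
The above calculation device can also be described as a calculation device that executes a software construction scheme having a tuning method in which run-time parameter tuning is performed before the sub-program containing the relevant region is called.
【００３０】
To solve the above problems, the calculation device according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the main program is executed, the program creation means generates, when the sub-program is called inside a loop of the main program, either a main program that calls the measurement routine and the estimation routine outside and before the loop, or a main program that calls the measurement routine and the estimation routine inside the loop, the choice being made according to the specifier.
【００３１】
With this configuration, the specifier can be used to switch between reducing the number of calls to the measurement and estimation routines during optimization and performing ordinary optimization.
【００３２】
That is, if, for example, the variables in the region of the sub-program to be optimized are determined only inside the loop that calls the sub-program, the specifier can be set appropriately so that ordinary optimization is performed, with the measurement and estimation routines called inside the loop.
【００３３】
Conversely, if the variables in the region of the sub-program are already determined before the loop that calls the sub-program, calling the measurement and estimation routines before the loop reduces the time required for optimization.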
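The difference between the two generated forms can be sketched as follows. This is only an illustration under assumed names: `tune` stands for the combined measurement and estimation routines, `work` for the sub-program, and the boolean flag `tune_before_loop` for whatever the specifier selects.

```python
def run(n_iters, tune, work, tune_before_loop=True):
    """Run `work` n_iters times; return how many times tuning was invoked."""
    tune_calls = 0
    param = None
    if tune_before_loop:
        # Region variables are already fixed: tune once, outside and before the loop.
        param = tune()
        tune_calls += 1
    for _ in range(n_iters):
        if not tune_before_loop:
            # Region variables are fixed only inside the loop: tune on every iteration.
            param = tune()
            tune_calls += 1
        work(param)
    return tune_calls

calls_hoisted = run(5, tune=lambda: 4, work=lambda p: None, tune_before_loop=True)
calls_in_loop = run(5, tune=lambda: 4, work=lambda p: None, tune_before_loop=False)
```

With five iterations, the hoisted form tunes once and the in-loop form tunes five times, which is the saving the text describes.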
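【００３４】
The above calculation device can also be described as one that executes a software construction scheme in which run-time parameter tuning is separated into two methods: tuning performed before the sub-program containing the relevant region is called, and tuning performed when the relevant region is executed.
【００３５】
To solve the above problems, the calculation device according to the present invention is characterized by comprising, in the above configuration, a cost definition function library containing cost definition functions for approximating the elapsed time measured for each parameter value.
【００３６】
With this configuration, a desired approximation can be performed using, for example, a cost definition function contained in the library.
【００３７】
Also, if the specifier includes a designation of the approximation function to be used by the estimation routine, the designated function may be looked up in the cost definition function library and used.
【００３８】
To solve the above problems, the calculation device according to the present invention is characterized by comprising, in the above configuration, a cost definition function determination unit that approximates the measured elapsed times using each of the cost definition functions in the cost definition function library in turn and selects the one with the best approximation accuracy.
【００３９】
With this configuration, an optimal approximation function can be obtained even if, for example, the specifier does not designate the approximation function to be used by the estimation routine.
【００４０】
Also, even if the approximation function designated by the specifier is not contained in the cost definition function library, an accurate approximation function can be selected as described above.
【００４１】
The calculation device according to the present invention is characterized by having a tuning information database that stores the region and the parameter extracted by the specifier analysis means, the program creation means and the cost definition function determination unit obtaining the region or the parameter by referring to the tuning information database.
【００４２】
With this configuration, the region and the parameter extracted by the specifier analysis means are stored in the tuning information database, so the program creation means and the cost definition function determination unit need only refer to the database when they use the region or the parameter, and need not extract them each time.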
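【００４３】
To solve the above problems, the calculation method according to the present invention is a calculation method for optimizing parameters contained in a program input to a calculation device, characterized by including: a step of generating, when the program is input containing a specifier that designates a region in the program to be optimized and a parameter to be optimized, a program for executing optimization by actual measurement for the region and the parameter designated by the specifier; and a step of executing the program obtained in the generating step to perform the optimization.
【００４４】
Executing this calculation method on a calculation device such as a computer realizes the calculation device described above. The calculation method can also be described as a programming language processing method comprising a step of generating software to which an automatic tuning function has been added.
【００４５】
To solve the above problems, the calculation method according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the program is executed, when the region is called inside a loop of the program, the generating step generates a program that, outside and before the loop, measures the elapsed time of the region for each parameter value and estimates the optimal parameter from the measured elapsed times.
【００４６】
In this way, measurement and estimation are performed outside and before the loop rather than on every iteration, so the time required for optimization is shortened accordingly.
【００４７】
To solve the above problems, the calculation method according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the program is executed, when the region is called inside a loop of the program, the generating step generates, selected according to the specifier, either a program that performs, outside and before the loop, the measurement of the elapsed time of the region for each parameter value and the estimation of the optimal parameter from the measured elapsed times, or a program that performs that measurement and estimation inside the loop.
【００４８】
With this configuration, simply setting the specifier switches between performing measurement and estimation outside and before the loop and performing them on every iteration inside the loop. The specifier may be chosen according to the nature of the target problem, which is implemented, for example, by the sub-program.
【００４９】
The calculation method can also be described as a programming language processing method comprising a step of generating a program to which the automatic tuning function of the above calculation device has been added.
【００５０】
The calculation method may also be used to realize a programming language processing device comprising generation means that generates a program having an automatic tuning function.
【００５１】
To solve the above problems, the program according to the present invention is characterized by causing a computer to operate as each means of any of the calculation devices described above.
【００５２】
Using this program realizes the calculation device described above. The way this program is used can also be described as a form of program usage adopted in order to utilize the language processing device described above.
【００５３】
To solve the above problems, the recording medium according to the present invention is characterized in that the above program is recorded on it in a computer-readable manner.
【００５４】
Reading the program from this recording medium into a computer and executing it realizes the calculation device described above. The recording medium can also be described as a computer-readable recording medium storing a program for causing a computer to function as generation means that generates software to which an automatic tuning function has been added.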
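【００５５】
【Embodiments of the Invention】
[Embodiment 1]
An embodiment of the present invention is described below with reference to FIGS. 1 to 17.
【００５６】
The calculation device of this embodiment includes a program generation unit that generates, from a computer language (program) in a predetermined format, another program with which parameter optimization (tuning) becomes easy. It also includes a compiler that translates programs into executable form.
【００５７】
As shown in FIG. 1, the calculation device 1 includes a processor 2, a user library 3, a parameter adjustment layer 4, a parameter information file 5, a program generation unit 6, and a compiler 7.
【００５８】
The calculation device 1 also includes a recording medium (not shown). The calculation device 1 performs calculations by, for example, calling subroutines in the library 3 with externally input parameters (not shown). The results are output to a display device (not shown).
【００５９】
The processor 2 is a calculation processing unit for performing calculations. It contains nprocs processors (not shown). Using these multiple processors, the calculation device 1 functions as a parallel computer.
【００６０】
The library 3 is a numerical calculation library containing at least one subroutine. As shown in FIG. 16, the library 3 of this embodiment contains a plurality of subroutines 3a to 3k. FIG. 16 shows part of the calculation device 1 shown in FIG. 1.
【００６１】
The library 3 and the subroutines 3a to 3k are accessed by describing parameters in some manner (for example, in a dedicated description language). Some of these parameters are input directly to the library 3 from outside, for example by the user. Others are used inside the library 3. Still others are input to the library 3 via the parameter adjustment layer 4.
【００６２】
The library 3 is a numerical calculation library developed by the user, but it is not limited to this and may be, for example, a system library developed by a library developer. Even for libraries provided in advance by a computing environment such as MPI (Message Passing Interface) or by an OS (Operating System), as long as the software interface is known, the user or library developer can pass parameter information to the parameter adjustment layer 4 by writing parameter descriptions.
【００６３】
The content and number of subroutines in the library are not particularly limited. The calculation device 1 may also hold programs other than the library, and those programs may implement other functions.
【００６４】
The parameter adjustment layer 4 functions as an adjustment device that adjusts the parameters used by the library 3. It adjusts some of the parameters destined for the library 3 and then inputs them to the library 3. The parameter adjustment layer 4 includes an installation optimization layer (Installation Optimization Layer: IOL) 4a, a before-execution optimization layer (Before Execution-invocation Optimization Layer: BEOL) 4b, and a run-time optimization layer (Run-time Optimization Layer: ROL) 4c. The functions of these layers are described later.
【００６５】
The parameter information file 5 is a file for storing the parameters adjusted by the parameter adjustment layer 4.
【００６６】
In the calculation device 1 of this embodiment, the library 3 is a function realized by reading and executing a program recorded on a recording medium (not shown). The same is true of the parameter adjustment layer 4.
【００６７】
The program generation unit 6 generates, from a program in a predetermined format, another program with which parameter optimization can be carried out easily. Its details are described later.
【００６８】
The compiler 7 translates programs into executable form. In this embodiment, it translates the program generated by the program generation unit 6 and outputs the resulting executable to the processor 2. When the processor 2 runs the executable, the parameters are actually optimized, as described later.
【００６９】
The program generation unit 6 and the compiler 7 are functions realized in the calculation device 1 by reading and executing a program recorded on a recording medium (not shown).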
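【００７０】
The program generation unit 6 is now described in detail. When a program is input that contains a specifier designating a region in the program to be optimized and a parameter to be optimized, the program generation unit 6 generates a program for executing optimization by actual measurement for the designated region and parameter. As shown in FIG. 2, the program generation unit 6 includes specifier analysis means 8, program creation means 9, and cost definition function determination means 10.
【００７１】
The specifier analysis means 8 analyzes a program containing specifiers and extracts the parameter designated by a specifier and the part of the program designated by the specifier (hereinafter called the tuning region, or simply the region).
【００７２】
The specifier analysis means 8 includes a specifier analysis unit 8a, and a program input to the program generation unit 6 is first input to the specifier analysis unit 8a. The specifier analysis unit 8a extracts the parameter and the tuning region from the program and outputs them to the parameters 10d and the tuning region set 10e, respectively, in the tuning information database 10a of the cost definition function determination means 10. In this way, the specifier analysis unit 8a extracts the parameter from the specifier. It also extracts from the specifier the content of the processing to be performed during optimization, and it extracts the tuning region and applies predetermined processing to it as needed.
【００７３】
In FIG. 3, the program written at an abstract level (Subroutine xxx()) is an example of a program containing a tuning region to be optimized. What is labeled "tuning region" in the figure is an example of a tuning region, and what are labeled "start of specifier" and "end of specifier" are examples of specifiers. In this embodiment, the tuning region is the region enclosed between the start and the end of the specifier. However, this is not a limitation; the region may instead be designated by, for example, the start of the specifier and a number of lines from that position.
【００７４】
In this example, the program is written in Fortran, but the present invention is not limited to this and may use any other computer language; the processing is the same. For example, the essence of the processing is the same for programs written in function-based languages such as C and C++. Unless otherwise noted, Japanese text within a program is not part of a concrete program but an abstract expression in Japanese of the control operation the program should implement. In some cases, Japanese text within a program represents a comment.
【００７５】
FIG. 4(a) shows an example of a tuning region. Consider the case where the program shown in FIG. 4(a) is enclosed by specifiers and the specifier designates unrolling as the optimization method. In this case, the specifier analysis unit 8a generates a program such as that shown in FIG. 4(b) and passes that information to the program creation means 9. That is, when a specifier, such as an unrolling specifier, designates processing that transforms the program in the tuning region into a new tuning region, the specifier analysis unit 8a performs that processing and then passes the information to the program creation means 9. When the processing designated by the specifier requires no particular change to the program, the specifier analysis unit 8a passes the extracted tuning region to the program creation means 9 unchanged.
【００７６】
The specifier analysis unit 8a can also input the parameters extracted from the specifier to, for example, the parameter adjustment layer 4 shown in FIG. 1. The installation optimization layer 4a, the before-execution optimization layer 4b, and the run-time optimization layer 4c of the parameter adjustment layer 4 can separate the parameters and tuning regions designated by the specifiers according to the designated tuning timing (at installation, before execution, or at run time), pass them separately to the subsequent mechanisms, and process them. This is described later. For simplicity, the description first assumes that processing takes place at one arbitrary timing.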
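【００７７】
The program creation means 9 generates a sub-program containing the region extracted by the specifier analysis means 8, and generates a main program that calls this sub-program to execute optimization of the parameter by actual measurement. It includes a main program creation unit 9a, which creates the main program for adding the automatic tuning function; a sub-program creation unit 9b, which creates the group of sub-programs containing the tuning regions in the tuning information database 10a; and a tuning function addition unit 9c, which creates the processing program that achieves the automatic tuning function.
【００７８】
The main program creation unit 9a adds a tuning (optimization) function to the program that has passed through the specifier analysis means 8. For example, from the example program shown in FIG. 3, it creates a main program for performing optimization such as that shown in FIG. 5(a).
【００７９】
When this main program is run with "perform automatic tuning" specified, it calls the automatic tuning subroutine described later and optimizes the parameters. When "perform automatic tuning" is not specified, the same content as the program shown in FIG. 3 is executed. In that case, if the parameters have already been optimized, execution refers to the result.
【００８０】
The main program creation unit 9a also rewrites, for example, the subroutine that is the example program shown in FIG. 3 into a subroutine (Subroutine xxx()) such as that shown in FIG. 5(b).
【００８１】
In this example, the specifier is written in the subroutine shown in FIG. 3, but the main program creation unit 9a performs the same processing if the specifier is written in the main program.
【００８２】
The sub-program creation unit 9b newly creates a subroutine (Subroutine Sub_A(J)) such as that shown in FIG. 5(c), corresponding to the subroutine shown in FIG. 5(b) as rewritten by the main program creation unit 9a. The subroutine shown in FIG. 5(c) consists solely of the target tuning region turned into a subroutine, and is created by referring to the tuning region set 10e of the tuning information database 10a.
【００８３】
The subroutine of FIG. 5(c) created by the sub-program creation unit 9b is called from the subroutine of FIG. 5(b) rewritten by the main program creation unit 9a. It is also called from the subroutine created by the tuning function addition unit 9c, described below.
【００８４】
Next, the tuning function addition unit 9c creates the subroutine shown in FIG. 5(d) that achieves the automatic tuning function (the automatic tuning subroutine), corresponding to the main routine shown in FIG. 5(a) created by the main program creation unit 9a. The tuning function addition unit 9c uses the cost function input from the cost definition function determination means 10, described later. Here, the elapsed time of the computation is approximated by the designated cost function. Details of the cost definition function determination means 10 are given later. The program creation means 9 outputs the resulting program to the compiler 7 shown in FIG. 1.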
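【００８５】
The function F(I) shown in FIG. 5(d) is created from the specifier written in the program shown in FIG. 3. More specifically, F(I) maps the index I of the measurement loop one-to-one to the value J of the parameter that was introduced when tuning region A was turned into a subroutine. F(I) may be restricted to the sampling points chosen by the cost definition function determination means 10.
【００８６】
The program shown in FIG. 5(d) also contains a measurement loop (the I loop) for measuring time (the measurement routine). In this example, a call to the subroutine created by the sub-program creation unit 9b (Subroutine Sub_A(J)) is added inside the measurement loop. The measurement loop can thereby measure, for example, the elapsed time as the cost.
【００８７】
The program shown in FIG. 5(d) also contains "parameter estimation process (a)" (the estimation routine), which estimates the optimal parameter using the measured times. This yields the optimal parameter. The parameter obtained is saved to a storage medium external to the program generation unit 6 (for example, the parameter information file 5). Details of parameter estimation process (a) are given later.
【００８８】
As outlined above, the program generation unit 6 of this embodiment can generate, from a program containing specifiers as shown in FIG. 3, programs containing the settings for optimization as shown in FIGS. 5(a) to 5(d). In particular, the program creation means 9 creates a measurement routine and an estimation routine that are to be called from, or included in, the main program: the measurement routine calls the sub-program for each parameter value and measures the elapsed time, and the estimation routine estimates the optimal parameter using those measured times.
【００８９】
The same processing is performed when a program containing multiple regions enclosed by specifiers (tuning regions) is input to the program generation unit 6. For example, if tuning region B is added below tuning region A in FIG. 3, a similar processing part is added below the processing part for tuning region A in each of FIGS. 5(a), 5(b), and 5(d), and a subroutine similar to that of FIG. 5(c) is newly created.
【００９０】
The measurement loop and parameter estimation process (a) shown in FIG. 5(d), and the call to parameter estimation process (b) shown in FIG. 5(b), are added to the respective programs by the tuning function addition unit 9c. More specifically, the tuning function addition unit 9c performs predetermined processing according to, among other things, the cost definition function determined by the cost definition function determination unit 10c of the cost definition function determination means 10, described below.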
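【００９１】
The cost definition function determination means 10 is now described. It includes a tuning information database 10a, a cost definition function library 10b, and a cost definition function determination unit 10c.
【００９２】
The tuning information database 10a contains parameters 10d and a tuning region set 10e. It stores the parameters 10d needed for tuning, as analyzed by the specifier analysis means 8, and the tuning region set 10e, that is, the parts of the program (sub-programs) to be optimized. The program creation means 9 and the cost definition function determination unit 10c access the tuning information database 10a to obtain the stored parameters 10d and tuning region set 10e.
【００９３】
The cost definition function library 10b is a library in which cost definition functions are recorded. System developers, users of the calculation device 1, and others can freely register and delete cost definition functions. The library 10b contains a plurality of cost definition functions, including for example a linear polynomial 10f. These functions are used, for example, to approximate the elapsed time measured for each parameter value.
【００９４】
The cost definition function determination unit 10c determines the scheme of the parameter estimation processing described in the specifier. As this processing, it performs the cost definition function determination process, the sample point determination process, and the automatic-tuning addition processing, which concerns parameter estimation process (a), parameter estimation process (b), and the measurement loop processing. In response, the tuning function addition unit 9c applies the predetermined processing described above to each program.
【００９５】
First, the cost definition function determination unit 10c performs the cost definition function determination process, determining the cost definition function on the basis of the designation written in the specifier.
【００９６】
In this embodiment, the user designates a cost definition function in the specifier by, for example, naming a function contained in the cost definition function library 10b. If the user designates a function not contained in the library 10b, the program generation unit 6 of the calculation device 1 may determine the cost definition function by a predetermined method (automatic determination).
【００９７】
When a function contained in the cost definition function library 10b is designated, the cost definition function determination unit 10c selects that function from the library 10b and passes it to the tuning function addition unit 9c, which then generates the program.
【００９８】
On the other hand, as explained below, when a function not contained in the cost definition function library 10b is designated, this embodiment tries each function registered in the library 10b in turn on the target tuning region registered in the tuning information database 10a, measures the elapsed time, and evaluates the error. From the evaluation results, the cost definition function with the best accuracy, that is, the smallest error, is adopted and passed to the tuning function addition unit 9c, which then generates the program. When a function is tried and the elapsed time is measured, the measurement covers the entire parameter domain if the specifier designates one. If it does not, the automatically generated upper limit of the parameter is consulted and all values up to that limit are measured. In this way, the cost definition function determination unit 10c may be configured to approximate the measured elapsed times using each cost definition function in the library 10b in turn and select the one with the best approximation accuracy.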
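【００９９】
FIG. 6 outlines an example of the cost definition function determination process. In S11, the cost definition function determination unit 10c determines whether the function written in the specifier is contained in the cost definition function library 10b. If, for example, the user has requested automatic setting, the designated function is treated as not contained in the library 10b.
【０１００】
If the function written in the specifier is found in S11 to be contained in the library 10b, the process proceeds to S12, retrieves the designated function from the library 10b, and proceeds to S13. In S13, the cost definition function determination unit 10c passes the retrieved function to the tuning function addition unit 9c and ends the process. For example, if a linear polynomial is written in the specifier, the unit 10c passes the linear polynomial 10f from the library 10b to the tuning function addition unit 9c.
【０１０１】
If, on the other hand, it is determined in S11 that the function written in the specifier is not contained in the library 10b, for example because automatic setting has been requested, the process proceeds to S14. Because the function written in the specifier cannot be used in this case, the steps from S14 onward select, for example, the most accurate of the cost definition functions contained in the library 10b.
【０１０２】
In S14, the cost definition function determination unit 10c retrieves the corresponding tuning region from the tuning region set 10e of the tuning information database 10a and proceeds to S15. In S15, it adds a measurement part to the retrieved tuning region by setting up a loop (over I) such as that shown in FIG. 10, and proceeds to S16. In S16, it determines whether the accuracy of every cost definition function in the library 10b has been checked.
【０１０３】
If S16 finds that some have not been checked, the process proceeds to S17 and selects one cost definition function in the library 10b whose accuracy has not yet been checked. In S18, which follows S17, the accuracy is evaluated using the selected function. In S19, which follows S18, the accuracy obtained in S18 is compared with that of the cost definition functions already evaluated; if the accuracy obtained in S18 is better, the function selected in S17 is adopted as the candidate for the most accurate cost definition function. The process then returns to S16.
【０１０４】
If, on the other hand, S16 finds that the accuracy of all functions has been checked, the process proceeds to S13 and passes the most accurate cost definition function to the tuning function addition unit 9c. Steps S11 to S19 described above implement one example of the cost definition function determination process.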
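The selection loop of S16 to S19 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the generated code: the library is modeled as a hypothetical dictionary of polynomial degrees, the fit uses least squares via the normal equations, accuracy is taken to be the sum of squared residuals, and the timing data are synthetic values chosen for the example.

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b, A[r][c] = sum(x^(r+c)), b[r] = sum(y * x^r)
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    for col in range(m):                      # Gaussian elimination with pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in reversed(range(m)):              # back substitution
        s = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (b[r] - s) / a[r][r]
    return coef

def poly_eval(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))

def choose_cost_function(library, xs, ys):
    """Try every registered function (S17), score it (S18), keep the best (S19)."""
    best_name, best_err, best_coef = None, float("inf"), None
    for name, degree in library.items():
        coef = fit_poly(xs, ys, degree)
        err = sum((poly_eval(coef, x) - y) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best_name, best_err, best_coef = name, err, coef
    return best_name, best_coef

library = {"linear": 1, "quadratic": 2}       # hypothetical library contents
xs = list(range(1, 9))                         # parameter values 1..8
ys = [(x - 5) ** 2 + 1 for x in xs]            # synthetic "measured" times
name, coef = choose_cost_function(library, xs, ys)
```

Scoring on the same points used for fitting favors higher-degree functions, so a practical variant might fit on the sample points and score over the whole parameter domain.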
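【０１０５】
Next, the cost definition function determination unit 10c performs the sample point determination process, which determines sample points that are optimal for actual time measurement.
【０１０６】
For example, if the user has designated sampling points in the specifier, the process may decide to use those points. If the user has not designated sampling points, the process may select a suitable set of sampling points from the domain so as to reduce the error.
【０１０７】
Alternatively, even when the user has designated sampling points in the specifier, the process may select a suitable subset of the designated sampling points so as to reduce the error, as follows.
【０１０８】
For example, as shown in FIG. 7, the cost definition function determination unit 10c checks the domain given in the specifier in S20 and proceeds to S21. In S21, it extracts a subset S from the set that forms the domain by a predetermined method. The set itself may be chosen as S. Alternatively, S may be chosen at random, by a genetic algorithm (GA), using past statistics, or by some evaluation formula.
【０１０９】
In S22, which follows S21, the accuracy is measured on the corresponding tuning region with the parameters in the chosen subset S. In S23, which follows S22, if the accuracy measured in S22 is better than the accuracy measured previously in the sample point determination process, the subset S chosen in S21 is set as the sample point set O. In S24, it is determined whether a pre-specified number of trials has been completed; if not, the process returns to S21, and if so, it proceeds to S25. In S25, the set O obtained in S23 is taken as the sample points. In this way, even when a domain has been set, sampling points can be determined so that the amount of processing is reduced while a given accuracy is maintained, making the processing faster.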
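Steps S21 to S25 amount to a repeated draw-and-compare loop, which the following Python sketch illustrates for the random-selection variant only. The function name, the fixed subset size, the seed, and the caller-supplied `error_of` callback (standing in for the accuracy measurement of S22) are all assumptions for the example.

```python
import random

def choose_sample_points(domain, error_of, trials, subset_size, seed=0):
    """Pick the subset of `domain` with the smallest error over a fixed number of trials."""
    rng = random.Random(seed)
    best_subset, best_err = None, float("inf")
    for _ in range(trials):                                  # S24: fixed trial count
        subset = sorted(rng.sample(domain, subset_size))     # S21: draw a subset S
        err = error_of(subset)                               # S22: measure accuracy
        if err < best_err:                                   # S23: keep the better set O
            best_subset, best_err = subset, err
    return best_subset                                       # S25: O becomes the sample points

domain = list(range(1, 17))
# Toy error measure for illustration: distance of the subset's sum from 30.
toy_error = lambda s: abs(sum(s) - 30)
points = choose_sample_points(domain, toy_error, trials=50, subset_size=4)
```

A genetic algorithm or a statistics-based rule, both mentioned in the text, would replace only the line that draws the subset.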
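【０１１０】
An example of the sampling point determination process is now described, for the optimization of main-loop unrolling in an eigenvalue computation. A linear polynomial of degree 5 is used as the cost definition function, and the least squares method is used to solve the optimization problem. Sample points 1 are those designated in the specifier, [1-6, 8, 16]. Sample points 2 are those set automatically, [1-16]. In Table 1 below, estimated parameter 1 is the parameter estimated using sample points 1, and estimated parameter 2 is the parameter estimated using sample points 2.
【０１１１】
Table 1 shows the results obtained on a Japanese-made supercomputer (computer A). Table 2 shows the results obtained on a Japanese-made supercomputer (computer B). Table 3 shows the results obtained on a PC cluster (computer C).
【０１１２】
【Table 1】

【０１１３】
【Table 2】

【０１１４】
【Table 3】

【０１１５】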
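These results show that the sampling points set automatically by the method according to the present invention (sample points 2) give higher parameter estimation accuracy on computer C. The automatic sample point determination of the present mechanism can therefore be said to be highly effective.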
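【０１１６】
Next, the cost definition function determination unit 10c performs the automatic-tuning addition processing step by step. This comprises parameter estimation process (a), parameter estimation process (b), and the measurement loop processing.
【０１１７】
Parameter estimation process (a) generates a program that takes as input the sampling points fixed by the sampling point determination process and the execution times at those points, and solves an appropriate optimization problem based on the cost definition function fixed by the cost definition function determination process. The generated program is called as parameter estimation process (a) from the program generated by the tuning function addition unit 9c.
【０１１８】
In parameter estimation process (a), as shown in FIG. 8, S26 obtains the sampling points fixed by the sampling point determination process and the execution times at those points, and the process proceeds to S27. For example, in the program shown in FIG. 5(d), parameter estimation process (a) runs after the measurement loop, so it obtains the values measured by that loop.
【０１１９】
In S27, an appropriate optimization problem is solved based on the cost definition function fixed by the cost definition function determination process. In S28, which follows S27, the estimated appropriate parameter and the coefficient information of the cost definition function are obtained. This processing yields an appropriate parameter by estimation. The flowchart shown here is one example of parameter estimation process (a) and is not limiting. A program implementing parameter estimation process (a) need only execute, for example, the steps shown in FIG. 8; the details do not matter.
【０１２０】
Next, parameter estimation process (b) takes as input the coefficient information of the cost definition function determined automatically in parameter estimation process (a), and solves an appropriate optimization problem based on the cost definition function fixed by the cost definition function determination process. A program that thereby determines the parameter estimated to be optimal is generated automatically. The generated program is called as parameter estimation process (b) from the program generated by the tuning function addition unit 9c.
【０１２１】
In parameter estimation process (b), as shown for example in FIG. 9, S29 obtains the coefficient information of the cost definition function determined in parameter estimation process (a). For example, in the example programs shown in FIGS. 5(a), 5(b), and 5(d), parameter estimation process (b) runs after automatic tuning has been performed and parameter estimation process (a) has run, so the coefficient information is available.
【０１２２】
In S30, which follows S29, the optimal parameter is determined using cost information from the cost definition function determined by the cost definition function determination unit, and the process proceeds to S31. In S31, the estimated appropriate parameter is obtained. Steps S29 to S31 thus yield an appropriate parameter by estimation. The flowchart shown here is one example of parameter estimation process (b) and is not limiting. A program implementing it need only execute, for example, the steps shown in FIG. 9; the details do not matter.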
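Parameter estimation process (b) can be pictured as evaluating an already fitted cost function and taking its minimizer. The Python sketch below assumes a polynomial cost function with coefficients stored lowest order first and solves the optimization problem by exhaustive evaluation over a small integer domain; both choices are assumptions for illustration, since the text leaves the solution method open.

```python
def estimate_parameter(coef, domain):
    """Return the parameter in `domain` that minimizes the fitted polynomial cost."""
    def cost(x):
        return sum(c * x ** k for k, c in enumerate(coef))
    return min(domain, key=cost)

# Coefficients of x^2 - 10x + 26, as process (a) might have stored them (S29).
coef = [26.0, -10.0, 1.0]
best = estimate_parameter(coef, range(1, 9))   # S30/S31: estimated parameter
```

Because only the stored coefficients are needed, this step requires no new timing runs.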
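【０１２３】
Next, as shown in FIG. 10, the measurement loop processing forms a measurement loop whose length corresponds to the number of sample points fixed by the sample point determination process.
【０１２４】
The programs generated automatically by parameter estimation process (a), parameter estimation process (b), and the measurement loop processing described above are sent to the tuning function addition unit 9c. These programs are called from the program generated by the tuning function addition unit 9c.
【０１２５】
The processing by the calculation device 1 is now described with a concrete example. The calculation device 1 generates a program and performs optimization as follows. Here, as an example, the computer language is Fortran 90, and the user of this embodiment uses MPI (Message Passing Interface) as the computing environment. The present invention is not, however, limited to this. Note that the generated code described below is for explaining this embodiment and does not limit the present invention. Note also that it has been specialized for this explanation and is not strictly identical to the code generated by the calculation device 1 of this embodiment.
【０１２６】
When a user uses this calculation device, as shown in FIG. 11, the user inputs a program in the predetermined format to the calculation device 1 in S35. A program in the predetermined format is one in which the parameters to be optimized and so on are designated by specifiers.
【０１２７】
FIG. 12 shows an example of the program input by the user in S35. This program describes a matrix product in Fortran 90 and is an example in which specifiers are written to instruct the addition of the automatic tuning function. In the example of FIG. 12, lines beginning with "!ABCLib$" are specifiers.
【０１２８】
In this example, the specifier "varied (i) from 1 to 8" on line 9 designates unrolling (that is, parameterization) of parameter (i) for depths 1 through 8. Lines 11 through 17 are the tuning region. The specifier "fitting polynomial 5" on line 10 designates use of the degree-5 linear polynomial (fitting polynomial 5) registered in the cost definition function library. The specifier "sampled (1-3,6,8)", also on line 10, designates that parameter estimation be performed at sampling points [1-3, 6, 8]. From the information in these specifiers, code with the automatic tuning function added is generated automatically.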
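How a specifier analysis step might pull these three pieces of information out of the source text can be sketched in Python. Only the "!ABCLib$" prefix and the three clauses quoted above come from the description; the surrounding line layout in `example`, the function names, and the dictionary keys are hypothetical, since FIG. 12 is not reproduced here.

```python
import re

def expand_ranges(spec):
    """Expand a list such as '1-3,6,8' into [1, 2, 3, 6, 8]."""
    points = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            points.extend(range(int(lo), int(hi) + 1))
        else:
            points.append(int(part))
    return points

def parse_specifiers(source):
    """Collect parameter, domain, polynomial degree and sample points from specifier lines."""
    info = {}
    for line in source.splitlines():
        line = line.strip()
        if not line.startswith("!ABCLib$"):
            continue                      # ordinary code or comment: not a specifier
        m = re.search(r"varied\s*\((\w+)\)\s*from\s+(\d+)\s+to\s+(\d+)", line)
        if m:
            info["parameter"] = m.group(1)
            info["domain"] = list(range(int(m.group(2)), int(m.group(3)) + 1))
        m = re.search(r"fitting\s+polynomial\s+(\d+)", line)
        if m:
            info["poly_degree"] = int(m.group(1))
        m = re.search(r"sampled\s*\(([^)]*)\)", line)
        if m:
            info["samples"] = expand_ranges(m.group(1))
    return info

# Hypothetical arrangement of the clauses; the real layout is that of FIG. 12.
example = """
!ABCLib$ varied (i) from 1 to 8
!ABCLib$ fitting polynomial 5 sampled (1-3,6,8)
      do i = 1, n
"""
info = parse_specifiers(example)
```

The resulting dictionary plays the role of the parameter entry that the specifier analysis unit would store in the tuning information database.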
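【０１２９】
In S36, the program generation unit 6 of the calculation device 1 generates, from the input program, a program suited to parameter adjustment. In S37, the program generation unit 6 outputs the generated program to the compiler 7.
【０１３０】
FIGS. 13(a), 13(b), 14(a), and 14(b) show an example of the program generated by the program generation unit 6 in S36.
【０１３１】
FIG. 13(a) shows the main program generated by the program generation unit 6 from the program of FIG. 12. FIG. 13(b) shows the automatic tuning program generated from the program of FIG. 12. FIGS. 14(a) and 14(b) together form the subroutine containing the tuning region, which the program generation unit 6 generated from the program of FIG. 12.
【０１３２】
In S38, the compiler 7 translates the program into executable form and inputs it to the processor 2. In S39, the processor 2 runs the executable, obtains the optimal parameter, and outputs it to, for example, the parameter information file 5. Thus, with the calculation device 1, a program can be optimized easily just by inputting it with the parameters designated by specifiers.
【０１３３】
As described above, the calculation device 1 according to this embodiment includes the program generation unit 6, which, when a program is input that contains a specifier designating a region in the program to be optimized and a parameter to be optimized, generates a program for executing optimization by actual measurement for the designated region and parameter. Parameters can therefore be optimized easily.
【０１３４】
Also as described above, the present invention relates to a form of program usage, a programming language processing device, a programming language processing method, and a recording medium for adding an automatic tuning function at an arbitrary place in a program.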
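【０１３５】
With the conventional configuration, various settings are still required after the parameters are registered in order to actually achieve their optimization. Optimizing parameters therefore leads to problems such as increased development time and cost, low functional extensibility, and a high likelihood of introducing bugs.
【０１３６】
The present invention solves these problems by using specifiers (directives) that automatically add automatic tuning processing in the computer language that the user ultimately needs, together with a processing mechanism for programs written with those specifiers.
【０１３７】
For example, as in the embodiment above, programs with the automatic tuning function added are generated automatically. For software with an automatic tuning function, this prevents increases in development time and cost, avoids low functional extensibility, and avoids a high likelihood of introducing bugs.
【０１３８】
Furthermore, in the optimization-problem solving process for parameter adjustment, the cost definition function library and the cost definition function determination unit built into the calculation device of the present invention make possible automatic selection, from a plurality of cost definition functions, of the one with the smallest error, as well as automatic selection of sampling points.
【０１３９】
As in the embodiment above, this solves the conventional problem in the optimization-problem solving process for parameter estimation: because the implementation was done by hand, only an estimation function based on a single cost definition function could be realized, resulting in low parameter estimation accuracy. The long-standing problem of poor parameter estimation capability is thereby solved.
【０１４０】
The following describes the processing when the specifier analysis unit 8a inputs the parameters extracted from the specifier to the parameter adjustment layer 4 shown in FIG. 1. In this way, the parameter adjustment layer 4 may add processing separated by type of automatic tuning. The automatic tuning function added by the program generation unit 6 can also be regarded as being under the direction of the parameter adjustment layer 4.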
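【０１４１】
When a user runs the library 3 on the calculation device 1, the user sets suitable parameters for the desired subroutine 3a and then issues an execution instruction.
【０１４２】
The parameters set for the subroutine 3a include parameters that change only the execution performance of the calculation device 1 and do not change the output of the subroutine 3a of the library 3. Such parameters are hereinafter called performance information parameters (Performance Parameters: PP).
【０１４３】
Among the parameters set for the subroutine 3a, those that change both the execution performance of the calculation device 1 and the output of the subroutine 3a of the library 3 are hereinafter called basic information parameters (Basic Parameters: BP).
【０１４４】
For example, suppose the subroutine 3a in the numerical calculation library is an eigenvalue computation subroutine that computes the eigenvalues of a matrix. The matrix itself and its size then correspond to basic information parameters BP, and the loop unrolling depth used in the matrix computation on the calculation device 1 corresponds to a performance information parameter PP.
【０１４５】
In the calculation device 1, the desired result can be obtained in the minimum time by optimizing the performance information parameters PP using the given basic information parameters BP. The performance information parameters PP and basic information parameters BP are input to the library 3 via the parameter adjustment layer 4. Parameters other than PP and BP are either input directly to the library 3 from outside the calculation device 1 or used inside the library 3.
【０１４６】
As shown in FIG. 15, the parameter adjustment layer 4 of this embodiment includes the installation optimization layer 4a, the before-execution optimization layer 4b, and the run-time optimization layer 4c in order to optimize the performance information parameters PP, which are the adjustable parameters. The layers 4a to 4c do not hold parameters themselves but save them in the parameter information file 5.
【０１４７】
The installation optimization layer (IOL) 4a performs optimization when the library 3 is installed.
【０１４８】
As shown for example in FIG. 17(a), when the library 3 is installed (S1), the installation optimization layer 4a optimizes the installation optimization parameters (IOP), which are a subset of the performance information parameters PP (S2), and outputs the resulting parameters (IOP) to the parameter information file 5.
【０１４９】
When the library 3 is installed, the basic information parameters BP are usually not yet fixed. The installation optimization layer 4a therefore samples, for example, suitable values of the basic information parameters BP and, for each sampled point, determines the parameter that minimizes a suitably defined cost definition function. It then interpolates the data between sampled points using a suitable model formula.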
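The install-time idea of tuning at a few sampled BP values and interpolating in between can be sketched as follows. The text does not specify the model formula, so this Python example assumes simple linear interpolation with rounding and clamping; the function name and the table values (matrix size paired with best unrolling depth) are illustrative only and are not measurements from the document.

```python
def interpolate_best(table, n):
    """Estimate the best PP for BP value n from sampled (bp, best_pp) pairs sorted by bp."""
    if n <= table[0][0]:
        return table[0][1]                 # below the sampled range: clamp
    if n >= table[-1][0]:
        return table[-1][1]                # above the sampled range: clamp
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= n <= x1:
            t = (n - x0) / (x1 - x0)       # position between the two sampled points
            return round(y0 + t * (y1 - y0))

# Hypothetical install-time results: best unrolling depth found at each sampled size.
install_table = [(100, 2), (200, 4), (400, 8)]
guess = interpolate_best(install_table, 150)
```

Such a guess carries model error, which is why the later layers refine it once the actual BP values are known.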
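【０１５０】
The before-execution optimization layer (BEOL) 4b performs optimization after the user has specified particular parameters (for example, the problem size).
【０１５１】
In response to the input of the basic information parameters BP, the before-execution optimization layer 4b uses them to optimize the before-execution optimization parameters BEOP, which are a subset of the performance information parameters PP. For example, as shown in FIG. 17(b), in response to the definition (input) of the basic information parameters BP as user-specified parameters (S4), it refers to the parameters (IOP) in the parameter information file 5 (S5), performs optimization (S6), and outputs the resulting optimized parameters (BEOP) to the parameter information file 5.
【０１５２】
The before-execution optimization layer 4b runs trials by actual measurement, using the basic information parameters BP specified by the user, in order to obtain the optimal parameters.
【０１５３】
The run-time optimization layer (ROL) 4c performs optimization after parameter optimization by at least one of the installation optimization layer 4a and the before-execution optimization layer 4b has finished, and when the target library (or routine) is executed.
【０１５４】
As shown for example in FIG. 17(c), when the run-time optimization layer 4c detects an instruction to execute the library 3 (a subroutine 3a of the library 3) (S8), it refers to the performance information parameters PP already set (S9) and, if computation with these parameters does not satisfy the desired accuracy, performs optimization again (S10). In S10, the computation is repeated until optimal parameters PP are obtained with which the computation satisfies the desired accuracy.
【０１５５】
Thus, the run-time optimization layer 4c refers to the performance information parameters PP already set and does not perform computation for optimization in predetermined cases, for example when sufficient accuracy is obtained.
【０１５６】
As described above, in the parameter adjustment layer 4 of this embodiment, the parameter information IOP optimized by the installation optimization layer 4a is saved in the parameter information file 5 and can be referenced by the before-execution optimization layer 4b and the run-time optimization layer 4c. The parameter information BEOP optimized by the before-execution optimization layer 4b is likewise saved in the parameter information file 5 and can be referenced by the run-time optimization layer 4c.
【０１５７】
Each element of the performance information parameters PP belongs to at least one of the sets of parameters (IOP), (BEOP), and (ROP). That is, the elements of PP are divided, with overlap permitted, into three subsets (IOP, BEOP, ROP) for the layers 4a to 4c of the parameter adjustment layer 4. Expressed as a formula:
PP = IOP ∪ BEOP ∪ ROP … (Equation 1)
The calculation device 1 of this embodiment can therefore use the parameter adjustment layer 4 to optimize every element of the performance information parameters PP at one of the timings described above.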
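【０１５８】
In particular, the calculation device 1 of this embodiment includes the before-execution optimization layer 4b, which performs optimization before the actual computation is executed, once the basic information parameters BP that depend on the problem, such as the matrix size (n), are fixed. This allows more accurate optimization than in conventional calculation devices.
【０１５９】
Conventional construction schemes for automatic tuning software comprised only those that optimize parameters when the software is installed, as shown for example in FIG. 18(a), and those that optimize parameters when the library is executed, as shown for example in FIG. 18(b). These schemes have the problems that they cannot be applied to general-purpose processing and that parameter adjustment may be insufficient. As FIGS. 18(a) and 18(b) show, conventional automatic tuning also handled only one kind of parameter.
【０１６０】
The present invention therefore aims to solve these problems with a software construction scheme that allows parameter adjustment to be applied to more general-purpose processing and has a more advanced parameter adjustment mechanism than before.
【０１６１】
In particular, the calculation device 1 of this embodiment includes the before-execution optimization layer 4b, which performs optimization before the actual computation is executed, once the basic information parameters BP that depend on the problem, such as the matrix size (n), are fixed. This allows more accurate optimization than with IOL or ROL alone, as in conventional calculation devices.
【０１６２】
Next, the characteristic features of the program, computer, and so on configured as described above are explained.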
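【０１６３】
To solve the above problems, the program according to the present invention is a program for causing a computer to optimize performance information parameters, which are among the parameters of a library provided on the computer and which change only the execution performance without changing the output of the library, characterized by including: a procedure for detecting the point at which the basic information parameters, which are among the parameters of the library and which change both the execution performance and the output of the library, have been fixed; and a procedure for optimizing the performance information parameters using the basic information parameters.
【０１６４】
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. Execution cost means, for example, the computing resources or computing time required for execution. The program adjusts the values of the performance information parameters, that is, those library parameters that change only the execution performance without changing the library's output, so that the execution cost of the library becomes optimal.
【０１６５】
A computer running this program detects, before the library is actually executed, the point at which the basic information parameters have been fixed, for example by detecting their input by the user.
【０１６６】
Here, basic information parameters are parameters that change both the execution performance and the output of the library.
【０１６７】
For example, in a matrix eigenvalue computation library within a numerical calculation library, the matrix size, the matrix itself, and the like correspond to basic information parameters, while the loop unrolling depth used, for example, on a parallel computer corresponds to a performance information parameter.
【０１６８】
That is, when the content of the library is expressed as a mathematical formula, parameters that appear as variables in the formula correspond to basic information parameters. Parameters that do not appear in the formula, or that appear only as auxiliary variables, correspond to performance information parameters. Changing a performance information parameter therefore does not change the result given by the formula (the output of the library).
【０１６９】
The computer then optimizes the performance information parameters using the basic information parameters before the library is actually executed. More specifically, using the basic information parameters, it performs trial computations for each value of the performance information parameters and measures the execution cost in advance. This reliably yields the optimal performance information parameters.
【０１７０】
One example of a conventional optimization program optimizes the performance information parameters when the library is installed. In that case, basic information parameters such as the matrix size are not yet fixed, so the optimal performance information parameters are inferred by some estimation model that contains a certain error.
【０１７１】
Another example of a conventional optimization program optimizes the performance information parameters when the library is executed. In that case, the computing time for optimizing the performance information parameters is counted toward the execution cost of the library. There is therefore a risk that insufficient time can be spent on optimization and the optimal parameters are not obtained.
【０１７２】
Therefore, as in the program according to the present invention, the execution cost is measured in advance, before the actual computation, to obtain the optimal performance information parameters. This allows more precise and reliable parameter adjustment. The required computing time can also be predicted before the program is executed.
【０１７３】
The program according to the present invention can also be described as software having a parameter optimization function that operates at the point where the information the user can know has been fixed.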
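【０１７４】
To solve the above problems, the program according to the present invention is a program for causing a computer to optimize performance information parameters, which are among the parameters of a library provided on the computer and which change only the execution performance without changing the output of the library, characterized by including: an initial setting procedure for optimizing the performance information parameters when the library is installed; a detection procedure for detecting the point at which the basic information parameters, which are among the parameters of the library and which change both the execution performance and the output of the library, have been fixed; and a pre-adjustment procedure for optimizing the performance information parameters using the basic information parameters with reference to the performance information parameters set in the initial setting procedure.
【０１７５】
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. Execution cost means, for example, the computing resources or computing time required for execution. The program adjusts the values of the performance information parameters, that is, those library parameters that change only the execution performance without changing the library's output, so that the execution cost of the library becomes optimal.
【０１７６】
A computer running this program optimizes the performance information parameters when the library is installed. At that time, basic information parameters such as the matrix size are not yet fixed, so the optimal performance information parameters are inferred by some estimation model that contains a certain error.
【０１７７】
The computer also detects, before the library is actually executed, the point at which the basic information parameters have been fixed, for example by detecting their input by the user.
【０１７８】
Here, basic information parameters are parameters that change both the execution performance and the output of the library.
【０１７９】
For example, in a matrix eigenvalue computation library within a numerical calculation library, the matrix size, the matrix itself, and the like correspond to basic information parameters, while the loop unrolling depth used, for example, on a parallel computer corresponds to a performance information parameter.
【０１８０】
The computer then, before the library is actually executed, optimizes the performance information parameters using the basic information parameters, with reference to the performance information parameters set at installation. More specifically, using the basic information parameters, it performs trial computations for each value of the performance information parameters and measures the execution cost in advance. In particular, the trial computations may be limited to values near the optimum of the performance information parameters set at installation. This reduces the number of trial computations needed to obtain the optimal performance information parameters. More precise and reliable parameter adjustment thus becomes possible.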
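Restricting the before-execution trials to the neighborhood of the install-time optimum can be sketched as below. The function name, the `radius` parameter that defines "near", and the toy cost are assumptions for illustration; `measure` stands for a real timed trial with the actual basic information parameters.

```python
def refine_near(install_best, domain, measure, radius=1):
    """Re-measure only candidates within `radius` of the install-time optimum."""
    candidates = [p for p in domain if abs(p - install_best) <= radius]
    return min(candidates, key=measure)

# Toy cost whose true minimum is at 5, while install-time estimation suggested 4.
toy_cost = lambda p: (p - 5) ** 2
refined = refine_near(install_best=4, domain=range(1, 9), measure=toy_cost)
```

Here three trials (3, 4, 5) replace the eight that a full sweep of the domain would need.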
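【０１８１】
The program according to the present invention can also be described as software having parameter optimization functions that operate at software installation and before software execution, at the point where the information the user can know has been fixed.
【０１８２】
To solve the above problems, the program according to the present invention is a program for causing a computer to optimize performance information parameters, which are among the parameters of a library provided on the computer and which change only the execution performance without changing the output of the library, characterized by including: a detection procedure for detecting the point at which the basic information parameters, which are among the parameters of the library and which change both the execution performance and the output of the library, have been fixed; a pre-adjustment procedure for optimizing the performance information parameters using the basic information parameters; and a re-adjustment procedure that, when the library is executed, refers to the performance information parameters already set and, if computation with those parameters does not satisfy the desired accuracy, optimizes the performance information parameters again using the basic information parameters.
【０１８３】
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. Execution cost means, for example, the computing resources or computing time required for execution. The program adjusts the values of the performance information parameters, that is, those library parameters that change only the execution performance without changing the library's output, so that the execution cost of the library becomes optimal.
【０１８４】
A computer running this program detects, before the library is actually executed, the point at which the basic information parameters have been fixed, for example by detecting their input by the user.
【０１８５】
Here, basic information parameters are parameters that change both the execution performance and the output of the library.
【０１８６】
For example, in a matrix eigenvalue computation library within a numerical calculation library, the matrix size, the matrix itself, and the like correspond to basic information parameters, while the loop unrolling depth used, for example, on a parallel computer corresponds to a performance information parameter.
【０１８７】
The computer then optimizes the performance information parameters using the basic information parameters before the library is actually executed. More specifically, using the basic information parameters, it performs trial computations for each value of the performance information parameters and measures the execution cost in advance. This reliably yields the optimal performance information parameters.
【０１８８】
In addition, when the library is actually executed, the computer refers to the performance information parameters already set and determines by trial whether computation with them satisfies the desired accuracy. If it does not, the computer optimizes the performance information parameters again using the basic information parameters. It then executes the library using performance information parameters with which the desired accuracy can be obtained.
【０１８９】
In this way, the execution cost is measured in advance, before the actual computation, to obtain the optimal performance information parameters. When the basic information parameters have not changed, the library can be executed with the performance information parameters set in advance. Even when the basic information parameters have changed, the library can be executed without computation for parameter optimization if the desired accuracy is obtained. The time for parameter optimization at run time thus becomes unnecessary, and the execution cost (computing time) of the library does not increase. Moreover, because the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
【０１９０】
The program according to the present invention can also be described as software having parameter optimization functions that operate before software execution, at the point where the information the user can know has been fixed, and at software execution.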
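【０１９１】
To solve the above problems, the program according to the present invention is a program for causing a computer to optimize performance information parameters, which are among the parameters of a library provided on the computer and which change only the execution performance without changing the output of the library, characterized by including: an initial setting procedure for optimizing the performance information parameters when the library is installed; a detection procedure for detecting the point at which the basic information parameters, which are among the parameters of the library and which change both the execution performance and the output of the library, have been fixed; and a re-adjustment procedure that, when the library is executed, refers to the performance information parameters already set and, if computation with those parameters does not satisfy the desired accuracy, optimizes the performance information parameters again using the basic information parameters.
【０１９２】
This program is used to optimize the execution cost of a library, such as a numerical calculation library, on a computer. Execution cost means, for example, the computing resources or computing time required for execution. The program adjusts the values of the performance information parameters, that is, those library parameters that change only the execution performance without changing the library's output, so that the execution cost of the library becomes optimal.
A computer running this program optimizes the performance information parameters when the library is installed. At that time, basic information parameters such as the matrix size are not yet fixed, so the optimal performance information parameters are inferred by some estimation model that contains a certain error.
【０１９３】
The computer also detects, before the library is actually executed, the point at which the basic information parameters have been fixed, for example by detecting their input by the user.
【０１９４】
Here, basic information parameters are parameters that change both the execution performance and the output of the library.
【０１９５】
For example, in a matrix eigenvalue computation library within a numerical calculation library, the matrix size, the matrix itself, and the like correspond to basic information parameters, while the loop unrolling depth used, for example, on a parallel computer corresponds to a performance information parameter.
【０１９６】
In addition, when the library is actually executed, the computer refers to the performance information parameters already set and determines by trial whether computation with them satisfies the desired accuracy. If it does not, the computer optimizes the performance information parameters again using the basic information parameters. It then executes the library using performance information parameters with which the desired accuracy is obtained.
【０１９７】
In this way, the performance information parameters are set before the actual computation. If those parameters give the desired accuracy at the time of the actual computation, the library can be executed without computation for parameter optimization. The time for parameter optimization at run time thus becomes unnecessary, and the execution cost (computing time) of the library does not increase. Moreover, because the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
【０１９８】
The program according to the present invention can also be described as software having parameter optimization functions that operate at software installation and at software execution.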
【０１９９】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を、上記コンピュータに実行させるためのプログラムにおいて、上記ライブラリのインストール時に上記性能情報パラメータの最適化を行う初期設定手順と、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった地点を検出する検出手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う前調整手順と、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順とを含んでいることを特徴としている。
【０２００】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。
【０２０１】
上記プログラムが実行されたコンピュータは、ライブラリのインストール時に、性能情報パラメータの最適化を行う。この場合、例えば行列のサイズのような基本情報パラメータが定まっていないため、所定の誤差を含んだ、なんらかの推定モデルによって、最適な性能情報パラメータを推測する。
【０２０２】
また、コンピュータは、ライブラリの実際の実行の前に、例えばユーザからの基本情報パラメータの入力を検出することによって、基本情報パラメータが定まった地点を検出する。
【０２０３】
ここで、基本情報パラメータとは、実行性能とライブラリの出力とを共に変化させるパラメータである。
【０２０４】
例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、行列のサイズ、行列の実体などが、基本情報パラメータに相当する。また、例えば並列計算機を用いる場合のループアンローリング段数は、性能情報パラメータに相当する。
【０２０５】
その後、コンピュータは、ライブラリの実際の実行の前に、インストール時に設定された性能情報パラメータを参照して、基本情報パラメータを用いて性能情報パラメータの最適化を行う。より詳細には、例えば基本情報パラメータを用い、性能情報パラメータのそれぞれの値について試行計算を行って、実行コストを予め実測する。特に、インストール時に設定された性能情報パラメータの最適値周辺の値のみについて、試行計算を行うようにしてもよい。これによって、試行計算の回数を削減して、最適な性能情報パラメータを得ることができる。このように、より精密かつ確実なパラメータ調整が可能となる。
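The narrowed search described above, in which trial calculations are limited to values near the install-time optimum, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the code the patent generates: `tune_near_estimate`, `run_trial`, and `radius` are hypothetical names, and the cost is supplied by a caller-provided function instead of a real timed library call.

```python
def tune_near_estimate(run_trial, candidates, install_estimate, radius=1):
    """Pick the cheapest performance parameter near an install-time estimate.

    run_trial(value) returns the measured cost of one trial calculation
    (hypothetical callback; a real tuner would time the library call).
    Returns (best value, number of trial calculations performed).
    """
    # Only try candidates within `radius` of the install-time estimate.
    nearby = [v for v in candidates if abs(v - install_estimate) <= radius]
    if not nearby:
        # Nothing close to the estimate: fall back to a full sweep.
        nearby = list(candidates)
    costs = {v: run_trial(v) for v in nearby}
    best = min(costs, key=costs.get)
    return best, len(costs)
```

With eight candidates and `radius=1`, only three trial calculations are run instead of eight, which is the saving the paragraph describes.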
【０２０６】
また、コンピュータは、ライブラリの実際の実行の際に、既に設定された性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしているか否かを試行により判別する。そして、所望の精度を満たしていないときには、基本情報パラメータを用いて性能情報パラメータの最適化を再度実行する。そして、所望の精度が得られる性能情報パラメータを用いて、ライブラリを実行する。
【０２０７】
このように、実際の計算の前に、実行コストを予め実測して、最適な性能情報パラメータを得るようにする。基本情報パラメータの変更がないときには、予め設定した性能情報パラメータを用いてライブラリを実行できる。また、基本情報パラメータの変更があるときでも、所望の精度が得られる場合には、パラメータの最適化のための計算をせずに、ライブラリを実行できる。したがって、実行時におけるパラメータの最適化に要する時間を不要として、ライブラリの実行コスト（計算時間）を増大させない。また、ライブラリの実行の前に精度を確認するので、より精密かつ確実なパラメータ調整が可能となる。
【０２０８】
なお、本発明に係るプログラムを、ソフトウェアのインストール時、ユーザが知りうる情報が定まった地点でのソフトウェアの実行前、およびソフトウェア実行時、の３階層のパラメータ最適化機能を有するソフトウェアである、と表現することもできる。
【０２０９】
本発明に係るプログラムは、上記課題を解決するために、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータについて最適化する機能を上記コンピュータに実現させるためのプログラムにおいて、上記性能情報パラメータの各要素を、上記ライブラリのインストール時に最適化を行うパラメータの第１の集合、上記ライブラリの実行の前に最適化を行うパラメータの第２の集合、または上記ライブラリの実行の際に最適化を行うパラメータの第３の集合のうちの少なくとも一つに含まれるように設定して、第１の集合の要素を最適化する機能と、第２の集合の要素を最適化する機能と、第３の集合の要素を最適化する機能とを上記コンピュータに実現させることを特徴としている。
【０２１０】
このプログラムは、コンピュータにおける例えば数値計算ライブラリのようなライブラリの実行コストを最適化するために用いられるプログラムである。実行コストとは、例えば実行に要する計算資源、計算時間である。このプログラムは、ライブラリのパラメータのうち、実行性能のみを変化させてライブラリの出力を変化させない性能情報パラメータの値を、ライブラリの実行コストが最適なものとなるように調整する。例えば数値計算ライブラリのうちの、行列の固有値計算ライブラリにおいては、並列計算機を用いる場合のループアンローリング段数が、性能情報パラメータに相当する。
【０２１１】
上記プログラムが実行されたコンピュータにおいては、性能情報パラメータが、ライブラリのインストール時に最適化を行うパラメータの第１の集合、ライブラリの実行の前に最適化を行うパラメータの第２の集合、またはライブラリの実行の際に最適化を行うパラメータの第３の集合のうちの少なくとも一つに含まれるように設定される。
【０２１２】
ここで、性能情報パラメータが何らかの意味で最適化可能であるならば、インストール時、ライブラリ実行前、ライブラリ実行の際のいずれかにおいて最適化することは、常に可能である。また、性能情報パラメータを、上述の第１〜第３のうちから選択された少なくとも一つ以上の集合に含まれるように設定する具体的な構成には、ある程度任意性があるが、その構成はどのように選択してもよい。
【０２１３】
そして、コンピュータは、第１〜第３の集合について、それぞれ最適化を行う。したがって、性能情報パラメータの全てが最適化可能となり、汎用な処理に適用できる。すなわち、複数のルーチンを含んだライブラリ全体に対する最適化が可能となる。
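The assignment of performance information parameters to the first, second, and third sets can be illustrated with a short Python sketch. The names `Timing`, `partition`, `tune_all`, and the example parameter `block_size` are hypothetical. In a real system the three tuners would run at different times (installation, before execution, during execution); here they are called in one place only to show the bookkeeping.

```python
from enum import Enum


class Timing(Enum):
    INSTALL = 1      # first set: tuned when the library is installed
    BEFORE_RUN = 2   # second set: tuned before the library is executed
    RUN_TIME = 3     # third set: tuned while the library is executing


def partition(assignments):
    """Group parameter names by the set(s) they were assigned to."""
    sets = {t: [] for t in Timing}
    for name, timings in assignments.items():
        if not timings:
            # Every parameter must belong to at least one of the three sets.
            raise ValueError(f"{name} must belong to at least one set")
        for t in timings:
            sets[t].append(name)
    return sets


def tune_all(assignments, tuners):
    """Apply the tuner registered for each timing to that timing's set."""
    sets = partition(assignments)
    return {t: {name: tuners[t](name) for name in sets[t]} for t in Timing}
```

A parameter may appear in more than one set, which matches the "at least one" wording of the text.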
【０２１４】
一方、従来の最適化法は、ソフトウェアインストール時にパラメータ最適化を行うもの、またはライブラリ実行時にパラメータ最適化を行うもの、のいずれか一方しかなかった。このため、問題によっては、インストール時にしか最適化できない、または実行時にしか最適化できないものがあるので、全ての問題に対して汎用することができなかった。
【０２１５】
なお、本発明に係るプログラムを、最適化すべきパラメータに関して、インストール時、実行前、実行時の３種のパラメータに分離し、それぞれのパラメータ最適化を行うソフトウェアである、と表現することもできる。
【０２１６】
本発明に係る記録媒体は、上記課題を解決するための、上述のいずれかのプログラムを記録したコンピュータ読み取り可能な記録媒体である。
【０２１７】
この記録媒体がコンピュータにて読み取られると、上述のいずれかのプログラムがコンピュータにて実行される。したがって、上述のプログラムと同様の効果を得ることができる。
【０２１８】
Note that the recording medium is not limited to a hard disk, a CD-ROM (Compact Disc Read-Only Memory), or the like; any kind of recording medium may be used.
【０２１９】
また、本発明に係るコンピュータは、上記課題を解決するために、上述の記録媒体を備えている構成である。
【０２２０】
このコンピュータにて上述の記録媒体を読み取りすると、上述のいずれかのプログラムがコンピュータにて実行される。したがって、上述のプログラムと同様の効果を得ることができる。
【０２２１】
なお、このコンピュータは、コンピュータ内に複数のプロセッサを有する並列計算装置であってもよいし、または、複数のコンピュータがネットワークに接続されて複数のプロセッサを有する計算装置として機能する分散計算装置であってもよい。
【０２２２】
また、上述のコンピュータは、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの調整を行う調整方法において、上記ライブラリの上記パラメータに含まれる、実行性能と上記ライブラリの出力とを共に変化させる基本情報パラメータが定まった地点を検出する手順と、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を行う手順とを含んでいる調整方法を実行するものである、と表現することもできる。
【０２２３】
また、上述のコンピュータは、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの調整を行う調整方法において、上記ライブラリの実行の際に、既に設定された上記性能情報パラメータを参照して、この性能情報パラメータによる計算が所望の精度を満たしていないときには、上記基本情報パラメータを用いて上記性能情報パラメータの最適化を再度行う再調整手順を含んでいる調整方法を実行するものである、と表現することもできる。
【０２２４】
また、上述のコンピュータは、上記調整方法を実行することによって、コンピュータに備えられたライブラリのパラメータに含まれる、実行性能のみを変化させて上記ライブラリの出力を変化させない性能情報パラメータの最適化を行う調整装置として機能する。また、上述のコンピュータは、上述のプログラムとライブラリとを備えた計算装置として機能する。
【０２２５】
なお、上述の構成において、性能情報パラメータの最適化とは、性能情報パラメータの全てを最適化するものではなく、最適化が可能なもののうち、適当なものについて最適化を行うことを意味する。
【０２２６】
〔実施の形態２〕
本発明の他の実施の形態について、図１９ないし図２７に基づいて説明する。本実施形態の計算装置は、実施の形態１にて説明した、図１に示す計算装置１と同様の構成を有しており、以下では簡単のため計算装置１として参照する。
【０２２７】
計算装置１は、所定の形式の計算機言語（プログラム）から、パラメータの最適化（チューニング）が容易となるような他のプログラムを生成する、プログラム生成部６を備えた構成である。また、プログラムを実行形式に翻訳するコンパイラ７を備えている。
【０２２８】
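The program generation unit 6 of the computing device 1 of this embodiment generates a main program that calls the actual measurement routine and the estimation routine at the head of the main program, so that optimization is performed when the main program is executed. Depending on the specifier, the program generation unit 6 can also generate a main program that calls the actual measurement routine and the estimation routine inside the loop that calls the subprogram.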
【０２２９】
以下では、計算装置１の動作の概略について説明した後に、より具体的な実施例について説明する。
【０２３０】
図１および図２に示すように、本実施形態の計算装置１においても、指定子を記述したプログラムを用いて、自動チューニング機構の付加されたプログラムが生成される。そして、このプログラムを実行することによって、プログラムに含まれるパラメータの最適化を実行できる。
【０２３１】
より詳細には、計算装置１の有するプログラム生成部６は、図２に示すように、指定子解析手段８、プログラム作成手段９、およびコスト定義関数決定手段１０を含んでいる。そして、実施の形態１と同様に、指定子を記述したプログラムが処理されて、新たなプログラムが生成される。このプログラムの生成とは、プログラムの書き換えをも含むものとする。
【０２３２】
本実施形態においては、指定子解析手段８、プログラム作成手段９の構成・動作は、実施の形態１と異なるものとなっている。この点について、以下で図面を参照してより詳細に説明をする。
【０２３３】
図１９（ａ）（ｂ）には、計算装置１の処理の対象となる、指定子の記述されたプログラムの一例を示す。
【０２３４】
図１９（ｂ）に示すサブルーチンxxxは、指定子の記述された、最適化するべきチューニング領域Ｂを有するプログラムの一例である。また、図１９（ａ）に示すメインルーチンは、そのサブルーチンxxxを呼び出している。
【０２３５】
なお、図１９（ａ）（ｂ）は、それぞれ本実施形態の説明のために必要な箇所のみを特に示した一例であり、例えばサブルーチンxxxには図１９（ｂ）に示すように他の領域が含まれていてもよいし、また例えばメインルーチンにも図示しない他の領域、他の処理が含まれていてもよい。また、プログラム中の日本語による記載は、特に断らない場合は、具体的なプログラムの一例ではなく、プログラムによって実現するべき制御動作を抽象的に日本語で表現したものである。また、プログラム中の日本語が、プログラム中のコメントを表す場合もある。
【０２３６】
ここで、指定子解析手段８の有する指定子解析部８ａは、上述の実施の形態のように、自動チューニングの種類（インストール時、実行起動前）ごとに、指定子で指定されたパラメータとチューニング領域を分け、別々に以降の機構に引き渡し、処理を行うことが可能である。また、指定子解析部８ａは、本実施形態にて説明するように、自動チューニングの種類として、実行時の自動チューニングを行うこともできる。
【０２３７】
より詳細には、指定子解析部８ａは、図１９（ｂ）に示すような実行時最適化を指示する指定子の有無を判別するとともに、実行時最適化において起動時に最適化を行うか、または実行時最適化において該当部分実行時に最適化を行うかのいずれかを指定する指定子についても判別を行うようになっている。
【０２３８】
また、この指定子解析部８ａは、指定子に基づいて判別した情報を、図２に示すプログラム作成手段９のメインプログラム作成部９ａとサブプログラム作成部９ｂとに通知する。そして、プログラム作成手段９は、上述の指定子に応じた処理を行う。
【０２３９】
ここで、上述の実施の形態において、インストール時の最適化を指定する指定子installに対応するものとして、例えば実行時最適化を指定する指定子の一例としてdynamicを用いることができる。また、実行時最適化において、例えば起動時に最適化を行うための指定子の一例としてinitを用いることができ、また、該当部分実行時に最適化を行うための指定子の一例としてはhereを用いることができる。
【０２４０】
なお、以下に説明する例では、プログラムをFortran言語を用いて記述しているが、本発明はこれに限るものではなく、任意の関数型計算機言語（Ｃ言語、Ｃ＋＋言語など）を用いて記載されたプログラムにおいても、本発明による処理の本質は同じとなる。したがって、本発明の処理は、計算機言語の違いによる影響を受けない。
【０２４１】
以下では、指定子としてinitを用いる方式１（起動時実行方式）と、指定子としてhereを用いる方式２（該当部分実行方式）とに分けて説明をする。
【０２４２】
まず、方式１においては、図１９（ｂ）に示す実行時最適化指定子として、dynamicとinitが指定されているものとする。このとき、このプログラムの入力に応じて、計算装置１の指定子解析部８ａ・メインプログラム作成部９ａ・サブプログラム作成部９ｂが、図２０（ａ）に示すようなメインプログラム、図２０（ｂ）に示すようなプログラム、図２０（ｃ）に示すような実測・推定ルーチンを生成する。また、ここでは、図示していないが、チューニング領域Ｂについて、最適化するためのパラメータを引数としてチューニング領域Ｂを実行するプログラムを、サブプログラムとして作成する。
【０２４３】
一方、方式２においては、図１９（ｂ）に示す実行時最適化指定子として、dynamicとhereが指定されているものとする。このとき、このプログラムの入力に応じて、計算装置１の指定子解析部８ａ・メインプログラム作成部９ａ・サブプログラム作成部９ｂが、図２１（ａ）に示すようなメインプログラム、図２１（ｂ）に示すようなプログラム、図２１（ｃ）に示すような実測・推定ルーチンを生成する。また、ここでは、図示していないが、チューニング領域Ｂについて、最適化するためのパラメータを引数としてチューニング領域Ｂを実行するプログラムを、サブプログラムとして作成する。
【０２４４】
なお、図１９（ａ）（ｂ）においては、サブルーチン中に指定子を記述した例について説明しているが、これに限るものではない。方式１、方式２のいずれにおいても、メインルーチン中に指定子を記述した場合であっても、最適化するためのチューニング領域についてサブプログラムを作成し、メインプログラムとしてのメインルーチンから呼び出すようにするという点で、同様の処理がなされる。
【０２４５】
以上に概略を説明したように、方式１にて生成したプログラムは図２０（ａ）に示すようにメインプログラムの先頭に実測・推定ルーチンとしてのAuto_xxxの呼び出しがある一方、方式２にて生成したプログラムは図２１（ａ）に示すようにチューニング領域の直前に実測・推定ルーチンとしてのAuto_xxxの呼び出しがある。この違いによって、後述するように、実測・推定ルーチンの実行による最適化のための時間が大きく異なることになる。
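The difference in call placement between Method 1 (`init`) and Method 2 (`here`) can be made concrete with a small simulation. This is a hedged Python sketch, not the generated Fortran of FIG. 20 or FIG. 21: `run_generated_main` and `trial` are hypothetical names, and `auto_xxx` stands in for the measurement and estimation routine Auto_xxx.

```python
def run_generated_main(placement, iterations, candidates, trial):
    """Simulate the generated main program and count tuning trials.

    placement: "init" (Method 1) or "here" (Method 2).
    trial(value) returns the measured cost of the tuning region.
    Returns (chosen parameter, total number of trial executions).
    """
    if placement not in ("init", "here"):
        raise ValueError("placement must be 'init' or 'here'")
    trials = 0

    def auto_xxx():
        # Measurement and estimation: try every candidate, keep the cheapest.
        nonlocal trials
        costs = {}
        for value in candidates:
            costs[value] = trial(value)
            trials += 1
        return min(costs, key=costs.get)

    # Method 1: tune once at the head of the main program.
    best = auto_xxx() if placement == "init" else None
    for _ in range(iterations):
        if placement == "here":
            # Method 2: tune immediately before the tuning region, every time.
            best = auto_xxx()
        # ... the tuning region would run here with parameter `best` ...
    return best, trials
```

With 8 candidates and 100 iterations, the sketch performs 8 trials under `init` and 800 under `here`, while choosing the same parameter in both cases.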
【０２４６】
以下では、サブプログラムのより詳細な一例を参照して、より具体的な実施例について説明する。ここでは、実施例として、疎行列連立一次方程式の解法で用いられる反復解法の１つである、共役勾配法（ＣＧ法：Conjugate Gradient）への適用例について説明する。
【０２４７】
ＣＧ法とは、疎行列Ａと右辺ベクトルｂとが与えられたときに、連立一次方程式Ａｘ＝ｂを満たす解ベクトルｘを求めるための方法である。このような解法はいろいろ知られているが、ＣＧ法は反復解法と呼ばれる解法の一つである。このＣＧ法においては、反復回数（後述するＩループの繰り返し回数）は疎行列Ａの数値的特徴に依存して決まることになるため、反復回数は「問題依存」すると呼ばれる。
【０２４８】
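First, an application example is described in which run-time automatic tuning by Method 2 (tuning at the point of execution) is specified for the sparse matrix-vector product in a CG subroutine.
【０２４９】
In FIG. 22, the specifiers for run-time automatic tuning by Method 2 (dynamic, here) are specified for the sparse matrix-vector product q^(I) = A p^(I), indicated by reference sign C7.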
【０２５０】
ここで、図２２の内容について簡単に説明をする。まず、図２２に示される各変数について、説明をする。図２２に示すＡは、疎行列を表しており、連立一次方程式の係数行列に相当する。Ａは、例えば１次元配列を用いて実装されることが多い。また、ｂは１次元配列のｎ次元ベクトルであり、連立一次方程式の右辺ベクトルに相当する。
【０２５１】
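Scalar values in the I loop (the loop over I) are written with the subscript marker "_", and vector values in the I loop with the superscript marker "^". The transpose of a vector is denoted by "T". For example, p_(I) is the value of the scalar p in iteration I, p^(I) is the value of the vector p in iteration I, and p^(I)T is the transpose of the vector p in iteration I. Because the number of iterations of the I loop is problem-dependent, as noted above, it is not shown in FIG. 22.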
【０２５２】
また、プログラム作成用補助配列として、z^(I-1)、r^(I-1)、Ｍ、p^(I-1)、q^(I)を用いる。また、プログラム作成用補助変数（スカラー）として、p_(I-1)、beta_(I-1)、a_Iを用いる。ここで、z^(I-1)、r^(I-1)、p^(I-1)、q^(I)は、ｎ次元ベクトルの１次元配列である。また、Ｍは疎行列であり、例えば１次元配列によって実装されることが多い。また、p_(I-1)、beta_(I-1)、a_Iは、倍精度実数のスカラーである。
【０２５３】
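The processing indicated by reference sign C1 in FIG. 22 is written as a program comment. Using the given vectors b and x^(0), it computes r^(0) as the difference between the vector b and the matrix-vector product A x^(0).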
【０２５４】
符号Ｃ２の処理は、与えられた疎行列Ｍ、ベクトルr^(I-1)を用いて、ベクトルz^(I-1)を求めることを意味する。この求解には、ＣＧ法の反復回数を減少させるようなＭを作成し、ベクトルzを求めるような処理を行うための、ある種の数値計算アルゴリズムの利用が必要となる。このようなアルゴリズムについては、通常ＣＧ法において用いられているものを用いることができる。詳細については説明を省略する。
【０２５５】
符号Ｃ３の処理は、与えられた転置ベクトルr^(I-1)T とベクトルz^(I-1)との内積演算をすることで、スカラーp_(I-1)を計算することを意味する。
【０２５６】
符号Ｃ４の処理は、ベクトルのコピーを行うことを意味する。
【０２５７】
符号Ｃ５の処理は、与えられたスカラーp_(I-1)とp_(I-2)との除算から、スカラーbeta_(I-1)を計算することを意味する。
【０２５８】
符号Ｃ６の処理は、与えられたベクトルz^(I-1)、スカラーbeta_(I-1)、ベクトルp^(I-1)から、ベクトルp^(I)を計算することを意味する。このために、スカラー・ベクトル積beta_(I-1) p^(I-1)の演算結果であるベクトルと、ベクトルz^(I-1)との加算処理が必要となっている。
【０２５９】
符号Ｃ７の処理は、疎行列Ａとベクトルp^(I)との疎行列・ベクトル積をすることで、ベクトルq^(I)を計算することを意味する。
【０２６０】
符号Ｃ８の処理は、ベクトルの転置p^(I)Tと、ベクトルq^(I)との内積計算の結果のスカラー値と、スカラー値p_(I-1)との除算をすることで、スカラー値a_Iを計算することを意味する。
【０２６１】
符号Ｃ９の処理は、スカラー値a_Iと、ベクトルp^(I)との積の結果のベクトルと、ベクトルx^(I-1)とを加算することで、ベクトルx^(I)を計算することを意味する。
【０２６２】
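The processing indicated by reference sign C10 computes the vector r^(I) from the vector r^(I-1) and the product of the scalar a_I and the vector q^(I) (in the CG method, r^(I) = r^(I-1) - a_I q^(I)).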
【０２６３】
また、符号Ｃ１０の後に示す、末尾の「収束を確かめ、必要なら繰り返す」との処理は、収束判定結果が十分であれば、Ｉループでの反復を中断して、enddo以降の部分に分岐することを意味する。ここで、収束を計算する方法はいろいろあり、どのようなものを用いてもよい。例えば、一般的な処理方式として、Ａｘ＝ｂについてＣＧ法で計算中のｘに対してｒ＝｜Ａｘ−ｂ｜を計算して、ｒが十分に小さいかどうかを検査すればよい。
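Steps C1 to C10 and the convergence check can be collected into one runnable sketch. The Python code below is an illustration under stated assumptions, not the program of FIG. 22: it uses dense lists instead of a sparse format, takes the preconditioner M to be the diagonal of A (Jacobi), assumes the copy in C4 is p = z on the first iteration, and uses the residual norm as the convergence test. The function name `pcg` and the tolerance are hypothetical.

```python
def pcg(A, b, x0, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite matrix A.

    Returns (solution vector, number of iterations performed).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    Ax = matvec(x)
    r = [b[i] - Ax[i] for i in range(n)]            # C1: r = b - A x
    if dot(r, r) ** 0.5 < tol:
        return x, 0                                 # initial guess already solves it
    p, rho_prev = None, None
    for it in range(1, max_iter + 1):
        z = [r[i] / A[i][i] for i in range(n)]      # C2: solve M z = r (Jacobi M assumed)
        rho = dot(r, z)                             # C3: rho = r^T z
        if it == 1:
            p = list(z)                             # C4: copy z into p
        else:
            beta = rho / rho_prev                   # C5
            p = [z[i] + beta * p[i] for i in range(n)]   # C6
        q = matvec(p)                               # C7: q = A p (the tuning region)
        alpha = rho / dot(p, q)                     # C8
        x = [x[i] + alpha * p[i] for i in range(n)]      # C9
        r = [r[i] - alpha * q[i] for i in range(n)]      # C10
        rho_prev = rho
        if dot(r, r) ** 0.5 < tol:                  # convergence check
            return x, it
    return x, max_iter
```

The matrix-vector product at C7 is executed once per iteration, which is why it is the natural target for tuning.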
【０２６４】
The sparse matrix-vector product indicated by reference sign C7 corresponds, more concretely, to a program such as the one shown in FIG. 23.
【０２６５】
ここで、図２３においては、疎行列Ａとベクトルｘとの疎行列・ベクトル積演算の具体的なコードを示すために、疎行列Ａを表現するためのデータ構造を実現する配列（情報を維持する配列）として、Aval(J)、row_ptr(I),col_ind(J)を用いている。また、疎行列Ａと行列・ベクトル積演算をするために必要な、ベクトルｘの要素として、x(col_ind(J))を用いている。
【０２６６】
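More specifically, Aval(J) is a one-dimensional array holding the double-precision real values of the sparse matrix A. col_ind(J) is a one-dimensional integer array holding the column numbers of the nonzero elements of A. Therefore, x(col_ind(J)) returns the element of the vector x that corresponds to a nonzero element of A.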
【０２６７】
また、row_ptr(I)には、疎行列Ａの非零要素がある行の番号が収納されている。これら、row_ptr(I)、col_ind(J)の値は、疎行列Ａが確定する段階で設定される。したがって、これらの値は、ライブラリ呼び出しの地点で定まっている、静的な値である。言い換えると、これらの値は、ＣＧ法をプログラムする際の補助配列における値のように、ＣＧ法のプログラム中で動的に決まる値ではない。
【０２６８】
ここで、図２２に示す指定子は、図２３で示されるコードの最内ループ（Ｊループ）に対し、アンローリング処理をする自動チューニングを指定している。
【０２６９】
このループ長は、変数配列row_ptr(I)、row_ptr(I+1)で指定されていることから、ループ長は固定ではない。また一般的に、実行時にならないとこの変数配列の値は決まらない。したがって、本適用例は、実行時自動チューニングのみ指定できる一例といえる。
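A sparse matrix-vector product over the arrays Aval, row_ptr, and col_ind can be sketched as follows. FIG. 23 is not reproduced here, so this Python version assumes the conventional compressed row storage layout, in which row_ptr holds the position in Aval where each row starts. That assumption is consistent with the inner-loop bounds being given by row_ptr(I) and row_ptr(I+1). Indices are 0-based, unlike Fortran, and the function name `spmv_crs` is hypothetical.

```python
def spmv_crs(aval, row_ptr, col_ind, x):
    """Compute y = A x for a matrix in compressed row storage (0-based)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        # Inner J loop: its length row_ptr[i + 1] - row_ptr[i] differs per row
        # and is known only once the matrix has been supplied.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            s += aval[j] * x[col_ind[j]]
        y[i] = s
    return y
```

Because the trip count of the inner loop depends on the matrix data, the best unrolling depth cannot be fixed at installation, which is the point made in the text.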
【０２７０】
そして、図２３に示すプログラムに対して、図２２に示すような実行時自動チューニングでのアンローリング指定によって、このチューニング領域は、図２４に示すようなサブプログラムとなる。すなわち、図２３の疎行列−ベクトル積コードが図２４に示すように、１段ないし８段のアンローリング段数を有するプログラムに書き換えられる。なお、図２４において符号ｄ１で示す領域は、アンローリング段数が３段ないし７段の領域を省略して示すものである。
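A subprogram that takes the unrolling depth as its argument might look like the sketch below. It is an illustration in Python, not the Fortran of FIG. 24: the generated code would contain a separately written-out loop body for each depth from 1 to 8, whereas this sketch uses a short inner loop to stand in for the written-out statements. The name `sub_smvcg` mirrors Sub_SMVCG(J_val) from the text, and the 0-based compressed row storage layout is an assumption.

```python
def sub_smvcg(j_val, aval, row_ptr, col_ind, x):
    """Sparse matrix-vector product with j_val elements handled per trip."""
    if not 1 <= j_val <= 8:
        raise ValueError("j_val must be between 1 and 8")
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Largest prefix of the row whose length is a multiple of j_val.
        main_end = start + ((end - start) // j_val) * j_val
        s = 0.0
        # Main part: j_val multiply-adds per trip of the J loop.
        for j in range(start, main_end, j_val):
            for k in range(j_val):
                s += aval[j + k] * x[col_ind[j + k]]
        # Remainder: leftover elements when the row length is not a multiple.
        for j in range(main_end, end):
            s += aval[j] * x[col_ind[j]]
        y[i] = s
    return y
```

Every depth produces the same result; only the execution time differs, which is what makes the depth a performance parameter that can be chosen by measurement.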
【０２７１】
また、計算装置１によって、図２５（ａ）に示すようなメインプログラム、図２５（ｂ）に示すようなプログラム、図２５（ｃ）に示すような実測・推定ルーチンが作成される。なお図２５（ａ）〜（ｃ）に示すプログラムコードは、本適用例説明のために簡略化したコードであり、実際に生成されるコードとは同一ではない。例えば、各プログラムは、さらに図示しない他の処理を含んでいてもよいことはもちろんである。
【０２７２】
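Thus, in Method 2, the execution time of the 8-stage unrolling code shown in FIG. 24 is measured every time the relevant part shown in FIG. 22 is executed, and the optimum number of stages is determined.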
【０２７３】
次に、方式１（起動時実行方式）の実行時自動チューニングを指定した適用例について説明する。
【０２７４】
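FIG. 26 shows the program to be optimized and corresponds to FIG. 22 of Method 2. In Method 1, as described later, the execution time of the code of FIG. 24 is measured only once, before the CG subroutine is started, and code is automatically generated that thereafter simply refers, at the relevant part, to the parameter value (the value of J_val) giving the optimum number of unrolling stages.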
【０２７５】
ここで、図２６と図２２とは、指定子の指定（initかhereか）が異なるのみであり、他は同様であるので、ここでは説明を省略する。
【０２７６】
そして、図２６のプログラムに対するアンローリング指定によって、図２２と同様に、チューニング領域は図２４に示すようなサブプログラムとなる。
【０２７７】
そして、計算装置１によって、図２７（ａ）に示すようなメインプログラム、図２７（ｂ）に示すようなプログラム、図２７（ｃ）に示すような実測・推定ルーチンが作成される。なお図２７（ａ）〜（ｃ）に示すプログラムコードは、本適用例説明のために簡略化したコードであり、実際に生成されるコードとは同一ではない。例えば、各プログラムは、さらに図示しない他の処理を含んでいてもよいことはもちろんである。
【０２７８】
次に、方式１、方式２を用いてチューニングを行った結果について説明をする。
【０２７９】
方式１（図２６）・方式２（図２２）における、ＣＧ法の反復回数、すなわち、図２６・図２２におけるＩループの繰り返し回数を１００回であるとする。なお、この反復回数は、実際には解くべき疎行列の数値的特徴に応じて決まる、問題依存する量である。
【０２８０】
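The execution time of everything other than the sparse matrix-vector product is assumed to be 0.5 seconds per iteration. The time required for each parameter check (determining the number of unrolling stages), that is, the time for one sparse matrix-vector product, is assumed to be 1 second. More precisely, this corresponds to the execution time of call Sub_SMVCG(J_val), in which Sub_SMVCG of FIG. 24 is executed with a specific value of the argument J_val.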
【０２８１】
このとき、方式１（起動時実行方式）における実行時間の見積もりは、以下のようになる。まず、メインルーチンにおいて、Auto_CGからSub_SMVCGが８回呼び出される（１秒×８）。また、Sub_CGのループ内において、Auto_CGではなくSub_SMVCGが呼び出され（１秒）、さらにその他の演算が実行される（0.5秒）。このループが１００回実行される。これによって、１５８秒必要となる。
【０２８２】
また方式２（該当部分実行方式）における実行時間の見積もりは、Sub_CGのループ内においては、Auto_CGからSub_SMVCGが８回呼び出され（１秒×８）、さらにその他の演算が実行される（0.5秒）。また、Ｊ固定のSub_SMVCG(J)が実行される（１秒）。このループが１００回実行される。これによって、９５０秒必要となる。
【０２８３】
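The estimated execution time is therefore 158 seconds for Method 1 and 950 seconds for Method 2, so Method 1 is 950/158 ≈ 6 times faster than Method 2. In a case like this example, Method 1 is thus roughly 5 to 8 times faster than Method 2.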
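The two estimates follow directly from the stated assumptions (100 iterations, 8 candidate unrolling depths, 1 second per sparse matrix-vector product, 0.5 seconds of other work per iteration). The short Python function below reproduces the arithmetic. The function and parameter names are hypothetical.

```python
def estimated_seconds(method, iterations=100, candidates=8,
                      spmv_time=1.0, other_time=0.5):
    """Estimated total run time for Method 1 or Method 2, in seconds."""
    tuning = candidates * spmv_time  # one sweep over all candidate depths
    if method == 1:
        # Tune once before the loop, then run every iteration untuned.
        return tuning + iterations * (spmv_time + other_time)
    if method == 2:
        # Tune inside the loop on every iteration.
        return iterations * (tuning + spmv_time + other_time)
    raise ValueError("method must be 1 or 2")
```

Under these assumptions the ratio between the two methods rises from about 6.0 at 100 iterations toward 9.5/1.5, or about 6.3, as the iteration count grows, because the one-time tuning cost of Method 1 becomes negligible.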
【０２８４】
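In general, the number of iterations increases as the numerical properties of the problem become more severe, that is, as the problem becomes harder. According to the above estimate, the harder the problem, the larger the difference in execution time between Method 1 and Method 2. Method 1 is therefore very effective, in terms of actual execution time, for run-time parameter optimization, which cannot avoid including the parameter tuning time. As this application example shows, Method 1 offers a significant advantage.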
【０２８５】
このように、実行時自動チューニング処理を、該当領域が含まれるサブルーチンの実行前に１度行うように分離する方式の適用により、従来から実行時自動チューニングにおいて問題となっていた、（１）冗長な最適化処理を繰り返す点、（２）上記（１）の理由から最適化処理の時間が長くなる点の各問題を解決することができる。
【０２８６】
なお、実施の形態２と上述の実施の形態１との関係について、説明を補足する。例えば、実施形態１における図１３と、実施形態２における図２０とは、同一のものではない。
【０２８７】
まず、実施の形態１記載の例および処理結果は、インストール時方式、実行前方式を指定した場合の処理に限定されており、実行時方式を指定した場合の処理については記載がない。すなわち、実施の形態２の図２１に記載の（Sub_xxx中でのAuto_xxxの呼び出し）などに相当するものは、実施の形態１においては具体的な実施例としては記載していない。実施の形態１における図１３は、インストール時方式、実行前方式を指定した場合の処理に関するものである。
【０２８８】
また、実施の形態１においても、チューニング領域の指定場所について、メインルーチン中やサブルーチン中など、その場所は限定されない。すなわち、実施の形態１においても、メインルーチンからではなく、メインプログラムとしてのサブルーチンからの呼び出しが可能である。
【０２８９】
また、実施の形態１において実行時最適化を指定する場合は、実施の形態２の方式２に相当し、例えば図２１と同様のコードが生成される。すなわち、実行時方式を指定する場合には、一般的に、実行時にならないとパラメータのチューニングができない理由があるので、その処理の特殊性から、図２１と同一のコードを生成する必要がある、ということになる。
【０２９０】
一方、実施の形態２における方式１では、実施の形態１とは異なり、チューニング領域の指定がメインルーチンまたはサブルーチンのどちらに存在しようとも、例えばAuto_xxxのような自動チューニングルーチンの呼び出しを、強制的にメインプログラムの先頭に移動させる。より詳細には、自動チューニングルーチンの呼び出しを、サブプログラムを呼び出しているループ（Ｉループ）の前に移動させる。すなわち、方式１は、実測・推定ルーチンをメインルーチンから直接呼び出すようにするか、またはメインルーチンから呼び出されるサブルーチンにおいて呼び出すようにするかを切り替えるものではない。
【０２９１】
上述のように、方式１と方式２とでは、実行時間が大きく異なることから、実行時方式においては、問題の性質に応じて、方式１の図２０をより好ましいものとして用いることができる。
【０２９２】
以上のように、本発明は、例えばコンピュータに蓄えられたプログラムにおけるパラメータの最適化、コンピュータを実行させるためのプログラム、記録媒体およびコンピュータに関するものである。特に、上述の実施の形態２は、実行時自動チューニングにおける高速最適化方式に関するものである。
【０２９３】
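The types of automatic tuning for improving software performance can be classified by the timing of optimization into three kinds: at installation, before execution is started, and at run time. Of these three, run-time automatic tuning is the one for which the time spent on optimization matters most. For run-time automatic tuning, a method is known, as in the above embodiment, in which parameters are tuned when the subprogram or part of a program that is the target region of automatic tuning is executed.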
【０２９４】
However, this configuration has the following problems: (1) redundant optimization processing is repeated, and (2) because of (1), the optimization takes a long time.
【０２９５】
そこで、上述のように、実行時最適化の指示において、該当部分を含むサブルーチン等の起動時（呼び出し前）に１度だけ行う処理（方式１）、および該当箇所が呼ばれた時に行う処理（方式２）の２方式に処理を分離して指定することで問題の解決を図る。すなわち、例えばサブプログラム自身が反復回数の多いループの中から呼び出されている場合、ループ内のサブプログラムの呼び出し直前に最適化するか、またはこのループの外側において呼び出して最適化するかを切り替えることができるようにする。なお、実施の形態２における発明機能の利用形態、および処理機構の概略は上述したように、実施の形態１と同様である。
【０２９６】
本発明は上述した各実施形態に限定されるものではなく、請求項に示した範囲で種々の変更が可能であり、異なる実施形態にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施形態についても、本発明の技術的範囲に含まれる。
【０２９７】
上述の具体的な実施形態または実施例は、あくまでも、本発明の技術内容を明らかにするものであって、本発明はそのような具体例にのみ限定して狭義に解釈されるべきものではなく、特許請求の範囲に示した範囲で種々の変更が可能であり、変更した形態も本発明の技術的範囲に含まれる。
【０２９８】
【発明の効果】
本発明に係る計算装置は、以上のように、最適化を行うプログラム中の領域と最適化を行うパラメータとを指定する指定子が含まれている上記プログラムが入力されると、上記指定子によって指定される上記領域と上記パラメータとについての、実測による最適化を実行するためのプログラムを生成するプログラム生成部を備えている構成である。
【０２９９】
それゆえ、この計算装置に対して、所定の指定子を記載したプログラムを入力すれば、このプログラムの指定した領域を指定したパラメータについて最適化するためのプログラムを得ることができるという効果を奏する。
【０３００】
本発明に係る計算装置は、以上のように、上記構成において、上記プログラム生成部は、入力される上記プログラムから上記指定子によって指定される上記領域と上記パラメータとを抽出する指定子解析手段と、上記指定子解析手段にて抽出された上記領域を含むサブプログラムを生成し、上記サブプログラムを呼び出して、上記パラメータについての実測による最適化を実行するためのメインプログラムを生成する、プログラム作成手段とを含んでいる構成である。
【０３０１】
それゆえ、この構成によって、上述の本発明に係る計算装置を実現できるという効果を奏する。
【０３０２】
本発明に係る計算装置は、以上のように、上記構成において、上記プログラム作成手段は、上記メインプログラムから呼び出す、または上記メインプログラムに含ませるための、上記パラメータごとに上記サブプログラムを呼び出して所要時間を計測する実測ルーチンと、上記実測ルーチンにて計測した上記所要時間を用いて最適なパラメータを推定する推定ルーチンとを作成する構成である。
【０３０３】
それゆえ、この構成であれば、実測ルーチンと推定ルーチンとが、メインプログラムから呼び出され、またはメインプログラムに含まれているので、メインプログラムを実行可能形式に翻訳して実行するだけで、最適なパラメータを得ることができるという効果を奏する。
【０３０４】
本発明に係る計算装置は、以上のように、上記構成において、上記プログラム作成手段は、上記メインプログラムの実行の際に最適化を行うために、上記サブプログラムが上記メインプログラムのループ内において呼び出されている場合には、上記ループの外側で上記ループよりも前において、上記実測ルーチンおよび上記推定ルーチンを呼び出す上記メインプログラムを生成する構成である。
【０３０５】
それゆえ、ループの内部にて毎回実測ルーチンが実行されることがないので、その分だけ最適化に要する時間を短縮できるという効果を奏する。
【０３０６】
本発明に係る計算装置は、以上のように、上記構成において、上記プログラム作成手段は、上記メインプログラムの実行の際に最適化を行うために、上記サブプログラムが上記メインプログラムのループ内において呼び出されている場合には、上記ループの外側で上記ループよりも前において、上記実測ルーチンおよび上記推定ルーチンを呼び出す上記メインプログラムか、または、上記ループ内において上記実測ルーチンおよび上記推定ルーチンを呼び出す上記メインプログラムかのいずれかを、上記指定子に応じて選択して生成する構成である。
【０３０７】
それゆえ、最適化の際の実測ルーチンおよび推定ルーチンの呼び出し回数を減らすか、または通常の最適化を行うかを、指定子に応じて切り替えることができるという効果を奏する。
【０３０８】
本発明に係る計算装置は、以上のように、上記構成において、上記パラメータごとに計測した上記所要時間を近似するためのコスト定義関数を含むコスト定義関数ライブラリを備えている構成である。
【０３０９】
それゆえ、この構成であれば、例えば、このコスト定義関数ライブラリに含まれるコスト定義関数を用いて、所望の近似を行うことができるという効果を奏する。
【０３１０】
本発明に係る計算装置は、以上のように、上記構成において、計測した上記所要時間を、上記コスト定義関数ライブラリ中に含まれるコスト定義関数の全てを順次用いて近似して、そのうちから最も近似精度のよいコスト定義関数を選択するコスト定義関数決定部を備えている構成である。
【０３１１】
それゆえ、この構成であれば、例えば指定子に推定ルーチンにて用いる近似関数の指定を含めない場合であっても、最適な近似関数を得ることができるという効果を奏する。
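The selection performed by the cost definition function determination unit, in which every function in the library is fitted to the measured times and the best-fitting one is kept, can be sketched as follows. The contents of the cost definition function library are not listed in this section, so the three candidate functions, the least-squares criterion, and all names below are assumptions made for illustration.

```python
def fit_linear(gs, ys):
    """Least-squares fit ys ~ c0 + c1 * gs; returns (c0, c1, residual sum of squares)."""
    n = len(gs)
    mg, my = sum(gs) / n, sum(ys) / n
    sgg = sum((g - mg) ** 2 for g in gs)
    sgy = sum((g - mg) * (y - my) for g, y in zip(gs, ys))
    c1 = sgy / sgg if sgg else 0.0
    c0 = my - c1 * mg
    rss = sum((y - (c0 + c1 * g)) ** 2 for g, y in zip(gs, ys))
    return c0, c1, rss


# Hypothetical library: each entry maps a parameter value to a basis value.
COST_FUNCTIONS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "inverse": lambda x: 1.0 / x,
}


def select_cost_function(xs, times):
    """Fit every library function in turn and return the best (name, fit)."""
    best_name, best_fit = None, None
    for name, g in COST_FUNCTIONS.items():
        fit = fit_linear([g(x) for x in xs], times)
        if best_fit is None or fit[2] < best_fit[2]:
            best_name, best_fit = name, fit
    return best_name, best_fit
```

Trying all candidates in turn means the specifier does not have to name an approximation function, which is the effect stated above.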
【０３１２】
本発明に係る計算装置は、上記指定子解析手段にて抽出した上記領域と上記パラメータとを記憶するチューニング情報データベースを有しており、上記プログラム作成手段と上記コスト定義関数決定部とが、上記チューニング情報データベースを参照して上記領域または上記パラメータを取得する構成である。
【０３１３】
それゆえ、プログラム作成手段とコスト定義関数決定部とが領域またはパラメータを用いる際に、チューニング情報データベースを参照すればよく、その度に領域またはパラメータを抽出する必要がないという効果を奏する。
【０３１４】
本発明に係る計算方法は、以上のように、最適化を行うプログラム中の領域と最適化を行うパラメータとを指定する指定子が含まれている上記プログラムが入力されると、上記指定子によって指定される上記領域と上記パラメータとについての、実測による最適化を実行するためのプログラムを生成する工程と、上記生成する工程にて得た上記プログラムを実行して最適化を行う工程とを含んでいる構成である。
【０３１５】
それゆえ、この計算方法を例えばコンピュータのような計算装置にて実行すれば、上述の計算装置を実現できるという効果を奏する。
【０３１６】
本発明に係る計算方法は、以上のように、上記構成において、上記プログラムの実行の際に最適化を行うために、上記領域が上記プログラムのループ内において呼び出されている場合には、上記プログラムを生成する工程において、上記ループの外側で上記ループよりも前に、上記領域についての上記パラメータごとの所要時間の実測と実測した上記所要時間から最適なパラメータの推測とを行うような上記プログラムを生成する構成である。
【０３１７】
それゆえ、ループの内部において毎回実測・推測を実行することがないので、その分だけ最適化に要する時間を短縮できるという効果を奏する。
【０３１８】
本発明に係る計算方法は、以上のように、上記構成において、上記プログラムの実行の際に最適化を行うために、上記領域が上記プログラムのループ内において呼び出されている場合には、上記プログラムを生成する工程において、上記ループの外側で上記ループよりも前に、上記領域についての上記パラメータごとの所要時間の実測と実測した上記所要時間から最適なパラメータの推測とを行うような上記プログラムか、または上記ループ内にて上記領域についての上記パラメータごとの所要時間の実測と実測した上記所要時間から最適なパラメータの推測とを行うような上記プログラムかのいずれかを、上記指定子に応じて選択して生成する構成である。
【０３１９】
それゆえ、ループの外側でループよりも前に実測・推測を実行するか、またはループの内部において毎回実測・推測を実行するかを、指定子の設定によって簡単に切り替えることができるという効果を奏する。
【０３２０】
本発明に係るプログラムは、以上のように、上記構成において、コンピュータを、上述のいずれかに記載の計算装置の各手段として動作させる構成である。
【０３２１】
それゆえ、このプログラムを用いれば、上述の計算装置を実現できるという効果を奏する。
【０３２２】
本発明に係る記録媒体は、以上のように、上記構成において、上述のプログラムをコンピュータ読み取り可能に記録した構成である。
【０３２３】
それゆえ、この記録媒体のプログラムをコンピュータにて読み取って実行すれば、上述の計算装置を実現できるという効果を奏する。
【図面の簡単な説明】
【図１】本発明に係る計算装置の概略構成を示すブロック図である。
【図２】上記計算装置のプログラム生成部の構成を示すブロック図である。
【図３】上記計算装置に入力されるプログラムの一例を示す図である。
【図４】（ａ）は上記プログラムのチューニング領域の一具体例を示す図であり、（ｂ）は（ａ）に示すチューニング領域が上記プログラム生成部によって処理されて得られるプログラムの一例を示す図である。
【図５】（ａ）は図３に示すプログラムから上記プログラム生成部によって生成されるメインプログラムの一例を示す図であり、（ｂ）は図３に示すプログラムが上記プログラム生成部によって書き換えられた一例を示す図であり、（ｃ）は図３に示すプログラムから上記プログラム生成部によって生成されるサブプログラムの一例を示す図であり、（ｄ）は図３に示すプログラムから上記プログラム生成部によって生成されるチューニング用プログラムの一例を示す図である。
【図６】上記プログラム生成部による、コスト定義関数決定処理の一例を示すフローチャートである。
【図７】上記プログラム生成部による、サンプリング点決定処理の一例を示すフローチャートである。
【図８】上記プログラム生成部による、パラメータ推定処理（ａ）の一例を示すフローチャートである。
【図９】上記プログラム生成部による、パラメータ推定処理（ｂ）の一例を示すフローチャートである。
【図１０】上記プログラム生成部による、測定用ループ処理の一例を示すフローチャートである。
【図１１】上記計算装置による処理の概略を示すフローチャートである。
【図１２】上記計算装置に入力されるプログラムの他の一例を示す図である。
【図１３】（ａ）は図１２に示すプログラムから上記プログラム生成部によって生成されるメインプログラムの一例を示す図であり、（ｂ）は図１２に示すプログラムから上記プログラム生成部によって生成されるチューニング用プログラムの一例を示す図である。
【図１４】（ａ）は図１２に示すプログラムが上記プログラム生成部によって書き換えられ、生成されるサブプログラムの一例の一部を示す図であり、（ｂ）は（ａ）とは異なる一部を示す図である。
【図１５】上記計算装置の一部を示すブロック図である。
【図１６】上記計算装置の他の一部を示すブロック図である。
【図１７】（ａ）はインストール時最適化の手順を示すフローチャートであり、（ｂ）はライブラリ実行前最適化の手順を示すフローチャートであり、（ｃ）はライブラリ実行時最適化の手順を示すフローチャートである。
【図１８】（ａ）は従来のコンピュータの一例の一部を示すブロック図であり、（ｂ）は従来のコンピュータの他の一例の一部を示すブロック図である。
【図１９】（ａ）は上記計算装置に入力されるプログラムのさらに他の一例の一部を示す図であり、（ｂ）は上記プログラムの（ａ）とは異なる一部を示す図である。
【図２０】（ａ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって生成されるメインプログラムの一例を示す図であり、（ｂ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって書き換えられたプログラムの一例を示す図であり、（ｃ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって生成されるチューニング用プログラムの一例を示す図である。
【図２１】（ａ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって生成されるメインプログラムの他の一例を示す図であり、（ｂ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって書き換えられたプログラムの他の一例を示す図であり、（ｃ）は図１９（ａ）（ｂ）に示すプログラムから上記プログラム生成部によって生成されるチューニング用プログラムの他の一例を示す図である。
【図２２】上記計算装置に入力されるプログラムのさらに他の一例を示す図である。
【図２３】図２２に示すプログラムの一部をより具体的に記載した一例を示す図である。
【図２４】図２２および図２３に示すプログラムから生成されるサブプログラムの一例を示す図である。
【図２５】（ａ）は上記プログラム生成部によって生成されるメインプログラムの他の一例を示す図であり、（ｂ）は上記プログラム生成部によって書き換えられたプログラムの他の一例を示す図であり、（ｃ）は上記プログラムから上記プログラム生成部によって生成されるチューニング用プログラムの他の一例を示す図である。
【図２６】上記計算装置に入力されるプログラムの、図２２とは異なる一例を示す図である。
【図２７】（ａ）は上記プログラム生成部によって生成されるメインプログラムのさらに他の一例を示す図であり、（ｂ）は上記プログラム生成部によって書き換えられたプログラムのさらに他の一例を示す図であり、（ｃ）は上記プログラムから上記プログラム生成部によって生成されるチューニング用プログラムのさらに他の一例を示す図である。
【符号の説明】
１計算装置
２プロセッサ
３ユーザライブラリ（ライブラリ）
４パラメータ調整層
５パラメータ情報ファイル
６プログラム生成部
８指定子解析手段
９プログラム作成手段
１０コスト定義関数決定手段
１０ａチューニング情報データベース
１０ｂコスト定義関数ライブラリ
１０ｃコスト定義関数決定部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a calculation device, a calculation method, a program, and a recording medium for optimizing parameters included in a program.
[0002]
[Prior art]
Conventionally, when a software program to be executed is optimized on a computing device such as a computer, the user specifies the parameters to be optimized and then manually instructs the optimization processing for those parameters, step by step.
[0003]
For example, the user manually registers the parameters of the program that are to be tuned (optimized). Furthermore, to actually perform the optimization, the user must separately instruct the computer about, for example, the preprocessing for tuning, the actual tuning method, and the processing for using the tuned parameters.
[0004]
As an example of a configuration for such tuning, Japanese Unexamined Patent Publication No. 2000-276454 (published October 6, 2000) describes a method of configuring software that has a function of performing installation with its parameters adjusted.
[0005]
[Patent Document 1]
JP 2000-276454 A
[0006]
[Problems to be solved by the invention]
However, with the conventional configuration described above, optimizing parameters leads to problems such as increased development time and cost, poor functional extensibility, and a high risk of introducing bugs.
[0007]
That is, with the conventional configuration, various settings are still required after the parameters have been registered before parameter optimization is actually achieved. This is what gives rise to the increased development time and cost, the poor functional extensibility, and the high risk of introducing bugs.
[0008]
Further, for example, in solving the optimization problem for optimal parameter estimation, the settings are made manually, so usually only estimation with a single cost definition function is implemented. As a result, the parameter estimation capability is also limited.
[0009]
The present invention has been made in view of the above-described problems, and an object of the present invention is to provide a calculation device, a calculation method, a program, and a recording medium that can easily optimize parameters.
[0010]
[Means for Solving the Problems]
In order to solve the above problems, a computing device according to the present invention is a computing device for optimizing parameters included in an input program, characterized by comprising a program generation unit that, when a program containing a specifier designating a region of the program to be optimized and a parameter to be optimized is input, generates a program for executing optimization by actual measurement for the region and the parameter designated by the specifier.
[0011]
This computing device optimizes an input program.
More specifically, the computing device includes a program generation unit that, upon detecting that a predetermined specifier is included in the input program, generates a program for actually executing the optimization of that program. The program generation unit generates a new program that optimizes the region designated by the specifier with respect to the parameter designated by the specifier. The parameter designation may include not only the kind of parameter, that is, which variable is designated, but also the range of the parameter. Generating a program here also includes rewriting a program.
[0012]
For example, the program generation unit creates a subprogram that includes an area specified by the specifier. For example, if the specifier specifies optimization of the number of loop unrolling stages as a parameter, and the specified area is loop processing, loop processing corresponding to the number of loop unrolling stages specified as an argument is executed. Create a subprogram. That is, the subprogram is a program that has, for example, a parameter for adjustment as an argument, and executes processing for the area specified by the specifier in accordance with this parameter. Also, an actual measurement routine is created for calling this subprogram for each parameter and measuring the actual required time. In addition, an estimation routine for estimating optimum parameters from the required time measured by the actual measurement routine is created.
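The relationship among the subprogram, the actual measurement routine, and the estimation routine can be sketched in a few lines. This is a minimal Python illustration, not the output of the program generation unit: `tuning_region`, `measure`, and `estimate` are hypothetical names, the subprogram body is a placeholder, and the estimation step simply picks the smallest measured time instead of fitting an approximation function.

```python
import time


def tuning_region(unroll, data):
    """Hypothetical subprogram: the tuning region with its parameter as an argument."""
    total = 0.0
    for start in range(0, len(data), unroll):
        for v in data[start:start + unroll]:
            total += v
    return total


def measure(subprogram, candidates, *args, timer=time.perf_counter):
    """Actual measurement routine: time one call of the subprogram per parameter value."""
    elapsed = {}
    for value in candidates:
        t0 = timer()
        subprogram(value, *args)
        elapsed[value] = timer() - t0
    return elapsed


def estimate(elapsed):
    """Estimation routine: here, the value with the smallest measured time."""
    return min(elapsed, key=elapsed.get)
```

A caller would run `estimate(measure(tuning_region, range(1, 9), data))` once and then pass the returned value to every later call of the subprogram. The `timer` argument exists only so that the measurement can be made deterministic in tests.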
[0013]
Therefore, if a program containing a predetermined specifier is input to this computing device, a program for optimizing the designated region of that program with respect to the designated parameter can be obtained.
[0014]
The computing device may include a compiler that translates the program generated by the program generation unit into an execution format. In addition, the computing device may include a processor that executes the execution format, and the execution format generated by the compiler may be executed by the processor and actually optimized. Alternatively, the calculation device may transmit the program generated by the program generation unit to another external calculation device to perform translation into an execution format and actual optimization. In any case, the computer apparatus according to the present invention only needs to include a program generation unit that generates a new program from the program when a program of a predetermined format is input.
[0015]
For example, if the subprogram, the actual measurement routine, and the estimation routine generated by the program generation unit are translated into an executable format in the calculation device, and the actual measurement routine and the estimation routine are executed, optimum parameters can be obtained.
[0016]
Here, in the conventional computing device, when the input program is optimized, a predetermined instruction from the user is required at each stage of processing. For this reason, time is required for setting, and development time and development cost are increased. In addition, since the setting by the user has various restrictions and cannot be said to be convenient, the function expandability is low. In addition, there is a possibility that a mistake is included in the setting by the user and a bug is mixed.
[0017]
Note that the above computing device can also be expressed as a programming language processing device provided with a generating means for generating a program with an automatic tuning function added.
[0018]
In order to solve the above problems, in the computing device according to the present invention, in the above configuration, the program generation unit includes: specifier analysis means for extracting, from the input program, the region and the parameter designated by the specifier; and program creation means for generating a subprogram that includes the region extracted by the specifier analysis means, and for generating a main program that calls the subprogram and executes optimization by actual measurement for the parameter.
[0019]
With this configuration, the above-described computing device according to the present invention can be realized. For example, the main program may include an actual measurement routine that calls the subprogram for each parameter and measures the actual required time, and an estimation routine that estimates the optimum parameter from the required time measured by the actual measurement routine.
[0020]
Note that the main program here is not limited to a so-called main routine, and may include a subroutine called from the main routine. In other words, as described above, the subprogram is generated by extracting an area for optimization, and for example, executes processing for the area using an optimization parameter as an argument. Therefore, the main program only needs to call the subprogram. Accordingly, the present invention is not limited to the main routine, and a subroutine called from the main routine may call the above-mentioned subprogram as the main program.
[0021]
The program creation means may create the main program and the subprogram in any order. For example, it may create the subprogram first and then the main program, or the main program first and then the subprogram.
[0022]
In order to solve the above problems, in the computing device according to the present invention, in the above configuration, the program creation means creates, to be called from the main program or included in the main program, an actual measurement routine that calls the subprogram for each parameter and measures the required time, and an estimation routine that estimates the optimum parameter using the required time measured by the actual measurement routine.
[0023]
With this configuration, the actual measurement routine and the estimation routine are called from, or included in, the main program, so the optimum parameter can be obtained simply by translating the main program into an executable format and executing it. The specifier may also designate the approximation function that the estimation routine uses to estimate the optimum parameter by approximation.
[0024]
In order to solve the above problems, in the computing device according to the present invention, in the above configuration, in order to perform optimization when the main program is executed, the program creation means generates, when the subprogram is called inside a loop of the main program, a main program that calls the actual measurement routine and the estimation routine outside the loop and before the loop.
[0025]
Here, in the actual measurement routine, as described above, the subprogram is called for each parameter and the required time is measured, so that it takes time to measure the required time.
[0026]
Therefore, as described above, a program that calls the actual measurement routine outside the loop and before the loop is created as the main program. Therefore, since the number of calls of the actual measurement routine at the time of optimization can be reduced, the time required for actual measurement can be reduced. That is, since the actual measurement routine is not executed every time inside the loop, the time required for optimization can be shortened accordingly.
[0027]
Also, as for the estimation routine, if the actual measurement routine is called outside the loop, it is not necessary to call it inside the loop.
[0028]
For this purpose, the program creation means may be configured, for example, to generate a main program that calls the actual measurement routine and the estimation routine at the head of the main program. With this configuration, the above calculation apparatus can be realized reliably.
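The effect of moving the measurement and estimation calls out of the loop can be illustrated with a minimal Python sketch. All names here (`make_counting_tuner`, `tune_inside_loop`, `tune_before_loop`) are hypothetical and do not appear in the specification, and the cost function is a stand-in for an actually measured required time. Only the placement of the call relative to the loop reflects the text.

```python
def make_counting_tuner(candidates, cost):
    """Return a tuning routine plus a counter of how often it was called.

    `cost` is a hypothetical stand-in for the measured required time.
    """
    calls = {"n": 0}

    def tune():
        calls["n"] += 1
        # Measurement and estimation collapsed into one step:
        # try every candidate parameter and keep the cheapest.
        return min(candidates, key=cost)

    return tune, calls


def tune_inside_loop(n_iterations, tune):
    """Tuning is repeated on every iteration of the calling loop."""
    best = None
    for _ in range(n_iterations):
        best = tune()
        # the subprogram would be called here with parameter `best`
    return best


def tune_before_loop(n_iterations, tune):
    """Tuning runs once, outside and before the calling loop."""
    best = tune()
    for _ in range(n_iterations):
        pass  # the subprogram would be called here with parameter `best`
    return best
```

With ten loop iterations, the first variant invokes the tuning routine ten times and the second invokes it once, while both arrive at the same parameter.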
[0029]
Note that the above computing device can also be expressed as a computing device that executes a software configuration method having a tuning method which, in parameter tuning at the time of execution, is performed before calling the subprogram that includes the corresponding area.
[0030]
In order to solve the above-described problem, the calculation apparatus according to the present invention is characterized in that, in the above-described configuration, in order to perform optimization when the main program is executed, the program creating means selects according to the specifier and generates, when the subprogram is called within a loop of the main program, either a main program that calls the actual measurement routine and the estimation routine outside the loop and before the loop, or a main program that calls the actual measurement routine and the estimation routine within the loop.
[0031]
With this configuration, it is possible to switch according to the specifier whether the number of calls of the actual measurement routine and the estimation routine at the time of optimization is reduced or normal optimization is performed.
[0032]
That is, for example, if the variables in the region of the subprogram to be optimized are determined only inside the loop that calls the subprogram, the specifier is set accordingly, and the normal optimization, which calls the actual measurement and estimation routines inside the loop, is performed.
[0033]
Conversely, if the variables in the region of the subprogram are already determined before the loop that calls the subprogram, performing the optimization by calling the actual measurement and estimation routines before this loop reduces the time required for optimization.
[0034]
In addition, the calculation device can also be expressed as a computing device that executes a software configuration method in which parameter tuning at the time of execution is separated into two methods: a tuning method performed before calling the subprogram that includes the corresponding region, and a tuning method performed when the corresponding region is executed.
[0035]
In order to solve the above-described problem, the calculation apparatus according to the present invention is characterized in that, in the above configuration, it includes a cost definition function library containing cost definition functions for approximating the required time measured for each parameter.
[0036]
With this configuration, for example, a desired approximation can be performed using a cost definition function included in the cost definition function library.
[0037]
Further, for example, when the specifier includes designation of an approximate function used in the estimation routine, the designated approximate function may be searched from the cost definition function library and used.
[0038]
In order to solve the above problems, the computing device according to the present invention is characterized in that, in the above configuration, it is provided with a cost definition function determination unit that approximates the measured required time by sequentially using all of the cost definition functions included in the cost definition function library and selects the cost definition function having the best approximation accuracy.
[0039]
With this configuration, for example, even when the designation of the approximation function used in the estimation routine is not included in the specifier, the optimum approximation function can be obtained.
[0040]
Even when the approximate function specified by the specifier is not included in the cost definition function library, an accurate approximate function can be selected as in the above configuration.
[0041]
The calculation apparatus according to the present invention has a tuning information database that stores the region and the parameter extracted by the specifier analysis unit, and the program creation unit and the cost definition function determination unit acquire the region or the parameter by referring to the tuning information database.
[0042]
With this configuration, since the region and parameters extracted by the specifier analysis unit are stored in the tuning information database, the program creation unit and the cost definition function determination unit need only refer to the tuning information database when they use the region or parameter, and the region or parameter does not have to be extracted each time.
[0043]
In order to solve the above problems, a calculation method according to the present invention is a calculation method for optimizing parameters included in a program input to a calculation device, characterized by including: a step of generating, when a program including a specifier that specifies a region in the program to be optimized and a parameter to be optimized is input, a program for performing optimization by actual measurement on the region and the parameter specified by the specifier; and a step of performing optimization by executing the program obtained in the generating step.
[0044]
If this calculation method is executed by a calculation device such as a computer, the above-described calculation device can be realized. Note that the above-described calculation method can also be expressed as a programming language processing method including a step of generating software with an automatic tuning function added.
[0045]
In order to solve the above-described problem, the calculation method according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the program is executed, in the case where the region is called within a loop of the program, the step of generating the program generates a program that, outside the loop and before the loop, actually measures the required time for each parameter for the region and estimates the optimum parameter from the measured required time.
[0046]
In this way, since the actual measurement and estimation are executed outside and before the loop, and are not executed on every iteration inside the loop, the time required for optimization can be shortened accordingly.
[0047]
In order to solve the above-described problem, the calculation method according to the present invention is characterized in that, in the above configuration, in order to perform optimization when the program is executed, in the case where the region is called within a loop of the program, the step of generating the program selects according to the specifier and generates either a program that, outside the loop and before the loop, actually measures the required time for each parameter for the region and estimates the optimum parameter from the measured required time, or a program that performs the actual measurement of the required time for each parameter and the estimation of the optimum parameter from the measured required time within the loop.
[0048]
With this configuration, whether the actual measurement / estimation is executed before the loop outside the loop or the actual measurement / estimation is executed every time inside the loop can be easily switched by setting the specifier. The specifier may be selected according to the nature of the target problem realized by, for example, a subprogram.
[0049]
In addition, the calculation method described above can be expressed as a programming language processing method including a step of generating a program to which an automatic tuning function of the above-described calculation device is added.
[0050]
Moreover, a programming language processing apparatus may be realized that is provided with generation means for generating a program having an automatic tuning function by using the above-described calculation method.
[0051]
In order to solve the above problems, a program according to the present invention is characterized by causing a computer to operate as each means of the above-described computing device.
[0052]
If this program is used, the above-described computing device can be realized. It should be noted that the method of using this program can also be expressed as a program usage form performed to use the above-described language processing apparatus.
[0053]
In order to solve the above problems, a recording medium according to the present invention is characterized in that the above-described program is recorded in a computer-readable manner in the above configuration.
[0054]
If the program of this recording medium is read and executed by a computer, the above-described computing device can be realized. Note that this recording medium can also be expressed as a computer-readable recording medium on which is recorded a program for realizing a function of generating software to which an automatic tuning function is added.
[0055]
DETAILED DESCRIPTION OF THE INVENTION
[Embodiment 1]
An embodiment of the present invention will be described with reference to FIGS. 1 to 17 as follows.
[0056]
The computing device of this embodiment has a configuration including a program generating unit that generates another program that facilitates parameter optimization (tuning) from a computer language (program) in a predetermined format. It also includes a compiler that translates the program into an executable format.
[0057]
As illustrated in FIG. 1, the computing device 1 includes a processor 2, a user library 3, a parameter adjustment layer 4, a parameter information file 5, a program generation unit 6, and a compiler 7.
[0058]
The computing device 1 includes a recording medium (not shown). For example, the calculation device 1 performs a calculation by calling a subroutine in the library 3 using a parameter (not shown) input from the outside. The calculation result is output to a display device (not shown).
[0059]
The processor 2 is a calculation processing unit for performing calculations. The processor 2 includes nprocs processors (not shown). The computing device 1 functions as a parallel computing device using a plurality of processors 2.
[0060]
The library 3 is a numerical calculation library. The library 3 includes at least one subroutine. As shown in FIG. 16, the library 3 of the present embodiment includes a plurality of subroutines 3 a to 3 k. FIG. 16 shows a part of the computing device 1 shown in FIG.
[0061]
The library 3 and the subroutines 3a to 3k are accessed by describing parameters using some method (dedicated description language or the like). Some of these parameters are directly input to the library 3 by the user from the outside, for example. The other part of the parameters is used in the library 3. Still another part of the parameters is input to the library 3 through the parameter adjustment layer 4.
[0062]
The library 3 is a numerical calculation library developed by a user, but is not limited thereto, and may be a system library developed by a library developer, for example. For such a library prepared in advance in a computer environment such as MPI (Message Passing Interface) or OS (Operating System), if the software interface is well known, the user or library developer will describe the parameters. Thus, the parameter information can be delivered to the parameter adjustment layer 4.
[0063]
Note that the contents and number of subroutines provided in the library are not particularly limited. The computing device 1 may be provided with a program other than the library, and other functions may be realized by the program.
[0064]
The parameter adjustment layer 4 functions as an adjustment device that adjusts parameters used by the library 3. The parameter adjustment layer 4 adjusts a part of the parameters input to the library 3 and then inputs the adjusted parameters to the library 3. The parameter adjustment layer 4 includes an installation optimization layer (IOL) 4a, a pre-execution-invocation optimization layer (BEOL) 4b, and a run-time optimization layer (ROL) 4c. The functions of these layers will be described later.
[0065]
The parameter information file 5 is a file for storing parameters adjusted in the parameter adjustment layer 4.
[0066]
In the computing device 1 of the present embodiment, the library 3 is a function realized by reading and executing a program recorded on a recording medium (not shown). The parameter adjustment layer 4 is also a function realized by reading and executing a program recorded on a recording medium (not shown).
[0067]
The program generation unit 6 generates another program that can easily execute parameter optimization from a program in a predetermined format. Details of the program generation unit 6 will be described later.
[0068]
The compiler 7 translates a program into an execution format. The compiler 7 of this embodiment translates the program generated by the program generation unit 6 into an execution format. The compiler 7 outputs the translated execution format to the processor 2. When the execution format is executed by the processor 2, the parameters can be actually optimized as will be described later.
[0069]
The program generator 6 and the compiler 7 are functions realized by the computer 1 reading and executing a program recorded on a recording medium (not shown).
[0070]
Here, details of the program generation unit 6 will be described. When a program including a specifier that designates an area in the program to be optimized and a parameter to be optimized is input, the program generation unit 6 generates a program for executing optimization by actual measurement on the area and parameter designated by the specifier. As shown in FIG. 2, the program generation unit 6 includes a specifier analysis means 8, a program creation means 9, and a cost definition function determination means 10.
[0071]
The specifier analysis means 8 analyzes a program including the specifier, and extracts the parameter specified by the specifier and the part of the program specified by the specifier (hereinafter referred to as a tuning area (region)).
[0072]
The specifier analysis means 8 includes a specifier analysis unit 8a. A program input to the program generation unit 6 is first input to the specifier analysis unit 8a. The specifier analysis unit 8a extracts parameters and tuning regions from the program, and outputs them to the parameter 10d and the tuning region set 10e, respectively, which are included in the tuning information database 10a of the cost definition function determination means 10. In this way, the specifier analysis unit 8a extracts the parameters from the specifier. It also extracts from the specifier the contents of the processing to be performed during optimization. In addition, it extracts the tuning area and applies predetermined processing to it as necessary.
[0073]
Here, in FIG. 3, a program (Subroutine xxx ()) described at an abstract level is an example of a program including a tuning region to be optimized. Here, an example of the tuning area is shown as “tuning area” in the figure. Also, an example of the specifier is shown as “start of specifier” and “end of specifier”. The tuning area of this embodiment is an area surrounded by the start of the specifier and the end of the specifier. However, the present invention is not limited to this. For example, it can be specified by the start of the specifier and the number of lines from the start position.
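As a rough illustration of how a region enclosed between a specifier start and a specifier end might be extracted, consider the following Python sketch. The marker strings `!tune$ region start` and `!tune$ region end`, the function name, and the dictionary layout are assumptions made for this example. The specification describes the specifiers only abstractly and does not fix their syntax here.

```python
# Hypothetical marker syntax; the specification does not define it.
START = "!tune$ region start"
END = "!tune$ region end"


def extract_tuning_regions(source):
    """Collect every block of lines enclosed by a START/END marker pair.

    Returns a list of dicts holding the text that follows the start
    marker (where parameters would be specified) and the enclosed lines.
    """
    regions = []
    current = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(START):
            current = {"header": stripped[len(START):].strip(), "body": []}
        elif stripped.startswith(END):
            if current is not None:
                regions.append(current)
                current = None
        elif current is not None:
            current["body"].append(line)
    return regions
```

In this sketch, the returned `header` would carry the parameter specification and the `body` would become the tuning area that is later turned into a subprogram.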
[0074]
In this example, the program is described using the Fortran language. However, the present invention is not limited to this, and any other computer language may be used, in which case the processing is the same. For example, the essence of the processing according to the present invention is the same in a program written using a functional computer language (C language, C++ language, etc.). Unless otherwise specified, Japanese text in the program is not part of a specific program; it abstractly expresses a control operation to be realized by the program. Japanese text in the program may also indicate a comment in the program.
[0075]
An example of the tuning area is shown in FIG. Consider the case where the program shown in FIG. 4A is surrounded by a specifier, and unrolling is specified as the optimization method by the specifier. In this case, the specifier analysis unit 8a generates a program as shown in FIG. 4B and transfers the information to the program creation means 9. That is, when the specifier designates a process, such as unrolling, that transforms the program described in the tuning area into a new tuning area, the specifier analysis unit 8a performs that process and then transfers the information to the program creation means 9. If the process specified by the specifier does not require any change to the program, the specifier analysis unit 8a delivers the extracted tuning area to the program creation means 9 as it is.
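The idea of generating an unrolled variant of a loop for a given unrolling depth can be sketched in Python as follows. This is not the program of FIG. 4B. It is a small stand-in that emits and compiles source text for a summation loop, with the unrolling depth playing the role of the tuning parameter. The function names are hypothetical.

```python
def generate_unrolled_sum(depth):
    """Emit Python source for a summation loop unrolled `depth` times."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    lines = [
        "def unrolled_sum(values):",
        "    n = len(values)",
        f"    main = n - n % {depth}",
        "    total = 0",
        f"    for i in range(0, main, {depth}):",
    ]
    # One copy of the loop body per unrolled step.
    for k in range(depth):
        lines.append(f"        total += values[i + {k}]")
    # Remainder loop for the elements that do not fill a whole step.
    lines += [
        "    for i in range(main, n):",
        "        total += values[i]",
        "    return total",
    ]
    return "\n".join(lines)


def compile_unrolled_sum(depth):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_unrolled_sum(depth), namespace)
    return namespace["unrolled_sum"]
```

Each depth yields a differently shaped loop that computes the same result, which is what allows the depth to be chosen purely on the basis of measured time.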
[0076]
Further, the specifier analysis unit 8a can also input, for example, parameters extracted from the specifier to the parameter adjustment layer 4 shown in FIG. The parameter adjustment layer 4 has an installation-time optimization layer 4a, a pre-execution optimization layer 4b, and a runtime optimization layer 4c, one for each tuning timing that can be specified (installation, pre-execution, and execution). The parameter and the tuning area specified by the specifier can therefore be divided according to the tuning timing and delivered separately to the subsequent mechanisms for processing. This point will be described later. Here, for the sake of simplicity, the description assumes that processing is performed at an arbitrary timing.
[0077]
The program creation means 9 generates a subprogram including the area extracted by the specifier analysis means 8, and generates a main program that calls the subprogram and executes optimization by actual measurement for the parameters. The program creation means 9 includes a main program creation unit 9a for creating a main program to which an automatic tuning function is added, a subprogram creation unit 9b for creating a group of subprograms from the tuning areas in the tuning information database 10a, and a tuning function adding unit 9c for creating a processing program for achieving the automatic tuning function.
[0078]
The main program creation unit 9a adds a tuning (optimization) function to the program via the specifier analysis means 8. For example, the main program creation unit 9a creates a main program for optimization as shown in FIG. 5A from the example of the program shown in FIG.
[0079]
When this main program is executed by specifying “automatic tuning” at the time of execution, an automatic tuning subroutine described later is called to optimize parameters. If “automatic tuning” is not designated, the same contents as the program shown in FIG. 3 are executed. At this time, if the parameter has already been optimized, the result is referred to and executed.
[0080]
Further, the main program creation unit 9a rewrites a subroutine as an example of the program shown in FIG. 3 into a subroutine (Subroutine xxx ()) as shown in FIG. 5B, for example.
[0081]
In this example, the specifier is described in the subroutine given as an example of the program shown in FIG. 3. However, even if the specifier is described in the main program, the main program creation unit 9a performs the same process as described above.
[0082]
The subprogram creating unit 9b newly creates a subroutine (Subroutine Sub_A (J)) as shown in FIG. 5C corresponding to the subroutine shown in FIG. 5B rewritten by the main program creating unit 9a. The subroutine shown in FIG. 5C is a subroutine for only the tuning area to be processed, and is created with reference to the tuning area set 10e in the tuning information database 10a.
[0083]
The subroutine shown in FIG. 5C created by the subprogram creation unit 9b is called from the subroutine shown in FIG. 5B rewritten by the main program creation unit 9a. The subroutine created by the subprogram creating unit 9b is called from the subroutine created by the tuning function adding unit 9c described later.
[0084]
Next, the tuning function adding unit 9c creates a subroutine (an automatic tuning subroutine) that corresponds to the main routine shown in FIG. 5A created by the main program creating unit 9a and that achieves the automatic tuning function shown in FIG. Here, the tuning function adding unit 9c uses a cost function input from the cost definition function determining means 10 described later, and the time required for calculation is approximated by the designated cost function. Details of the cost definition function determination means 10 will be described later. The program creation means 9 outputs the obtained program to the compiler 7 shown in FIG.
[0085]
Here, the function F (I) shown in FIG. 5D is a function created from the specifier described in the program shown in FIG. More specifically, F (I) is a function that makes a one-to-one correspondence between the index I of the measurement loop and the parameter value J parameterized when the tuning area A is converted into a subroutine. The function F (I) may include only the sampling points sampled by the cost definition function determination unit 10.
[0086]
Further, the program shown in FIG. 5D includes a measurement loop (I loop) (measurement routine) for measuring time. In this example, a subroutine (Subroutine Sub_A (J)) created by the subprogram creation unit 9b is added to the measurement loop. Thus, for example, the required time as cost can be measured by the measurement loop.
[0087]
The program shown in FIG. 5D includes a "parameter estimation process (a)" (estimation routine) for estimating an optimum parameter using the measured time. Thereby, an optimum parameter can be obtained. The parameters obtained here are stored in an external storage medium (for example, the parameter information file 5). Details of the parameter estimation process (a) will be described later.
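The measurement loop and the estimation step described above can be sketched in Python as follows. The names `measure` and `estimate_best` are hypothetical, and `f` and `sub_a` stand in for F (I) and Sub_A (J). For brevity, this sketch simply picks the parameter with the smallest measured time, whereas the text describes estimation via a cost definition function. The `timer` argument is an assumption added so that a deterministic clock can be substituted when testing.

```python
import time


def measure(sub_a, f, n_points, timer=time.perf_counter):
    """Measurement loop: time sub_a(J) for J = F(1), ..., F(n_points)."""
    costs = {}
    for i in range(1, n_points + 1):
        j = f(i)              # map loop index I to parameter value J
        start = timer()
        sub_a(j)              # run the tuning area with parameter J
        costs[j] = timer() - start
    return costs


def estimate_best(costs):
    """Simplest possible estimation: the parameter with the lowest cost."""
    return min(costs, key=costs.get)
```

A caller might pass `f = lambda i: 2 ** (i - 1)` to try the parameter values 1, 2, 4, 8 and then store the result of `estimate_best` in a parameter file.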
[0088]
As outlined above, the program generation unit 6 of this embodiment can generate, from a program including the specifier as shown in FIG. 3, a program that contains the settings for optimization as shown in FIGS. 5A to 5D. In particular, the program creation means 9 creates, so as to be called from or included in the main program, an actual measurement routine that calls the subprogram for each parameter and measures the required time, and an estimation routine that estimates an optimum parameter using the required time measured by the actual measurement routine.
[0089]
Even when a program having a plurality of regions (tuning regions) surrounded by specifiers is input to the program generation unit 6, the same processing as described above is performed. For example, when a tuning area B is added below the tuning area A in FIG. 3, similar processing parts are added below the processing parts for the tuning area A in FIGS. 5 (a), (b), and (d), respectively. Further, a subroutine similar to that shown in FIG. 5C is newly created.
[0090]
Note that the measurement loop and the parameter estimation process (a) shown in FIG. 5D, and the call of the parameter estimation process (b) shown in FIG. 5B, are added to each program by the tuning function addition unit 9c. More specifically, the tuning function adding unit 9c performs a predetermined process in accordance with the cost definition function determined by the cost definition function determining unit 10c of the cost definition function determining means 10 described later.
[0091]
Here, the cost definition function determination means 10 will be described. The cost definition function determination means 10 includes a tuning information database 10a, a cost definition function library 10b, and a cost definition function determination unit 10c.
[0092]
The tuning information database 10a includes a parameter 10d and a tuning area set 10e. The tuning information database 10a stores the parameters 10d necessary for tuning and the tuning area set 10e, which is the part (subprogram) of the optimization target program analyzed by the specifier analysis means 8. The tuning information database 10a is accessed by the program creation means 9 and the cost definition function determination unit 10c to obtain the saved parameters 10d and the tuning area set 10e.
[0093]
The cost definition function library 10b is a library that records cost definition functions. This cost definition function can be freely registered / deleted by the system developer, the user of the computer 1, and the like. The cost definition function library 10b includes a plurality of cost definition functions, and includes, for example, a linear polynomial 10f. This cost definition function is used, for example, to approximate the required time measured for each parameter.
[0094]
The cost definition function determination unit 10c is a part that determines the method of parameter estimation processing described in the specifier. To this end, the cost definition function determination unit 10c performs the following processes for adding automatic tuning: a cost definition function determination process, a sample point determination process, and processes relating to the parameter estimation process (a), the parameter estimation process (b), and the measurement loop. In response, the tuning function adding unit 9c performs the predetermined processing described above on each program.
[0095]
First, the cost definition function determination unit 10c performs a cost definition function determination process. In this case, the cost definition function determination unit 10c determines the cost definition function based on the specification of the cost definition function described in the specifier.
[0096]
In the present embodiment, the user specifies the cost definition function in the specifier by, for example, specifying a function included in the cost definition function library 10b. In addition, when a function that is not included in the cost definition function library 10b is specified by the user, the program generation unit 6 of the computing device 1 may determine (automatically determine) the cost definition function by a predetermined method.
[0097]
When a function included in the cost definition function library 10b is specified, the cost definition function determination unit 10c selects the specified function itself from the cost definition function library 10b and delivers that cost function to the tuning function addition unit 9c. Then, the tuning function adding unit 9c generates a program.
[0098]
On the other hand, as described below, when a function that is not included in the cost definition function library 10b is specified, in the present embodiment, the functions registered in the cost definition function library 10b are tried in sequence for the target tuning area registered in the tuning information database 10a, the required time is measured, and error evaluation is performed. Based on the evaluation result, the cost definition function with the highest accuracy, that is, the smallest error, is adopted and delivered to the tuning function adding unit 9c. Then, the tuning function adding unit 9c generates a program. When the functions are tried and the required time is actually measured, if the parameter definition area is specified by the specifier, the measurement is executed over the entire definition area. If the parameter definition area is not specified by the specifier, the automatically generated upper limit value of the parameter is referred to, and the measurement is performed for all values up to that upper limit. In this way, the cost definition function determination unit 10c may be configured to approximate the measured required time by sequentially using all of the cost definition functions included in the cost definition function library 10b, and then to select the cost definition function with the highest approximation accuracy.
[0099]
Here, an example of the cost definition function determination process is schematically shown in FIG. In S11, the cost definition function determination unit 10c determines whether or not the function described in the specifier is a function included in the cost definition function library 10b. Here, for example, when an automatic setting request is made by the user, it is determined that the designated function is not included in the cost definition function library 10b.
[0100]
When the function described in the specifier in S11 is a function included in the cost definition function library 10b, the process proceeds to S12, the function designated from the cost definition function library 10b is extracted, and the process proceeds to S13. In S13, the cost definition function determination unit 10c passes the extracted function to the tuning function addition unit 9c, and ends the process. For example, when a linear polynomial is described in the specifier, the cost definition function determination unit 10c delivers the linear polynomial 10f of the cost definition function library 10b to the tuning function addition unit 9c.
[0101]
On the other hand, if, for example, an automatic setting request is made in S11 and it is determined that the function described in the specifier is not included in the cost definition function library 10b, the process proceeds to S14. In this case, since the function described in the specifier cannot be used, processing is performed in S14 and subsequent steps to select, for example, the function with the highest accuracy from among the cost definition functions included in the cost definition function library 10b.
[0102]
In S14, the cost definition function determination unit 10c extracts the corresponding tuning area from the tuning area set 10e of the tuning information database 10a, and proceeds to S15. In S15, a loop (for I) as shown in FIG. 10 is set in the extracted tuning area as an addition of the measurement processing unit, and the process proceeds to S16. In S16, it is determined whether or not accuracy has been confirmed for all cost definition functions included in the cost definition function library 10b.
[0103]
If it is determined in S16 that the accuracy has not been confirmed for all functions, the process proceeds to S17, and one cost definition function whose accuracy has not yet been confirmed is selected from the cost definition function library 10b. In S18 following S17, the accuracy is evaluated using the selected cost definition function. In S19 following S18, the accuracy of the cost definition functions that have already been evaluated is compared with the accuracy obtained in S18. If the accuracy obtained in S18 is better, the cost definition function selected in S17 is adopted as the candidate for the most accurate cost definition function, and the process returns to S16.
[0104]
On the other hand, if it is determined in S16 that the accuracy has been confirmed for all functions, the process proceeds to S13, and the most accurate cost definition function is delivered to the tuning function adding unit 9c. An example of the cost definition function determination process is realized by S11 to S19 described above.
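Steps S11 to S19 can be sketched in Python as follows. The library contents are assumptions: "linear" corresponds loosely to the linear polynomial 10f, while "inverse" and "log" are hypothetical entries added only so that there is something to choose between. Each entry is a basis function g, and the cost is modeled as a + b * g(x) fitted by ordinary least squares. The accuracy measure (sum of squared errors over the measured points) is likewise an assumption, and the parameter values are assumed to be positive and not all identical.

```python
import math

# Hypothetical cost definition function library: name -> basis function g.
COST_LIBRARY = {
    "linear": lambda x: x,
    "inverse": lambda x: 1.0 / x,
    "log": lambda x: math.log(x),
}


def fit(basis, xs, ys):
    """Least-squares fit of y = a + b * basis(x); returns (a, b)."""
    g = [basis(x) for x in xs]
    n = len(xs)
    mean_g = sum(g) / n
    mean_y = sum(ys) / n
    sgg = sum((gi - mean_g) ** 2 for gi in g)
    sgy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, ys))
    b = sgy / sgg
    return mean_y - b * mean_g, b


def squared_error(basis, coeffs, xs, ys):
    a, b = coeffs
    return sum((a + b * basis(x) - y) ** 2 for x, y in zip(xs, ys))


def choose_cost_function(xs, ys, requested=None):
    """Return (name, coefficients) of the cost definition function to use."""
    if requested in COST_LIBRARY:                 # S11 to S13
        return requested, fit(COST_LIBRARY[requested], xs, ys)
    best = None
    for name, basis in COST_LIBRARY.items():      # S16, S17
        coeffs = fit(basis, xs, ys)
        err = squared_error(basis, coeffs, xs, ys)   # S18
        if best is None or err < best[0]:         # S19
            best = (err, name, coeffs)
    return best[1], best[2]                       # S13
```

When `requested` names a library entry, that entry is used directly. Otherwise every entry is tried and the one with the smallest error is returned.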
[0105]
Next, the cost definition function determination unit 10c performs a sample point determination process for determining a sample point that is optimal when time is actually measured.
[0106]
For example, when the user has specified sampling points in the specifier, the sampling point determination process may simply adopt the specified sampling points. When the user has not specified sampling points in the specifier, the sampling point determination process may select an appropriate set of sampling points from the domain so that the error is reduced.
[0107]
Alternatively, even when the user has specified sampling points in the specifier, the sampling point determination process may select an appropriate subset of sampling points, either from the specified sampling points or from the domain, as follows so that the error is reduced.
[0108]
For example, as shown in FIG. 7, the cost definition function determination unit 10c confirms the domain in the specifier in S20 and proceeds to S21. In S21, a subset S is extracted from the domain set in the specifier by a predetermined method. The set itself may be selected as the subset S. Alternatively, as the predetermined method, the subset S may be selected from the set using random numbers. The subset S may also be selected using, for example, a genetic algorithm (GA), past statistics, or some evaluation formula.
[0109]
In S22 following S21, the accuracy is measured for the corresponding tuning region using the parameter values included in the selected subset S. In S23 following S22, if the accuracy measured in S22 is better than the accuracy measured in the previous trials, the subset S selected in S21 is adopted as the sample point set O. In S24, it is determined whether the number of trials specified in advance has been reached. If not, the process returns to S21; if so, the process proceeds to S25. In S25, the set O obtained in S23 is used as the set of sample points. In this way, even when the domain has been set, the sampling points are chosen so that the amount of processing is reduced while a predetermined accuracy is maintained, and the processing can be made faster.
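The trial loop of S20 to S25 can be illustrated with the random-number variant of subset selection. The following Python sketch rests on several assumptions: the accuracy measure (worst-case error of a piecewise-linear model built from the subset), the function names, the fixed subset size and the timings are all hypothetical, and the description above does not prescribe any of them.

```python
import random

def interpolation_error(subset, domain, times):
    """Toy accuracy measure (an assumption): largest gap between the measured
    times and a piecewise-linear model built only from the points in subset."""
    pts = sorted(subset)
    worst = 0.0
    for x in domain:
        if x <= pts[0]:
            est = times[pts[0]]
        elif x >= pts[-1]:
            est = times[pts[-1]]
        else:
            lo = max(p for p in pts if p <= x)
            hi = min(p for p in pts if p >= x)
            if lo == hi:
                est = times[x]
            else:
                w = (x - lo) / (hi - lo)
                est = times[lo] + w * (times[hi] - times[lo])
        worst = max(worst, abs(est - times[x]))
    return worst

def choose_sample_points(domain, times, size, trials, seed=0):
    rng = random.Random(seed)
    best_set, best_error = None, float("inf")
    for _ in range(trials):                        # S24: fixed number of trials
        subset = rng.sample(domain, size)          # S21: pick subset S at random
        error = interpolation_error(subset, domain, times)   # S22: accuracy
        if error < best_error:                     # S23: keep S as the set O
            best_set, best_error = sorted(subset), error
    return best_set, best_error                    # S25: O becomes the samples

domain = list(range(1, 9))
times = {x: (x - 4) ** 2 + 10.0 for x in domain}   # made-up timings
sample_points, error = choose_sample_points(domain, times, size=4, trials=20)
```

A genetic algorithm or a statistics-based choice, which the text also mentions, would replace only the line that draws the subset.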
[0110]
Here, an example of the sampling point determination process will be described, for the case of optimizing the unrolling of the main loop in eigenvalue calculation processing. A fifth-order linear polynomial is used as the cost definition function, and the least squares method is used to solve the optimization problem. Sample point 1 is specified in the specifier and is [1-6, 8, 16]. Sample point 2 is set automatically and is [1-16]. In Table 1 below, estimation parameter 1 is the parameter estimated using sample point 1, and estimation parameter 2 is the parameter estimated using sample point 2.
[0111]
Table 1 shows the results obtained by a domestic supercomputer (computer A). Table 2 shows the results obtained by a domestic supercomputer (computer B). Table 3 shows the results obtained from the PC cluster (computer C).
[0112]
[Table 1]

[0113]
[Table 2]

[0114]
[Table 3]

[0115]
From Table 1, the sampling points set automatically by the method according to the present invention (sample point 2) give higher parameter estimation accuracy on the computer C. It can therefore be said that the automatic sample point determination process in the mechanism of the present invention is highly effective.
[0116]
Next, the cost definition function determination unit 10c sequentially performs automatic tuning addition processing. This automatic tuning addition process includes a parameter estimation process (a), a parameter estimation process (b), and a measurement loop process.
[0117]
The parameter estimation process (a) generates a program that takes as input the sampling points determined in the sampling point determination process and the execution times for those sampling points, and that solves an appropriate optimization problem based on the cost definition function determined in the cost definition function determination process. The generated program is called as parameter estimation process (a) from the program generated by the tuning function adding unit 9c.
[0118]
In this parameter estimation process (a), as shown in FIG. 8, the sampling points determined by the sampling point determination process and the execution times for those sampling points are obtained in S26, and the process proceeds to S27. For example, in the program shown in FIG. 5D, the parameter estimation process (a) is performed after the measurement loop, so the values measured by the measurement loop are available.
[0119]
In S27, an appropriate optimization problem is solved based on the cost definition function determined by the cost definition function determination process. In S28 following S27, the estimated appropriate parameter and the coefficient information of the cost definition function are obtained. Through the processing described above, an appropriate parameter can be obtained by estimation. Note that the flowchart shown here is only an example of the parameter estimation process (a), and the process is not limited to it. The program that implements the parameter estimation process (a) need only perform each step shown, for example, in FIG. 8; its details are not restricted.
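The fitting step of S26 to S28 can be sketched in Python as a least-squares polynomial fit. This is an illustration under assumptions rather than the generated program: the function name, the use of the normal equations with Gaussian elimination, the second-order polynomial (chosen for brevity where the example above uses a fifth-order one), and the timings are all hypothetical.

```python
def fit_polynomial(xs, ts, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients [c0, c1, ...] of c0 + c1*x + c2*x**2 + ..."""
    n = degree + 1
    # Build the normal equations A c = b from the sample points (S26).
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)]
         for i in range(n)]
    b = [float(sum(t * x ** i for x, t in zip(xs, ts))) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting (S27).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution yields the coefficient information (S28).
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

sample_points = [1, 2, 3, 6, 8]
times = [(x - 4) ** 2 + 10.0 for x in sample_points]   # made-up timings
coeffs = fit_polynomial(sample_points, times, degree=2)
```

The normal equations become ill-conditioned for higher degrees, so a production implementation would more likely rely on a QR-based solver; the sketch keeps to the standard library for self-containment.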
[0120]
Next, the parameter estimation process (b) takes as input the coefficient information of the cost definition function automatically determined in the parameter estimation process (a), and solves an appropriate optimization problem based on the cost definition function determined in the cost definition function determination process. A program that determines the parameter estimated to be optimal in this way is automatically generated. The generated program is called as parameter estimation process (b) from the program generated by the tuning function adding unit 9c.
[0121]
In the parameter estimation process (b), for example, as shown in FIG. 9, coefficient information of the cost definition function determined in the parameter estimation process (a) is obtained in S29. For example, in the example of the program shown in FIGS. 5A, 5B, and 5D, the parameter estimation process (b) is performed after the automatic tuning and the parameter estimation process (a). Thus, the coefficient information of the cost definition function can be obtained.
[0122]
In S30 following S29, an optimum parameter is determined using the cost information from the cost definition function determined by the cost definition function determination unit, and the process proceeds to S31. In S31, the estimated appropriate parameter is obtained. In this way, an appropriate parameter can be obtained by estimation through the processing of S29 to S31. Note that the flowchart shown here is only an example of the parameter estimation process (b), and the process is not limited to it. The program that implements the parameter estimation process (b) need only perform each step shown, for example, in FIG. 9; its details are not restricted.
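Steps S29 to S31 amount to evaluating the fitted cost definition function over the candidate parameter values and taking the cheapest one. The Python sketch below assumes a polynomial cost definition function stored as a coefficient list and an integer parameter range; the names and the coefficient values are hypothetical.

```python
def polynomial_cost(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def estimate_best_parameter(coeffs, low, high):
    """S30: pick the parameter value with the lowest predicted cost."""
    candidates = range(low, high + 1)
    return min(candidates, key=lambda x: polynomial_cost(coeffs, x))

# S29: coefficient information as it might be handed over by process (a);
# these values are made up and describe the cost x**2 - 8*x + 26.
stored_coeffs = [26.0, -8.0, 1.0]
best_unroll = estimate_best_parameter(stored_coeffs, 1, 8)   # S31
```

Since only the stored coefficients are needed, this step requires no further timing runs.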
[0123]
Next, the measurement loop process forms a measurement loop corresponding to the number of sample points determined in the sample point determination process, as shown in FIG.
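A measurement loop of this kind can be sketched in Python as follows. The sketch is an assumption-laden illustration: the function names, the stand-in tuning region, and the choice to repeat each measurement and keep the shortest time are hypothetical and are not taken from the generated program.

```python
import time

def measure(tuning_region, sample_points, repeats=3):
    """Run the tuning region once per sample point and record its time.
    Repeating and keeping the minimum is an assumption to reduce noise."""
    timings = {}
    for value in sample_points:          # one iteration per sample point
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tuning_region(value)
            best = min(best, time.perf_counter() - start)
        timings[value] = best
    return timings

def toy_region(block):
    """Stand-in for a tuning region: sums a list in chunks of size block."""
    data = list(range(2000))
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

timings = measure(toy_region, [1, 2, 3, 6, 8])
```

The resulting mapping from sample point to execution time is the kind of input that the parameter estimation process (a) expects.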
[0124]
The program automatically generated by the parameter estimation process (a), parameter estimation process (b), and measurement loop process described above is sent to the tuning function adding unit 9c. These programs are called from the program generated by the tuning function adding unit 9c.
[0125]
Here, the processing by the computing device 1 will be described with reference to a specific example. The computing device 1 performs optimization by generating a program as follows. As an example, a case where the Fortran 90 language is used as the computer language will be described. The user of the present embodiment uses MPI (Message Passing Interface) as the computing environment. However, the present invention is not limited to this. Note that the generated program code described below serves to explain the present embodiment, and the present invention is not limited to it. The code has also been tailored for the explanation of the present embodiment and is not exactly the same as the code generated by the computing device 1 of the present embodiment.
[0126]
When the user uses this computing device, as shown in FIG. 11, the user inputs a program of a predetermined format to the computing device 1 in S35. Here, the program of a predetermined format is a program in which parameters to be optimized are specified by a specifier.
[0127]
Here, an example of the program input by the user in S35 is shown in FIG. This program is an example in which matrix product processing is described in the Fortran 90 language, and a specifier is described to instruct the addition of an automatic tuning function. In the example shown in FIG. 12, a line beginning with “! ABCLib $” corresponds to a specifier.
[0128]
In the above example, the specifier “varied (i) from 1 to 8” on the 9th line specifies that unrolling (= parameterization) is performed for the parameter (i) over the range from 1 to 8. The 11th to 17th lines correspond to the tuning area. The specifier “fitting polynomial 5” on the 10th line specifies the use of the fifth-order linear polynomial (fitting polynomial 5) registered in the cost definition function library. The specifier “sampled (1-3, 6, 8)” on the 10th line specifies that parameter estimation is performed using the sampling points [1-3, 6, 8]. Program code with an automatic tuning function is automatically generated from the information in these specifiers.
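How such specifier fragments might be turned into parameter information can be sketched in Python. Only the two fragments quoted above are handled; the complete directive syntax is not given in this section, so the regular expressions and function names below are assumptions and not the grammar used by the specifier analysis unit.

```python
import re

def parse_varied(text):
    """'varied (i) from 1 to 8' -> ('i', 1, 8). Pattern is an assumption."""
    m = re.search(r"varied\s*\((\w+)\)\s*from\s*(\d+)\s*to\s*(\d+)", text)
    if m is None:
        raise ValueError("no 'varied' clause found")
    return m.group(1), int(m.group(2)), int(m.group(3))

def parse_sampled(text):
    """'sampled (1-3, 6, 8)' -> [1, 2, 3, 6, 8]. Pattern is an assumption."""
    m = re.search(r"sampled\s*\(([^)]*)\)", text)
    if m is None:
        raise ValueError("no 'sampled' clause found")
    points = []
    for item in m.group(1).split(","):
        item = item.strip()
        if "-" in item:                       # a range such as 1-3
            first, last = (int(v) for v in item.split("-"))
            points.extend(range(first, last + 1))
        else:                                 # a single point such as 6
            points.append(int(item))
    return points

name, low, high = parse_varied("varied (i) from 1 to 8")
points = parse_sampled("fitting polynomial 5 sampled (1-3, 6, 8)")
```

Expanding ranges at parse time gives the later measurement and estimation steps a flat list of sample points to iterate over.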
[0129]
In S36, the program generation unit 6 of the computing device 1 generates a program suitable for parameter adjustment from the input program. The program generation unit 6 outputs the generated program to the compiler 7 in S37.
[0130]
Here, an example of the program generated by the program generation unit 6 in S36 is shown in FIGS. 13 (a) (b) and 14 (a) (b).
[0131]
FIG. 13A shows a main program generated by the program generation unit 6 from the program of FIG. FIG. 13B shows an automatic tuning program generated by the program generation unit 6 from the program of FIG. 14A and 14B is a subroutine including a tuning area generated by the program generation unit 6 from the program of FIG.
[0132]
In S38, the compiler 7 translates the program into an execution format and inputs it to the processor 2. In S39, the processor 2 executes the translated execution format, obtains optimum parameters, and outputs them to the parameter information file 5, for example. As described above, if the computing device 1 is used, the program can be easily optimized by inputting the program in which the parameter is specified by the specifier.
[0133]
As described above, the computing device 1 according to this embodiment includes the program generation unit 6, which, when a program including a specifier that designates a region of the program to be optimized and a parameter to be optimized is input, generates a program for executing optimization by actual measurement for the region and the parameter designated by the specifier. Therefore, parameter optimization can be easily performed.
[0134]
Further, as described above, the present invention relates to a program usage mode, a programming language processing device, a programming language processing method, and a recording medium for adding an automatic tuning function at an arbitrary place in a program.
[0135]
Here, according to the conventional configuration, various settings are necessary to actually achieve parameter optimization even after parameter registration. Therefore, when optimizing the parameters, problems such as an increase in development time and development cost, a low function expandability, and a high possibility of bug incorporation are caused.
[0136]
Therefore, the present invention solves the above problems by providing a specifier (directive) that causes automatic tuning processing, written in the computer language ultimately required by the user, to be added automatically, together with a processing mechanism for programs described using the specifier.
[0137]
For example, as in the above-described embodiment, programs with an automatic tuning function are generated automatically. Consequently, for software with an automatic tuning function, the increase in development time and development cost is prevented, and the problems of low functional extensibility and a high likelihood of introducing bugs do not arise.
[0138]
Further, in the optimization problem solving process for parameter adjustment, the cost definition function library and the cost definition function determination unit installed in the calculation apparatus of the present invention make it possible to automatically select, from a plurality of cost definition functions, the cost definition function that minimizes the error, and to automatically select the sampling points.
[0139]
Consequently, as in the above-described embodiment, the present invention resolves the conventional problem in the optimization problem solving process for parameter estimation, namely that a manual implementation could realize only an estimation function with a single cost definition function, which resulted in low parameter estimation accuracy and a weak parameter estimation function.
[0140]
In the following, processing when the specifier analysis unit 8a inputs parameters extracted from the specifier to the parameter adjustment layer 4 illustrated in FIG. 1 will be described. In this way, the parameter adjustment layer 4 may add processing divided for each type of automatic tuning. The automatic tuning function added by the program generation unit 6 can also be regarded as being based on an instruction from the parameter adjustment layer 4.
[0141]
When the user executes the library 3 using the computing device 1, an execution instruction is given after setting appropriate parameters for the desired subroutine 3a.
[0142]
Here, the parameters set for the subroutine 3a include parameters that change only the execution performance of the computing device 1 and do not change the output of the subroutine 3a of the library 3. Hereinafter, such a parameter is referred to as a performance information parameter (Performance Parameters: PP).
[0143]
Of the parameters set for the subroutine 3a, parameters that change both the execution performance of the computing device 1 and the output of the subroutine 3a of the library 3 are hereinafter referred to as basic information parameters (BP).
[0144]
For example, it is assumed that the subroutine 3a included in the numerical calculation library is an eigenvalue calculation subroutine for calculating eigenvalues of a matrix. At this time, the substance of the desired matrix, the size of the matrix, and the like correspond to the basic information parameter BP. Further, the number of loop unrolling stages in the matrix calculation of the calculation apparatus 1 corresponds to the performance information parameter PP.
[0145]
In the computing device 1, a desired result can be obtained in a minimum time by optimizing the performance information parameter PP using the given basic information parameter BP. The performance information parameter PP and the basic information parameter BP are input to the library 3 via the parameter adjustment layer 4. Parameters other than the performance information parameter PP and the basic information parameter BP are input directly to the library 3 from outside the computing device 1 or are used inside the library 3.
[0146]
As shown in FIG. 15, the parameter adjustment layer 4 of the present embodiment includes an installation optimization layer 4a, a pre-execution optimization layer 4b, and a runtime optimization layer 4c in order to optimize the performance information parameter PP, which is an adjustable parameter. The layers 4a to 4c do not hold the parameters themselves but store them in the parameter information file 5.
[0147]
The installation optimization layer (IOL) 4a performs optimization when the library 3 is installed.
[0148]
For example, as shown in FIG. 17A, when the library 3 is installed (S1), the installation optimization layer 4a optimizes an installation optimization parameter (IOP), which is a part of the performance information parameter PP (S2), and outputs the obtained parameter (IOP) to the parameter information file 5.
[0149]
Note that, at the time of installing the library 3, the basic information parameter BP is not normally determined. For this reason, the installation optimization layer 4a appropriately samples, for example, the value of the basic information parameter BP, and determines a parameter that minimizes an appropriately defined cost definition function for each sampled extraction point. Then, the data between the sampled extraction points is interpolated by an appropriate model formula.
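The sampling and interpolation described here can be sketched in Python. The sketch assumes a single integer basic information parameter (a matrix size), linear interpolation as the model formula, and a synthetic cost function in place of actual measurement; all names and numbers are hypothetical.

```python
def best_pp_for(bp, pp_candidates, cost):
    """Return the PP value that minimizes the cost for one sampled BP."""
    return min(pp_candidates, key=lambda pp: cost(bp, pp))

def install_time_table(bp_samples, pp_candidates, cost):
    """Optimize PP at each sampled BP value (done once, at install time)."""
    return {bp: best_pp_for(bp, pp_candidates, cost) for bp in bp_samples}

def interpolate_pp(table, bp):
    """Estimate PP for an unsampled BP by linear interpolation between the
    neighbouring sampled points (the model formula is an assumption)."""
    keys = sorted(table)
    if bp <= keys[0]:
        return table[keys[0]]
    if bp >= keys[-1]:
        return table[keys[-1]]
    for lo, hi in zip(keys, keys[1:]):
        if lo <= bp <= hi:
            w = (bp - lo) / (hi - lo)
            return round(table[lo] + w * (table[hi] - table[lo]))

def toy_cost(n, unroll):
    """Synthetic cost whose best unrolling depth is n // 100 (made up)."""
    return abs(unroll - n // 100) + 1.0

table = install_time_table([100, 500, 800], range(1, 9), toy_cost)
guess = interpolate_pp(table, 300)
```

The interpolated value is only an estimate, which is why the pre-execution layer can later refine it once the actual basic information parameter is known.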
[0150]
The pre-execution optimization layer (BEOL) 4b performs optimization after the user has designated a specific parameter (for example, the problem size).
[0151]
In response to the input of the basic information parameter BP, the pre-execution optimization layer 4b optimizes the pre-execution optimization parameter BEOP, which is a part of the performance information parameter PP. For example, as shown in FIG. 17B, when the basic information parameter BP is defined (input) as the user-specified parameter (S4), the parameter (IOP) in the parameter information file 5 is referred to (S5), optimization is performed (S6), and the obtained optimization parameter (BEOP) is output to the parameter information file 5.
[0152]
It should be noted that the pre-execution optimization layer 4b performs a trial by actual measurement in order to obtain an optimum parameter using the basic information parameter BP designated by the user.
[0153]
The runtime optimization layer (ROL) 4c performs optimization when the target library (or routine) is executed, after parameter optimization by at least one of the installation optimization layer 4a and the pre-execution optimization layer 4b has been completed.
[0154]
When the runtime optimization layer 4c detects an execution instruction for the library 3 (subroutine 3a of the library 3) (S8), then, for example as shown in FIG., it refers to the performance information parameter PP that has already been set (S9). When the calculation based on the performance information parameter PP does not satisfy the desired accuracy, optimization is performed again (S10). In S10, the calculation is repeated until a parameter PP is obtained with which the calculation satisfies the desired accuracy.
[0155]
In this way, the runtime optimization layer 4c refers to the performance information parameter PP that has already been set, and does not perform optimization calculations in a predetermined case where, for example, sufficient accuracy is obtained.
[0156]
As described above, in the parameter adjustment layer 4 of the present embodiment, the parameter information IOP optimized by the installation optimization layer 4a is stored in the parameter information file 5 and can be referred to by the pre-execution optimization layer 4b and the runtime optimization layer 4c. The parameter information BEOP optimized in the pre-execution optimization layer 4b is stored in the parameter information file 5 and can be referred to in the runtime optimization layer 4c.
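The way the three layers share results through the parameter information file can be sketched in Python. Everything concrete in the sketch is an assumption made for illustration: the JSON file format, the class and function names, the accuracy check passed in as a callable, and the synthetic cost function.

```python
import json
import os
import tempfile

class ParameterInfoFile:
    """Hypothetical stand-in for the parameter information file 5."""
    def __init__(self, path):
        self.path = path
    def load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)
    def store(self, key, value):
        data = self.load()
        data[key] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

def install_layer(info, estimate):                     # layer 4a
    info.store("IOP", estimate)

def pre_execution_layer(info, bp, candidates, cost):   # layer 4b
    iop = info.load().get("IOP")                       # S5: refer to IOP
    ordered = sorted(candidates, key=lambda pp: pp != iop)   # try IOP first
    best = min(ordered, key=lambda pp: cost(bp, pp))   # S6: optimize
    info.store("BEOP", best)
    return best

def runtime_layer(info, accurate_enough, reoptimize):  # layer 4c
    data = info.load()
    pp = data.get("BEOP", data.get("IOP"))             # S9: refer to stored PP
    if not accurate_enough(pp):                        # S10: redo if needed
        pp = reoptimize()
        info.store("ROP", pp)
    return pp

def toy_cost(n, unroll):
    """Synthetic cost whose best unrolling depth is n // 100 (made up)."""
    return abs(unroll - n // 100) + 1.0

info = ParameterInfoFile(os.path.join(tempfile.mkdtemp(), "params.json"))
install_layer(info, 4)
pre_execution_layer(info, 600, range(1, 9), toy_cost)
chosen = runtime_layer(info, lambda pp: True, reoptimize=lambda: 1)
```

Keeping the values in a file rather than in the layers themselves is what lets a later layer pick up where an earlier one left off.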
[0157]
Here, each element of the performance information parameter PP is included in at least one of the parameter sets (IOP), (BEOP), and (ROP). That is, the elements of the performance information parameter PP are divided among three subsets (IOP, BEOP, ROP), one for each of the layers 4a to 4c of the parameter adjustment layer 4, and these subsets are allowed to overlap. This can be expressed as follows:
PP parameter = IOP ∪ BEOP ∪ ROP (Equation 1)
Therefore, the computing device 1 according to the present embodiment can optimize all the elements included in the performance information parameter PP at any one of the timings described above using the parameter adjustment layer 4.
[0158]
In particular, when the basic information parameter BP, such as the matrix size (n) corresponding to the problem, is determined, the computing device 1 according to the present embodiment performs optimization in the pre-execution optimization layer 4b before the actual calculation is executed. This allows for more accurate optimization than conventional computing devices.
[0159]
Here, conventional automatic tuning software configuration methods only performed parameter optimization at the time of software installation, as shown for example in FIG. 18A, or at the time of library execution, as shown for example in FIG. 18B. These software configuration methods have the problems that they cannot be applied to general-purpose processing and that parameter adjustment may be insufficient. As can be seen from FIGS. 18A and 18B, the conventional automatic tuning has one parameter.
[0160]
Therefore, in the present invention, the problem can be solved by a software configuration method that can apply parameter adjustment in more general-purpose processing and has a more advanced parameter adjustment mechanism than the conventional one.
[0161]
In particular, when the basic information parameter BP, such as the matrix size (n) corresponding to the problem, is determined, the computing device 1 according to the present embodiment performs optimization in the pre-execution optimization layer 4b before the actual calculation is executed. This enables more accurate optimization than the IOL or the ROL alone, as used in a conventional computing device.
[0162]
Next, the features of the program, computer, etc. having the above-described configuration will be described.
[0163]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter that is included in the parameters of a library provided in the computer and that changes only the execution performance without changing the output of the library. The program includes a procedure for detecting the point at which a basic information parameter, which is included in the parameters of the library and changes both the execution performance and the output of the library, is determined, and a procedure for optimizing the performance information parameter using the basic information parameter.
[0164]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the value of the performance information parameter, which changes only the execution performance without changing the output of the library, so that the execution cost of the library is optimized.
[0165]
The computer on which the program is executed detects the point where the basic information parameter is determined by detecting the input of the basic information parameter from the user, for example, before the actual execution of the library.
[0166]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0167]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0168]
That is, when the contents of the library are expressed as mathematical expressions, parameters expressed as variables in the mathematical expressions correspond to basic information parameters. A parameter that does not appear in the mathematical expression or appears as a simple parameter in the mathematical expression corresponds to the performance information parameter. For this reason, for example, even if the performance information parameter is changed, the result (library output) obtained by the mathematical formula does not change.
[0169]
Thereafter, the computer optimizes the performance information parameter using the basic information parameter before the actual execution of the library. More specifically, for example, a basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. As a result, an optimal performance information parameter can be obtained reliably.
[0170]
Here, as an example of a conventional optimization program, for example, performance information parameters are optimized when a library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0171]
Another example of a conventional optimization program is to optimize performance information parameters when a library is executed, for example. In this case, the calculation time for optimizing the performance information parameter is included in the execution cost of the library. For this reason, there is a possibility that an optimal parameter cannot be obtained without taking sufficient time for optimization.
[0172]
Therefore, as in the above-described program according to the present invention, the execution cost is measured in advance before the actual calculation to obtain the optimum performance information parameter. As a result, more accurate and reliable parameter adjustment is possible. In addition, the calculation time can be predicted before the program is executed.
[0173]
The program according to the present invention can also be expressed as software having a parameter optimization function at a point where information that the user can know is determined.
[0174]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter that is included in the parameters of a library provided in the computer and that changes only the execution performance without changing the output of the library. The program is characterized by including: an initial setting procedure for optimizing the performance information parameter when the library is installed; a detection procedure for detecting the point at which a basic information parameter, which is included in the parameters of the library and changes both the execution performance and the output of the library, is determined; and a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameter with reference to the performance information parameter set in the initial setting procedure.
[0175]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the value of the performance information parameter, which changes only the execution performance without changing the output of the library, so that the execution cost of the library is optimized.
[0176]
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0177]
In addition, before the actual execution of the library, the computer detects a point where the basic information parameter is determined, for example, by detecting an input of the basic information parameter from the user.
[0178]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0179]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0180]
Thereafter, before the actual execution of the library, the computer refers to the performance information parameter set at the time of installation, and optimizes the performance information parameter using the basic information parameter. More specifically, for example, the basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. In particular, trial calculation may be performed only for values around the optimum value of the performance information parameter set at the time of installation. As a result, the number of trial calculations can be reduced and an optimum performance information parameter can be obtained. In this way, more accurate and reliable parameter adjustment is possible.
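Restricting the trial calculations to values around the install-time optimum can be sketched in Python. The neighbourhood radius, the function names and the synthetic cost function are assumptions; in the setting described above, the cost would come from actually timing the library with the given basic information parameter.

```python
def refine_near(initial, low, high, cost, radius=1):
    """Try only the values within radius of the install-time optimum and
    return the cheapest one, clamped to the range [low, high]."""
    first = max(low, initial - radius)
    last = min(high, initial + radius)
    return min(range(first, last + 1), key=cost)

trials = []

def toy_cost(unroll):
    """Synthetic cost with its minimum at unroll = 5 (made up).
    Each call is recorded so the number of trials can be counted."""
    trials.append(unroll)
    return (unroll - 5) ** 2 + 1.0

install_time_optimum = 4
refined = refine_near(install_time_optimum, 1, 8, toy_cost)
```

With a radius of 1, three trial calculations are performed instead of the eight that a full sweep of the range 1 to 8 would need, at the risk of missing an optimum that lies far from the install-time estimate.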
[0181]
The program according to the present invention can also be expressed as software having a parameter optimization function at the time of software installation and before execution of software at a point where information that can be known by the user is determined.
[0182]
In order to solve the above-described problem, the program according to the present invention is a program for causing a computer to optimize a performance information parameter that is included in the parameters of a library provided in the computer and that changes only the execution performance without changing the output of the library. The program is characterized by including: a detection procedure for detecting the point at which a basic information parameter, which is included in the parameters of the library and changes both the execution performance and the output of the library, is determined; a pre-adjustment procedure for optimizing the performance information parameter using the basic information parameter; and a re-adjustment procedure for referring, when the library is executed, to the performance information parameter that has already been set and, when the calculation using that performance information parameter does not satisfy the desired accuracy, optimizing the performance information parameter again using the basic information parameter.
[0183]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the value of the performance information parameter, which changes only the execution performance without changing the output of the library, so that the execution cost of the library is optimized.
[0184]
The computer on which the program is executed detects the point where the basic information parameter is determined by detecting the input of the basic information parameter from the user, for example, before the actual execution of the library.
[0185]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0186]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0187]
Thereafter, the computer optimizes the performance information parameter using the basic information parameter before the actual execution of the library. More specifically, for example, the basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. As a result, an optimal performance information parameter can be obtained reliably.
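The pre-adjustment step described above can be sketched as follows. This is a minimal illustration in Python rather than the Fortran used in the figures, and it is not the generated code itself. The names `pre_adjust`, `run_library`, `measure_time`, and `cost` are hypothetical, and the sketch assumes that the execution cost is wall-clock time unless another cost function is supplied.

```python
import time


def measure_time(run):
    """Hypothetical cost function: wall-clock seconds of one trial run."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def pre_adjust(basic_params, candidates, run_library, cost=None):
    """Try every candidate performance parameter and keep the cheapest one.

    basic_params : values fixed once they are known (e.g. the matrix size)
    candidates   : performance-parameter values to try (e.g. unrolling depths)
    run_library  : callable(basic_params, perf_param) doing one trial calculation
    cost         : callable(basic_params, perf_param) -> float; defaults to timing
    """
    if cost is None:
        cost = lambda b, p: measure_time(lambda: run_library(b, p))
    best_param, best_cost = None, float("inf")
    for p in candidates:
        c = cost(basic_params, p)  # actually measure the trial calculation
        if c < best_cost:
            best_param, best_cost = p, c
    return best_param, best_cost
```

Passing the cost function as an argument keeps the search independent of how the cost is measured, which also allows a cost other than elapsed time to be substituted.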
[0188]
Further, when the library is actually executed, the computer refers to the performance information parameter that has already been set, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy. When the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that can obtain the desired accuracy.
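The re-adjustment step can be sketched in the same way. The text does not define how the "desired accuracy" is checked, so this Python sketch simply assumes a caller-supplied predicate `is_accurate`. All names here (`execute_with_readjust`, `is_accurate`, `cost`, `run_library`) are hypothetical.

```python
def execute_with_readjust(basic_params, stored_param, candidates,
                          is_accurate, cost, run_library):
    """Run the library, re-tuning first only if the stored parameter fails the check.

    is_accurate : callable(basic_params, perf_param) -> bool (assumed check)
    cost        : callable(basic_params, perf_param) -> float, measured by trial
    run_library : callable(basic_params, perf_param) -> result
    Returns (result, parameter actually used, whether re-tuning happened).
    """
    param, retuned = stored_param, False
    if not is_accurate(basic_params, param):
        # Re-adjustment: search again among the candidates that pass the check.
        acceptable = [p for p in candidates if is_accurate(basic_params, p)]
        if acceptable:
            param = min(acceptable, key=lambda p: cost(basic_params, p))
            retuned = True
    return run_library(basic_params, param), param, retuned
```

When the stored parameter already passes the check, no trial calculation is performed, which corresponds to the case in which no optimization time is spent at execution.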
[0189]
As described above, before the actual calculation, the execution cost is measured in advance to obtain the optimum performance information parameter. When the basic information parameter is not changed, the library can be executed using the preset performance information parameter. In addition, even when there is a change in the basic information parameter, if a desired accuracy can be obtained, the library can be executed without performing calculation for parameter optimization. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0190]
Note that the program according to the present invention can also be expressed as software having a parameter optimization function that operates both before execution of the software, at the point where the information known to the user is determined, and during execution of the software.
[0191]
In order to solve the above-described problem, a program according to the present invention causes a computer to optimize the performance information parameters, which are those parameters of a library provided in the computer that change only the execution performance without changing the output of the library. The program is characterized by including: an initial setting procedure for optimizing the performance information parameters when the library is installed; a detection procedure for detecting the point at which the basic information parameters, which are those parameters of the library that change both the execution performance and the output of the library, are determined; and a re-adjustment procedure that, when the library is executed, refers to the performance information parameters already set and, when the calculation based on those performance information parameters does not satisfy the desired accuracy, optimizes the performance information parameters again using the basic information parameters.
[0192]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the values of the performance information parameters, which change only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0193]
In addition, before the actual execution of the library, the computer detects a point where the basic information parameter is determined, for example, by detecting an input of the basic information parameter from the user.
[0194]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0195]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0196]
Further, when the library is actually executed, the computer refers to the performance information parameter that has already been set, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy. When the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that provides the desired accuracy.
[0197]
Thus, the performance information parameter is set before the actual calculation. In the actual calculation, if a desired accuracy can be obtained by the performance information parameter, the library can be executed without performing the calculation for parameter optimization. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0198]
The program according to the present invention can also be expressed as software having a parameter optimization function at the time of software installation and software execution.
[0199]
In order to solve the above-described problem, a program according to the present invention causes a computer to optimize the performance information parameters, which are those parameters of a library provided in the computer that change only the execution performance without changing the output of the library. The program is characterized by including: an initial setting procedure for optimizing the performance information parameters when the library is installed; a detection procedure for detecting the point at which the basic information parameters, which are those parameters of the library that change both the execution performance and the output of the library, are determined; a pre-adjustment procedure for optimizing the performance information parameters using the basic information parameters; and a re-adjustment procedure that, when the library is executed, refers to the performance information parameters already set and, when the calculation based on those performance information parameters does not satisfy the desired accuracy, optimizes the performance information parameters again using the basic information parameters.
[0200]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the values of the performance information parameters, which change only the execution performance and not the output of the library, so that the execution cost of the library is optimized.
[0201]
The computer on which the program is executed optimizes the performance information parameter when the library is installed. In this case, since the basic information parameter such as the size of the matrix is not determined, the optimum performance information parameter is estimated by some estimation model including a predetermined error.
[0202]
In addition, before the actual execution of the library, the computer detects a point where the basic information parameter is determined, for example, by detecting an input of the basic information parameter from the user.
[0203]
Here, the basic information parameter is a parameter that changes both the execution performance and the output of the library.
[0204]
For example, in the matrix eigenvalue calculation library of the numerical calculation library, the matrix size, the matrix entity, and the like correspond to the basic information parameters. For example, the number of loop unrolling stages when a parallel computer is used corresponds to the performance information parameter.
[0205]
Thereafter, before the actual execution of the library, the computer refers to the performance information parameter set at the time of installation, and optimizes the performance information parameter using the basic information parameter. More specifically, for example, the basic information parameter is used, trial calculation is performed for each value of the performance information parameter, and the execution cost is actually measured in advance. In particular, trial calculation may be performed only for values around the optimum value of the performance information parameter set at the time of installation. As a result, the number of trial calculations can be reduced and an optimum performance information parameter can be obtained. In this way, more accurate and reliable parameter adjustment is possible.
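Restricting the trial calculations to values around the install-time optimum can be sketched as follows. This is an illustrative Python sketch only: the function name `refine_near`, the `radius` parameter, and the idea of measuring "around" as neighboring positions in the sorted candidate list are assumptions, not details given in the text. It also assumes that `installed_best` is one of the candidates.

```python
def refine_near(installed_best, candidates, cost, radius=1):
    """Trial-run only candidates within `radius` positions of the install-time optimum.

    cost : callable(perf_param) -> float, the measured cost of one trial calculation
    Returns (best parameter among the trials, list of parameters actually tried).
    """
    ordered = sorted(candidates)
    center = ordered.index(installed_best)
    lo = max(0, center - radius)
    hi = min(len(ordered), center + radius + 1)
    trials = ordered[lo:hi]          # far fewer trials than the full candidate list
    best = min(trials, key=cost)
    return best, trials
```

With eight candidates and `radius=1`, at most three trial calculations are performed instead of eight.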
[0206]
Also, the computer refers to the performance information parameter that has already been set when actually executing the library, and determines whether or not the calculation based on the performance information parameter satisfies the desired accuracy by trial. Then, when the desired accuracy is not satisfied, the performance information parameter is optimized again using the basic information parameter. Then, the library is executed using the performance information parameter that provides the desired accuracy.
[0207]
As described above, before the actual calculation, the execution cost is measured in advance to obtain the optimum performance information parameter. When the basic information parameter is not changed, the library can be executed using the preset performance information parameter. In addition, even when there is a change in the basic information parameter, if a desired accuracy can be obtained, the library can be executed without performing calculation for parameter optimization. Therefore, the time required for parameter optimization at the time of execution is unnecessary, and the execution cost (calculation time) of the library is not increased. In addition, since the accuracy is checked before the library is executed, more precise and reliable parameter adjustment is possible.
[0208]
The program according to the present invention can also be expressed as software having a three-level parameter optimization function that operates at the time of software installation, before execution of the software at the point where the information known to the user is determined, and at the time of software execution.
[0209]
In order to solve the above problems, a program according to the present invention causes a computer to realize a function of optimizing the performance information parameters, which are those parameters of a library provided in the computer that change only the execution performance without changing the output of the library. The program is characterized in that each element of the performance information parameters is set so as to be included in at least one of a first set of parameters to be optimized when the library is installed, a second set of parameters to be optimized before the library is executed, and a third set of parameters to be optimized when the library is executed, and in that the program causes the computer to realize a function for optimizing the elements of the first set, a function for optimizing the elements of the second set, and a function for optimizing the elements of the third set.
[0210]
This program is used for optimizing the execution cost of a library, such as a numerical calculation library, in a computer. The execution cost is, for example, the calculation resources and calculation time required for execution. Among the parameters of the library, this program adjusts the values of the performance information parameters, which change only the execution performance and not the output of the library, so that the execution cost of the library is optimized. For example, in the matrix eigenvalue calculation library of the numerical calculation library, the number of loop unrolling stages when a parallel computer is used corresponds to a performance information parameter.
[0211]
In the computer on which the program is executed, each performance information parameter is set so as to be included in at least one of a first set of parameters to be optimized when the library is installed, a second set of parameters to be optimized before the library is executed, and a third set of parameters to be optimized during execution of the library.
[0212]
Here, any performance information parameter that can be optimized at all can always be optimized at one of these times: at installation, before library execution, or during library execution. The specific mechanism for assigning each performance information parameter to at least one of the first to third sets is arbitrary, and any method may be selected.
[0213]
Then, the computer optimizes each of the first to third sets. Therefore, all the performance information parameters can be optimized, and can be applied to general-purpose processing. That is, the entire library including a plurality of routines can be optimized.
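The assignment of each performance information parameter to at least one of the three sets, followed by optimization of each set, can be sketched as follows. This Python sketch is only one possible arrangement, since the text leaves the mechanism open. The names `Timing`, `partition`, and `optimize_all`, and the example parameter names used with them, are hypothetical.

```python
from enum import Enum, auto


class Timing(Enum):
    INSTALL = auto()           # first set: optimized when the library is installed
    BEFORE_EXECUTION = auto()  # second set: optimized before the library is executed
    RUN_TIME = auto()          # third set: optimized while the library executes


def partition(assignments):
    """Group parameter names by the timing(s) each one is assigned to."""
    sets = {t: [] for t in Timing}
    for name, timings in assignments.items():
        if not timings:
            raise ValueError(f"{name} must belong to at least one set")
        for t in timings:
            sets[t].append(name)
    return sets


def optimize_all(assignments, optimizers):
    """Run the optimizer registered for each timing over that timing's set."""
    sets = partition(assignments)
    return {t: {name: optimizers[t](name) for name in sets[t]} for t in Timing}
```

A parameter may appear in more than one set, for example when it is estimated at installation and refined again at run time.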
[0214]
In contrast, conventional optimization methods provide only one of parameter optimization at software installation time and parameter optimization at library execution time. As a result, some problems can be optimized only at installation time and others only at execution time, so such methods cannot be used for all problems.
[0215]
It should be noted that the program according to the present invention can be expressed as software for performing parameter optimization by separating the parameters to be optimized into three types of parameters at the time of installation, before execution, and at the time of execution.
[0216]
A recording medium according to the present invention is a computer-readable recording medium on which any of the above-described programs is recorded in order to solve the above problems.
[0217]
When this recording medium is read by a computer, one of the above-described programs is executed by the computer. Therefore, the same effect as the above-described program can be obtained.
[0218]
The recording medium is not limited to a hard disk, a CD-ROM (Read Only Memory), or the like; any recording medium may be used.
[0219]
In addition, a computer according to the present invention is configured to include the above-described recording medium in order to solve the above problems.
[0220]
When this computer reads the above-described recording medium, any of the above-described programs is executed by the computer. Therefore, the same effect as the above-described program can be obtained.
[0221]
The computer may be a parallel computing device having a plurality of processors within the computer, or a distributed computing device in which a plurality of processors connected via a network function as a single computing device.
[0222]
The present invention can also be expressed as an adjustment method for adjusting the performance information parameters, which are included in the parameters of a library provided in a computer and change only the execution performance without changing the output of the library, in which the computer executes a procedure for detecting the point at which the basic information parameters, which are included in the parameters of the library and change both the execution performance and the output of the library, are determined, and a procedure for optimizing the performance information parameters using the basic information parameters.
[0223]
The present invention can also be expressed as an adjustment method for adjusting the performance information parameters, which are included in the parameters of a library provided in a computer and change only the execution performance without changing the output of the library, in which the computer executes a re-adjustment procedure that, when the library is executed, refers to the performance information parameters already set and, when the calculation based on those performance information parameters does not satisfy the desired accuracy, optimizes the performance information parameters again using the basic information parameters.
[0224]
In addition, by executing the adjustment method, the computer described above functions as an adjustment device that optimizes the performance information parameters, which are included in the parameters of the library provided in the computer and change only the execution performance without changing the output of the library. The computer also functions as a computing device including the above-described program and library.
[0225]
In the configuration described above, optimizing the performance information parameters does not mean optimizing all of them; it means optimizing an appropriate subset of those that can be optimized.
[0226]
[Embodiment 2]
Another embodiment of the present invention will be described with reference to FIGS. The computing device of the present embodiment has the same configuration as the computing device 1 shown in FIG. 1 described in the first embodiment. Hereinafter, the computing device 1 is referred to as the computing device 1 for simplicity.
[0227]
The computing device 1 has a configuration including a program generating unit 6 that generates another program that facilitates parameter optimization (tuning) from a computer language (program) in a predetermined format. In addition, a compiler 7 for translating the program into an execution format is provided.
[0228]
The program generation unit 6 of the computing device 1 of the present embodiment is configured to generate a main program that calls an actual measurement routine and an estimation routine at the beginning of the main program, so that optimization is performed when the main program is executed. Further, the program generation unit 6 of the computing device 1 can also generate a main program that calls the actual measurement routine and the estimation routine inside the loop that calls the subprogram, according to the specifier.
[0229]
Below, the operation of the computing device 1 is first outlined, and then a more concrete example is described.
[0230]
As shown in FIG. 1 and FIG. 2, also in the computing device 1 of the present embodiment, a program to which an automatic tuning mechanism is added is generated using a program describing a specifier. By executing this program, the parameters included in the program can be optimized.
[0231]
More specifically, as shown in FIG. 2, the program generation unit 6 included in the computing device 1 includes a specifier analysis unit 8, a program creation unit 9, and a cost definition function determination unit 10. Then, as in the first embodiment, the program describing the specifier is processed to generate a new program. The generation of the program includes rewriting of the program.
[0232]
In the present embodiment, the configurations and operations of the specifier analysis means 8 and the program creation means 9 are different from those in the first embodiment. This point will be described in more detail below with reference to the drawings.
[0233]
FIGS. 19A and 19B show an example of a program in which a specifier is described, which is a processing target of the computing device 1.
[0234]
A subroutine xxx shown in FIG. 19B is an example of a program having a tuning area B to be optimized in which a specifier is described. Further, the main routine shown in FIG. 19A calls the subroutine xxx.
[0235]
FIGS. 19(a) and 19(b) show only the portions necessary for the description of the present embodiment. For example, subroutine xxx may include other areas as shown in FIG. 19(b), and the main routine may include other areas and other processes not shown. In addition, unless otherwise specified, Japanese text in a program is not literal program code but an abstract expression of a control operation to be realized by the program. Japanese text in a program may also indicate a comment.
[0236]
Here, as in the above-described embodiment, the specifier analysis unit 8a included in the specifier analysis means 8 can divide the parameters and tuning areas specified by the specifiers according to the type of automatic tuning (at installation and before the start of execution) and deliver them separately to the subsequent mechanisms for processing. In addition, in the present embodiment, the specifier analysis unit 8a can also handle run-time automatic tuning as a type of automatic tuning.
[0237]
More specifically, the specifier analysis unit 8a determines whether there is a specifier that instructs run-time optimization as shown in FIG. 19B. It also determines whether a specifier designates, within run-time optimization, optimization at startup or optimization at the time the relevant part is executed.
[0238]
The specifier analysis unit 8a notifies the main program creation unit 9a and the subprogram creation unit 9b of the program creation means 9 shown in FIG. 2 of the information determined from the specifier. The program creation means 9 then performs processing according to the specifier.
[0239]
Here, corresponding to the specifier install, which specifies optimization at the time of installation in the above-described embodiment, dynamic can be used as an example of a specifier that specifies run-time optimization. Within run-time optimization, init can be used, for example, as a specifier for performing optimization at startup, and here can be used as a specifier for performing optimization at the time the relevant part is executed.
[0240]
In the example described below, the program is written in Fortran, but the present invention is not limited to this. The essence of the processing according to the present invention is the same for a program written in any other computer language (C, C++, etc.). Therefore, the processing of the present invention is not affected by differences in computer language.
[0241]
In the following, description will be made separately on method 1 (startup execution method) using init as a specifier and method 2 (corresponding partial execution method) using here as a specifier.
[0242]
First, in method 1, it is assumed that dynamic and init are specified as the run-time optimization specifiers shown in FIG. 19(b). In response to the input of this program, the specifier analysis unit 8a, the main program creation unit 9a, and the subprogram creation unit 9b of the computing device 1 create a main program as shown in FIG. 20(a) and an actual measurement / estimation routine. Although not shown here, a program that executes the tuning region B with the parameter to be optimized as an argument is created as a subprogram for the tuning region B.
[0243]
On the other hand, in method 2, it is assumed that dynamic and here are specified as the run-time optimization specifiers shown in FIG. 19(b). In response to the input of this program, the specifier analysis unit 8a, the main program creation unit 9a, and the subprogram creation unit 9b of the computing device 1 create a main program as shown in FIG. 21(a) and an actual measurement / estimation routine. Although not shown here, a program that executes the tuning region B with the parameter to be optimized as an argument is created as a subprogram for the tuning region B.
[0244]
In FIGS. 19A and 19B, an example in which a specifier is described in a subroutine is described, but the present invention is not limited to this. In both method 1 and method 2, even if a specifier is described in the main routine, a subprogram is created for the tuning area for optimization and is called from the main routine as the main program. In this respect, the same processing is performed.
[0245]
As outlined above, the program generated by method 1 calls Auto_xxx, the actual measurement / estimation routine, at the beginning of the main program, as shown in FIG. 20 (a). In contrast, the program generated by method 2 calls Auto_xxx immediately before the tuning area, as shown in FIG. 21A. Due to this difference, as will be described later, the time spent on optimization by executing the actual measurement / estimation routine varies greatly.
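The difference in where Auto_xxx is called can be sketched as follows. This Python sketch only mimics the control flow and is not the generated Fortran. The function `run_main`, its arguments, and the assumption that the tuning area is executed repeatedly inside a loop are hypothetical.

```python
def run_main(mode, iterations, auto_xxx, tuned_region):
    """Mimic the generated main program for 'init' (method 1) or 'here' (method 2).

    auto_xxx     : callable() -> tuned parameter (stands in for the
                   actual measurement / estimation routine)
    tuned_region : callable(i, param) -> result of executing the tuning area
    """
    if mode not in ("init", "here"):
        raise ValueError("mode must be 'init' or 'here'")
    param = None
    if mode == "init":
        param = auto_xxx()      # method 1: called once, at the start of the main program
    results = []
    for i in range(iterations):
        if mode == "here":
            param = auto_xxx()  # method 2: called immediately before the tuning area
        results.append(tuned_region(i, param))
    return results
```

With `iterations` executions of the tuning area, method 1 runs the measurement routine once and method 2 runs it `iterations` times, which is why the optimization time differs so much between the two.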
[0246]
Hereinafter, a more specific example will be described with reference to a more detailed example of the subprogram. Here, as an example, an application to the conjugate gradient method (CG method: Conjugate Gradient), which is one of the iterative methods used for solving systems of simultaneous linear equations with sparse coefficient matrices, will be described.
[0247]
The CG method is a method for obtaining a solution vector x that satisfies the simultaneous linear equations Ax = b when a sparse matrix A and a right-hand vector b are given. Various solution methods are known, and the CG method is one of those called iterative methods. In the CG method, the number of iterations (the number of iterations of the I loop described later) depends on the numerical characteristics of the sparse matrix A, so the number of iterations is said to be “problem-dependent”.
[0248]
First, an application example will be described in which run-time automatic tuning by method 2 (corresponding partial execution method) is specified for the sparse matrix-vector product calculation part in a CG method subroutine.
[0249]
In FIG. 22, a specifier (dynamic, here) for performing run-time automatic tuning by method 2 is specified for the sparse matrix-vector product operation (q^(I) = A p^(I)) denoted by reference symbol C7.
[0250]
Here, the contents of FIG. 22 will be briefly described. First, each variable shown in FIG. 22 will be described. A shown in FIG. 22 represents a sparse matrix and corresponds to the coefficient matrix of the simultaneous linear equations. A is often implemented using, for example, a one-dimensional array. Further, b is an n-dimensional vector stored as a one-dimensional array and corresponds to the right-hand vector of the simultaneous linear equations.
[0251]
In addition, a scalar value in the I loop (the loop over I) is indicated by the subscript marker “_”, and a vector value in the I loop by the superscript marker “^”. The transposition of a vector is indicated by “T”. For example, p_(I) is the value of the scalar p in the I loop, p^(I) is the value of the vector p in the I loop, and p^(I)T is the transpose of the vector p in the I loop. Note that the number of iterations of the I loop is not shown in FIG. 22 because it depends on the problem as described above.
[0252]
Also, z^(I-1), r^(I-1), M, p^(I-1), and q^(I) are used as auxiliary arrays for writing the program. In addition, p_(I-1), beta_(I-1), and a_I are used as auxiliary (scalar) variables for writing the program. Here, z^(I-1), r^(I-1), p^(I-1), and q^(I) are n-dimensional vectors stored as one-dimensional arrays. M is a sparse matrix and is often implemented by, for example, a one-dimensional array. p_(I-1), beta_(I-1), and a_I are double-precision real scalars.
[0253]
The process indicated by reference symbol C1 in FIG. 22 is written as a program comment. Using the given vectors b and x^(0), r^(0) is calculated as the difference between the vector b and the matrix-vector product A x^(0) of the matrix A and the vector x^(0).
[0254]
The process of code C2 means that a vector z^(I-1) is obtained using a given sparse matrix M and vector r^(I-1). This requires a numerical algorithm that generates an M that reduces the number of iterations of the CG method and that performs the processing for obtaining the vector z. Any algorithm normally used in the CG method can be used for this purpose. Details are omitted here.
[0255]
The process of code C3 means that a scalar p_(I-1) is calculated as the inner product of the given transposed vector r^(I-1)T and the vector z^(I-1).
[0256]
The process of code C4 means that a vector is copied.
[0257]
The process of code C5 means that scalar beta_ (I-1) is calculated from the division of given scalars p_ (I-1) and p_ (I-2).
[0258]
The process of code C6 means that the vector p ^ (I) is calculated from the given vector z ^ (I-1), scalar beta_ (I-1), and vector p ^ (I-1). For this purpose, it is necessary to add the vector which is the operation result of the scalar vector product beta_ (I-1) p ^ (I-1) and the vector z ^ (I-1).
[0259]
The process of the code C7 means that the vector q ^ (I) is calculated by performing a sparse matrix / vector product of the sparse matrix A and the vector p ^ (I).
[0260]
The process of code C8 means that the scalar value a_I is calculated by dividing the scalar value p_(I-1) by the scalar value obtained from the inner product of the transposed vector p^(I)T and the vector q^(I).
[0261]
The process of code C9 means that a vector x^(I) is calculated by adding the vector x^(I-1) to the vector resulting from the product of the scalar value a_I and the vector p^(I).
[0262]
The process of code C10 means that a vector r^(I) is calculated from the vector r^(I-1) and the vector resulting from the product of the scalar value a_I and the vector q^(I).
[0263]
The process of “confirm convergence and repeat if necessary” shown after code C10 means that, if the result of the convergence check is satisfactory, the iteration of the I loop is interrupted and control branches to the part after enddo. There are various methods for checking convergence, and any method may be used. For example, as a general method, r = | Ax−b | is calculated for the x being computed by the CG method for Ax = b, and whether r is sufficiently small is checked.
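Steps C1 to C10 can be put together as the following sketch. It is written in Python rather than the Fortran of FIG. 22 and follows the textbook preconditioned CG method, so details such as the signs in C1 and C10 and the use of the recurrence residual for the convergence check are assumptions where the text is not explicit. The names `cg`, `matvec`, and `solve_m` are hypothetical; `rho` stands for the scalar p_(I-1), `alpha` for a_I, and the default `solve_m` treats M as the identity matrix.

```python
def cg(matvec, b, x0, solve_m=lambda r: list(r), tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient following steps C1 to C10.

    matvec  : callable returning A*v for a vector v (step C7, the tuning target)
    solve_m : callable returning z with M z = r (step C2); identity by default
    """
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]         # C1: r^(0) = b - A x^(0)
    p, rho_prev = None, None
    for _ in range(max_iter):                             # I loop
        if dot(r, r) ** 0.5 < tol:                        # convergence check
            break
        z = solve_m(r)                                    # C2: solve M z = r
        rho = dot(r, z)                                   # C3: inner product r^T z
        if p is None:
            p = list(z)                                   # C4: copy on the first pass
        else:
            beta = rho / rho_prev                         # C5
            p = [zi + beta * pi for zi, pi in zip(z, p)]  # C6
        q = matvec(p)                                     # C7: sparse matrix-vector product
        alpha = rho / dot(p, q)                           # C8
        x = [xi + alpha * pi for xi, pi in zip(x, p)]     # C9
        r = [ri - alpha * qi for ri, qi in zip(r, q)]     # C10
        rho_prev = rho
    return x
```

Because the matrix-vector product is passed in as `matvec`, differently tuned variants of step C7 can be swapped in without changing the rest of the iteration.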
[0264]
More specifically, the sparse matrix one-vector product operation process indicated by the reference C7 corresponds to a program as shown in FIG.
[0265]
Here, in FIG. 23, in order to show a specific code of a sparse matrix / vector product operation of a sparse matrix A and a vector x, an array (maintaining information) for realizing a data structure for expressing the sparse matrix A is shown. Aval (J), row_ptr (I), and col_ind (J) are used as the array). Further, x (col_ind (J)) is used as an element of the vector x necessary for performing a matrix / vector product operation with the sparse matrix A.
[0266]
More specifically, Aval (J) means a one-dimensional array in which double-precision real values that are values of the sparse matrix A are stored. Further, col_ind (J) is an integer one-dimensional matrix, and stores the column number where the non-zero element of the sparse matrix A exists. Therefore, the element of the vector x corresponding to the non-zero element of the sparse matrix A can be returned by x (col_ind (J)).
[0267]
Also, row_ptr(I) stores the number of the row in which the non-zero elements of the sparse matrix A exist. The values of row_ptr(I) and col_ind(J) are set when the sparse matrix A is determined. Therefore, these values are static values fixed at the point where the library is called. In other words, these values are not determined dynamically within the CG-method program, as the values in the auxiliary arrays used when programming the CG method are.
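The role of the three arrays can be illustrated with a short sketch of a sparse matrix-vector product. It assumes the usual compressed-row layout, in which `row_ptr[i]` is the position in `aval` of the first non-zero element of row i and `row_ptr[i + 1]` is one past the last; this layout, the 0-based indexing, and the function name `spmv_crs` are assumptions and are not taken from FIG. 23.

```python
def spmv_crs(aval, row_ptr, col_ind, x):
    """y = A x for a sparse matrix A held in compressed-row form.

    aval    : non-zero values of A, stored row by row
    col_ind : column index of each entry of aval
    row_ptr : row_ptr[i] .. row_ptr[i + 1] - 1 are the entries of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):                         # I loop over rows
        s = 0.0
        # J loop: its length depends on row_ptr, so it is unknown until run time.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            s += aval[j] * x[col_ind[j]]
        y[i] = s
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
aval = [4.0, 1.0, 3.0, 2.0, 5.0]
col_ind = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_crs(aval, row_ptr, col_ind, [1.0, 2.0, 3.0])  # [7.0, 6.0, 17.0]
```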
[0268]
Here, the specifier shown in FIG. 22 designates automatic tuning for performing unrolling processing on the innermost loop (J loop) of the code shown in FIG.
[0269]
Since this loop length is specified by the variable arrays row_ptr (I) and row_ptr (I + 1), the loop length is not fixed. In general, the value of this variable array is not determined until run time. Therefore, this application example can be said to be an example in which only execution-time automatic tuning can be specified.
[0270]
Then, for the program shown in FIG. 23, the unrolling designation in execution-time automatic tuning shown in FIG. 22 turns this tuning area into a subprogram as shown in FIG. That is, the sparse matrix-vector product code of FIG. 23 is rewritten into a program having 1 to 8 unrolling stages as shown in FIG. In FIG. 24, the area indicated by reference sign d1 omits the code for the cases in which the number of unrolling stages is three to seven.
[0271]
Further, the computer 1 creates a main program as shown in FIG. 25A, a program as shown in FIG. 25B, and an actual measurement / estimation routine as shown in FIG. Note that the program codes shown in FIGS. 25A to 25C are simplified codes for the purpose of describing this application example, and are not the same as the codes that are actually generated. For example, each program may further include other processing (not shown).
[0272]
Thus, in method 2, the execution time of the code with up to eight unrolling stages shown in FIG. 24 is measured every time the corresponding part shown in FIG. 22 is executed, and the optimum number of stages is obtained.
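Measuring each unrolling depth from 1 to 8 and keeping the fastest can be sketched as follows. Python cannot express real loop unrolling parametrically, so the chunked inner loop below only mimics the structure of an unrolled loop with a remainder loop; the names `spmv_unrolled` and `measure_and_select` and the compressed-row layout are assumptions for illustration and do not reproduce FIG. 24.

```python
import time


def spmv_unrolled(aval, row_ptr, col_ind, x, depth):
    """Sparse matrix-vector product whose J loop advances `depth` entries per pass."""
    y = []
    for i in range(len(row_ptr) - 1):
        j, end = row_ptr[i], row_ptr[i + 1]
        s = 0.0
        while j + depth <= end:          # main part: `depth` products per pass
            for k in range(depth):
                s += aval[j + k] * x[col_ind[j + k]]
            j += depth
        while j < end:                   # remainder loop for the leftover entries
            s += aval[j] * x[col_ind[j]]
            j += 1
        y.append(s)
    return y


def measure_and_select(aval, row_ptr, col_ind, x, depths=range(1, 9)):
    """Time the kernel once per candidate depth and return the fastest depth."""
    timings = {}
    for depth in depths:
        start = time.perf_counter()
        spmv_unrolled(aval, row_ptr, col_ind, x, depth)
        timings[depth] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


aval = [4.0, 1.0, 3.0, 2.0, 5.0]
col_ind = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
best_depth, timings = measure_and_select(aval, row_ptr, col_ind, x)
```

A single timing of such a small matrix is dominated by noise, so the selected depth is not meaningful here; the sketch shows only the measure-then-select structure.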
[0273]
Next, an application example in which execution time automatic tuning of method 1 (startup execution method) is specified will be described.
[0274]
FIG. 26 shows an optimization program corresponding to FIG. In method 1, as will be described later, the execution time of the code of FIG. 24 is measured only once, before the CG-method subroutine is started, and code that thereafter refers to the resulting parameter value (the value of J_val) is automatically generated.
[0275]
Here, FIG. 26 and FIG. 22 differ only in the designation in the specifier (init or here) and are otherwise the same, so the description is omitted here.
[0276]
Then, by the unrolling designation for the program of FIG. 26, the tuning area becomes a subprogram as shown in FIG. 24 as in FIG.
[0277]
Then, the computer 1 creates a main program as shown in FIG. 27A, a program as shown in FIG. 27B, and an actual measurement / estimation routine as shown in FIG. Note that the program codes shown in FIGS. 27A to 27C are codes simplified for the purpose of describing the application example, and are not the same as the codes actually generated. For example, each program may further include other processing (not shown).
[0278]
Next, the result of tuning using method 1 and method 2 will be described.
[0279]
It is assumed that the number of iterations of the CG method in method 1 (FIG. 26) and method 2 (FIG. 22), that is, the number of iterations of the I loop in FIGS. 26 and 22 is 100. Note that the number of iterations is a problem-dependent amount that is determined according to the numerical characteristics of the sparse matrix to be solved in practice.
[0280]
The execution time other than the sparse matrix-vector product operation is 0.5 seconds per iteration. Further, the time required for each parameter check (determining the number of unrolling stages), that is, the time per sparse matrix-vector product operation is 1 second. More specifically, this corresponds to the execution time of call Sub_SMVCG (J_val) in which Sub_SMVCG shown in FIG. 24 is specified and executed with an argument J_val having a specific value.
[0281]
At this time, the execution time of method 1 (startup execution method) is estimated as follows. First, in the main routine, Sub_SMVCG is called eight times from Auto_CG (1 second × 8). Then, in the loop of Sub_CG, Sub_SMVCG is called instead of Auto_CG (1 second) and the other operations are executed (0.5 seconds). This loop is executed 100 times. The total is therefore 8 + 100 × (1 + 0.5) = 158 seconds.
[0282]
The execution time of method 2 (corresponding-part execution method) is estimated as follows. In the loop of Sub_CG, Sub_SMVCG is called eight times from Auto_CG (1 second × 8), the other operations are executed (0.5 seconds), and Sub_SMVCG(J) with J fixed is executed (1 second). This loop is executed 100 times. The total is therefore 100 × (8 + 0.5 + 1) = 950 seconds.
[0283]
Therefore, the estimated execution time is 158 seconds for method 1 and 950 seconds for method 2, so method 1 is 950/158 ≈ 6 times faster than method 2. As described above, in this example, method 1 is about 5 to 8 times faster than method 2.
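The two estimates can be reproduced with a small cost model. The function and argument names below are introduced only for this sketch; the numbers (100 iterations, 8 candidate unrolling depths, 1 second per sparse matrix-vector product, 0.5 seconds for the other operations) are those stated in the text.

```python
def method1_time(iterations, candidates, t_kernel, t_other):
    """Tuning runs once before the loop; each iteration runs the kernel once."""
    return candidates * t_kernel + iterations * (t_kernel + t_other)


def method2_time(iterations, candidates, t_kernel, t_other):
    """Tuning runs inside the loop, so every iteration pays for all candidates."""
    return iterations * (candidates * t_kernel + t_kernel + t_other)


t1 = method1_time(100, 8, 1.0, 0.5)   # 8 + 100 * 1.5 = 158.0
t2 = method2_time(100, 8, 1.0, 0.5)   # 100 * 9.5     = 950.0
print(t1, t2, round(t2 / t1, 2))      # prints: 158.0 950.0 6.01
```

In this model the tuning cost of method 1 is a fixed 8 seconds, whereas that of method 2 grows in proportion to the iteration count, which is why the gap widens for problems that need more iterations.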
[0284]
Here, in general, the number of iterations increases as the numerical characteristics of the problem become more severe or difficult. According to the above estimate, therefore, the more difficult the problem, the larger the difference in execution time between method 1 and method 2. Method 1 can thus be said to be very effective in terms of the actual execution time of execution-time parameter optimization, which inevitably includes the parameter tuning time. As in the above application example, the advantage of method 1 is great.
[0285]
As described above, separating the execution-time automatic tuning process so that it is performed once, before the execution of the subroutine including the corresponding area, makes it possible to solve the problems that have arisen in execution-time automatic tuning, namely (1) that redundant optimization processing is repeated, and (2) that the optimization processing takes a long time because of (1).
[0286]
A supplementary explanation will be given for the relationship between the second embodiment and the first embodiment. For example, FIG. 13 in the first embodiment and FIG. 20 in the second embodiment are not the same.
[0287]
First, the example and the processing result described in the first embodiment are limited to the processing when the installation time method and the pre-execution method are specified, and the processing when the execution time method is specified is not described. That is, what corresponds to (calling of Auto_xxx in Sub_xxx) shown in FIG. 21 of the second embodiment is not described as a specific example in the first embodiment. FIG. 13 in the first embodiment relates to processing when an installation time method and a pre-execution method are designated.
[0288]
Also in the first embodiment, the location at which the tuning area is designated is not restricted to the main routine or to a subroutine. That is, also in the first embodiment, the call may be made not from the main routine but from a subroutine serving as the main program.
[0289]
In addition, when execution-time optimization is designated in the first embodiment, this corresponds to method 2 of the second embodiment, and, for example, the same code as in FIG. 21 is generated. That is, the runtime method is generally specified because the parameter cannot be tuned except at the time of execution. It therefore follows that the same code needs to be generated.
[0290]
On the other hand, unlike the first embodiment, method 1 in the second embodiment forcibly moves the call of an automatic tuning routine such as Auto_xxx to the beginning of the main program, regardless of whether the tuning area is specified in the main routine or in a subroutine. More specifically, the call of the automatic tuning routine is moved to before the loop that calls the subprogram (I loop). That is, method 1 does not switch between calling the actual measurement/estimation routine directly from the main routine and calling it from a subroutine called from the main routine.
[0291]
As described above, since the execution time differs greatly between method 1 and method 2, the runtime method of method 1 shown in FIG. 20 can be used as the more preferable one depending on the nature of the problem.
[0292]
As described above, the present invention relates to, for example, optimization of parameters in a program stored in a computer, a program for causing a computer to execute, a recording medium, and a computer. In particular, the above-described second embodiment relates to a high-speed optimization method in runtime automatic tuning.
[0293]
Here, automatic tuning for improving software performance can be classified, according to the timing of optimization, into three types: at installation time, before execution, and at execution time. Of these three types, the process that most needs to be taken into consideration is automatic tuning at execution time. As a method for performing automatic tuning at execution time, a method is known in which, as in the above-described embodiment, parameters are tuned when the subprogram or the part of the program that is the target area of automatic tuning is executed.
[0294]
However, according to the above-described configuration, there is a problem that (1) the redundant optimization process is repeated, and (2) the optimization process takes a long time for the reason (1).
[0295]
Therefore, as described above, the problem is solved by allowing the instruction for execution-time optimization to be specified as one of two separate methods: a process performed only once when the subroutine including the corresponding part is started, that is, before it is called (method 1), and a process performed when the corresponding part is called (method 2). That is, for example, if the subprogram itself is called from within a loop with a large number of iterations, it becomes possible to switch between optimizing immediately before the call of the subprogram inside the loop and optimizing by a call placed outside the loop. The manner of using the functions of the invention and the outline of the processing mechanism in Embodiment 2 are the same as in Embodiment 1, as described above.
[0296]
The present invention is not limited to the above-described embodiments, and various modifications are possible within the scope shown in the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.
[0297]
The specific embodiments or examples described above are merely to clarify the technical contents of the present invention, and the present invention is not limited to such specific examples and should not be interpreted in a narrow sense. Various modifications can be made within the scope of the claims, and the modified embodiments are also included in the technical scope of the present invention.
[0298]
[Effects of the invention]
As described above, the computing device according to the present invention includes a program generation unit that, when a program containing a specifier that specifies the area in the program to be optimized and the parameter to be optimized is input, generates a program for executing optimization by actual measurement for the area and the parameter specified by the specifier.
[0299]
Therefore, if a program in which a predetermined designator is described is input to this computing device, there is an effect that a program for optimizing the parameter designated in the program can be obtained.
[0300]
As described above, in the calculation device according to the present invention, in the above configuration, the program generation unit includes specifier analysis means for extracting, from the input program, the region and the parameter specified by the specifier, and program creation means for generating a subprogram including the region extracted by the specifier analysis means and for generating a main program that calls the subprogram and performs optimization by actual measurement on the parameter.
[0301]
Therefore, this configuration has an effect that the above-described computing device according to the present invention can be realized.
[0302]
As described above, in the computer device according to the present invention, in the above configuration, the program creation means creates, so as to be called from the main program or included in the main program, an actual measurement routine that calls the subprogram for each parameter and measures the required time, and an estimation routine that estimates the optimum parameter using the required time measured by the actual measurement routine.
[0303]
Therefore, with this configuration, since the actual measurement routine and the estimation routine are called from the main program or included in the main program, there is an effect that the optimum parameter can be obtained simply by translating the main program into an executable format and executing it.
[0304]
As described above, in the computer device according to the present invention, in the above configuration, when the region is called within a loop of the main program in order to perform optimization at the time the main program is executed, the program creation means generates the main program so that it calls the actual measurement routine and the estimation routine outside the loop and before the loop.
[0305]
Therefore, since the actual measurement routine is not executed every time inside the loop, the time required for the optimization can be shortened accordingly.
[0306]
As described above, in the computer device according to the present invention, in the above configuration, when the region is called within a loop of the main program in order to perform optimization at the time the main program is executed, the program creation means selects and generates, according to the specifier, either the main program that calls the actual measurement routine and the estimation routine outside the loop and before the loop, or the main program that calls the actual measurement routine and the estimation routine within the loop.
[0307]
Therefore, there is an effect that, according to the specifier, it is possible to switch between an optimization in which the number of calls of the actual measurement routine and the estimation routine is reduced and the normal optimization.
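The selection between the two main programs can be pictured as a small code generator that emits the tuning call either before the loop or inside it. The placement keywords `before_loop` and `in_loop` and the function `generate_main` are hypothetical names used only in this sketch and are not the specifier syntax of the invention; the emitted statements reuse the routine names Auto_CG and Sub_SMVCG from the application example in a greatly simplified form.

```python
def generate_main(placement, iterations=100):
    """Return the lines of a simplified main program.

    placement: "before_loop" puts the tuning call once ahead of the loop,
               "in_loop" puts it inside the loop body (hypothetical keywords).
    """
    if placement not in ("before_loop", "in_loop"):
        raise ValueError("unknown placement: " + repr(placement))
    tune_call = "call Auto_CG"            # actual measurement / estimation routine
    kernel_call = "call Sub_SMVCG(J_val)" # tuned subprogram using the chosen value
    lines = []
    if placement == "before_loop":
        lines.append(tune_call)
    lines.append("do I = 1, {}".format(iterations))
    if placement == "in_loop":
        lines.append("  " + tune_call)
    lines.append("  " + kernel_call)
    lines.append("enddo")
    return lines


hoisted = generate_main("before_loop")
inline = generate_main("in_loop")
print("\n".join(hoisted))
```

With `before_loop` the tuning call appears once in the generated text regardless of the iteration count, whereas with `in_loop` it is executed on every pass through the loop.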
[0308]
As described above, the calculation apparatus according to the present invention is configured to include a cost definition function library including a cost definition function for approximating the required time measured for each parameter in the above configuration.
[0309]
Therefore, with this configuration, for example, a desired approximation can be performed using the cost definition function included in the cost definition function library.
[0310]
As described above, the calculation apparatus according to the present invention, in the above configuration, includes a cost definition function determination unit that approximates the measured required time using each of the cost definition functions included in the cost definition function library in turn and selects the cost definition function with the highest approximation accuracy.
[0311]
Therefore, with this configuration, there is an effect that an optimum approximation function can be obtained even when the designation of the approximation function used in the estimation routine is not included in the specifier, for example.
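Trying every cost definition function in turn and keeping the best fit can be sketched as follows. The sketch assumes a library consisting of polynomials of degree 1 to 3 fitted by least squares through the normal equations; the function names, the choice of degrees, and the sample timings (synthetic values, not measurements) are all assumptions made for illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0..degree] of sum(c[k] * x**k)."""
    m = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        tail = sum(ata[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aty[i] - tail) / ata[i][i]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def select_cost_function(xs, ys, degrees=(1, 2, 3)):
    """Fit every candidate and keep the one with the smallest squared error."""
    best = None
    for degree in degrees:
        coeffs = fit_polynomial(xs, ys, degree)
        err = sum((evaluate(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, degree, coeffs)
    return best[1], best[2]


def estimate_parameter(coeffs, candidates):
    """Parameter value that minimizes the fitted cost definition function."""
    return min(candidates, key=lambda p: evaluate(coeffs, p))


params = list(range(1, 9))                       # e.g. unrolling depths 1..8
times = [(p - 5) ** 2 + 2.0 for p in params]     # synthetic timings
degree, coeffs = select_cost_function(params, times)
best_param = estimate_parameter(coeffs, params)
```

Because a higher-degree polynomial never fits the sampled points worse than a lower-degree one, a selection based on fitting error alone tends to favor the richest function in the library; the sketch does not attempt to correct for this.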
[0312]
The calculation apparatus according to the present invention, in the above configuration, has a tuning information database that stores the region and the parameter extracted by the specifier analysis means, and the program creation means and the cost definition function determination unit acquire the region or the parameter by referring to the tuning information database.
[0313]
Therefore, when the program creation means and the cost definition function determination unit use the area or parameter, it is only necessary to refer to the tuning information database, and it is not necessary to extract the area or parameter each time.
[0314]
As described above, the calculation method according to the present invention includes a step of generating, when a program containing a specifier that specifies the area in the program to be optimized and the parameter to be optimized is input, a program for executing optimization by actual measurement for the region and the parameter specified by the specifier, and a step of performing the optimization by executing the program obtained in the generating step.
[0315]
Therefore, if this calculation method is executed by a calculation device such as a computer, the above-described calculation device can be realized.
[0316]
As described above, in the calculation method according to the present invention, in the above configuration, when the area is called within a loop of the program in order to perform optimization at the time the program is executed, the step of generating the program generates the program so that the actual measurement of the required time for each of the parameters for the region, and the estimation of the optimum parameter from the measured required time, are performed outside the loop and before the loop.
[0317]
Therefore, since the actual measurement / estimation is not executed every time inside the loop, the time required for the optimization can be shortened accordingly.
[0318]
As described above, in the calculation method according to the present invention, in the above configuration, when the area is called within a loop of the program in order to perform optimization at the time the program is executed, the step of generating the program selects and generates, according to the specifier, either the program that performs the actual measurement of the required time for each of the parameters for the region and the estimation of the optimum parameter from the measured required time outside the loop and before the loop, or the program that performs the actual measurement and the estimation within the loop.
[0319]
Therefore, there is an effect that, by setting the specifier, it is possible to switch easily between performing the actual measurement and estimation outside the loop and before the loop and performing them every time inside the loop.
[0320]
As described above, the program according to the present invention is configured to cause the computer to operate as each unit of the above-described computing device in the above configuration.
[0321]
Therefore, if this program is used, the above-described computing device can be realized.
[0322]
As described above, the recording medium according to the present invention has the above-described configuration in which the above-described program is recorded in a computer-readable manner.
[0323]
Therefore, if the program of this recording medium is read and executed by a computer, the above-described computing device can be realized.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a computing device according to the present invention.
FIG. 2 is a block diagram illustrating a configuration of a program generation unit of the calculation apparatus.
FIG. 3 is a diagram illustrating an example of a program input to the calculation apparatus.
FIG. 4A is a diagram showing a specific example of a tuning area of the program, and FIG. 4B is a diagram showing an example of a program obtained by processing the tuning area shown in FIG. 4A by the program generation unit.
FIG. 5A is a diagram showing an example of a main program generated by the program generation unit from the program shown in FIG. 3, FIG. 5B is a diagram showing an example of the program shown in FIG. 3 as rewritten by the program generation unit, FIG. 5C is a diagram showing an example of a subprogram generated by the program generation unit from the program shown in FIG. 3, and FIG. 5D is a diagram showing an example of a tuning program generated by the program generation unit from the program shown in FIG. 3.
FIG. 6 is a flowchart illustrating an example of a cost definition function determination process by the program generation unit.
FIG. 7 is a flowchart illustrating an example of sampling point determination processing by the program generation unit.
FIG. 8 is a flowchart showing an example of parameter estimation processing (a) by the program generation unit.
FIG. 9 is a flowchart showing an example of parameter estimation processing (b) by the program generation unit.
FIG. 10 is a flowchart illustrating an example of a measurement loop process performed by the program generation unit.
FIG. 11 is a flowchart showing an outline of processing by the calculation apparatus.
FIG. 12 is a diagram showing another example of a program input to the calculation apparatus.
FIG. 13A is a diagram showing an example of a main program generated by the program generation unit from the program shown in FIG. 12, and FIG. 13B is a diagram showing an example of a tuning program generated by the program generation unit from the program shown in FIG. 12.
FIG. 14A is a diagram showing a part of an example of a subprogram generated by the program generation unit rewriting the program shown in FIG. 12, and FIG. 14B is a diagram showing a part of the subprogram different from that of FIG. 14A.
FIG. 15 is a block diagram showing a part of the computing device.
FIG. 16 is a block diagram showing another part of the calculation apparatus.
FIG. 17A is a flowchart showing a procedure for optimization at the time of installation, FIG. 17B is a flowchart showing a procedure for optimization before library execution, and FIG. 17C is a flowchart showing a procedure for optimization at library execution time.
FIG. 18A is a block diagram showing a part of an example of a conventional computer, and FIG. 18B is a block diagram showing a part of another example of a conventional computer.
FIG. 19A is a diagram showing a part of still another example of a program input to the calculation apparatus, and FIG. 19B is a diagram showing a part of the program different from that of FIG. 19A.
FIG. 20A is a diagram showing an example of a main program generated by the program generation unit from the program shown in FIGS. 19A and 19B, FIG. 20B is a diagram showing an example of a program rewritten by the program generation unit from the program shown in FIGS. 19A and 19B, and FIG. 20C is a diagram showing an example of a tuning program generated by the program generation unit from the program shown in FIGS. 19A and 19B.
FIG. 21A is a diagram showing another example of the main program generated by the program generation unit from the program shown in FIGS. 19A and 19B, FIG. 21B is a diagram showing another example of the program rewritten by the program generation unit from the program shown in FIGS. 19A and 19B, and FIG. 21C is a diagram showing another example of the tuning program generated by the program generation unit from the program shown in FIGS. 19A and 19B.
FIG. 22 is a diagram showing still another example of a program input to the calculation apparatus.
FIG. 23 is a diagram showing an example in which a part of the program shown in FIG. 22 is described more specifically.
FIG. 24 is a diagram showing an example of a subprogram generated from the programs shown in FIGS. 22 and 23.
FIG. 25A is a diagram showing another example of the main program generated by the program generation unit, FIG. 25B is a diagram showing another example of the program rewritten by the program generation unit, and FIG. 25C is a diagram showing another example of the tuning program generated by the program generation unit from the program.
FIG. 26 is a diagram showing an example of a program input to the computing device, which is different from FIG.
FIG. 27A is a diagram showing still another example of the main program generated by the program generation unit, FIG. 27B is a diagram showing still another example of the program rewritten by the program generation unit, and FIG. 27C is a diagram showing still another example of the tuning program generated by the program generation unit from the program.
[Explanation of symbols]
1 Computing device
2 Processor
3 User library (library)
4 Parameter adjustment layer
5 Parameter information file
6 Program generation unit
8 Specifier analysis means
9 Program creation means
10 Cost definition function determination means
10a Tuning information database
10b Cost definition function library
10c Cost definition function determination unit

Claims

A calculation apparatus, equipped with a cost definition function library, for optimizing parameters included in an input program, comprising:
specifier analysis means for extracting from the program, when the program containing a specifier that specifies the area to be optimized, the parameter to be optimized, and the cost definition function is input, the code in the area specified by the specifier and the parameter specified by the specifier;
cost definition function selection means for selecting the cost definition function specified by the specifier from among the cost definition functions included in the cost definition function library, which are polynomials having the parameter as a variable; and
program generation means for generating a subprogram that includes the code in the area specified by the specifier, or code obtained by applying the unrolling process specified by the specifier to the code in the area specified by the specifier, and that does not include the code outside the area specified by the specifier, and a main program that calls the subprogram and performs optimization by actual measurement on the parameter specified by the specifier,
wherein the program generation means creates, so as to be called from the main program or included in the main program, an actual measurement routine that calls the subprogram for each parameter and measures the required time, and an estimation routine that determines, by the least squares method, the coefficients of the cost definition function selected by the cost definition function selection means so as to best approximate the required time measured by the actual measurement routine, and estimates the value of the parameter that minimizes the value of the cost definition function with the determined coefficients.

The calculation apparatus according to claim 1, wherein the program generation means generates a new program by replacing the code in the area specified by the specifier in the input program with code for calling the subprogram, and generates a main program that includes, after the code for calling the actual measurement routine and the estimation routine, code for calling the new program.

The calculation apparatus according to claim 1 or 2, wherein, when the cost definition function specified by the specifier is not included in the cost definition function library, the cost definition function selection means determines, by the least squares method, the coefficients of each cost definition function included in the cost definition function library so as to best approximate the required time measured by the actual measurement routine, and selects, from among the cost definition functions with the determined coefficients, the cost definition function with the highest approximation accuracy.

The calculation apparatus according to claim 3, further comprising a tuning information database that stores the area and the parameter,
wherein the program generation unit and the cost definition function determination unit acquire the area or the parameter by referring to the tuning information database.

A calculation method for optimizing parameters included in a program input to a calculation device that includes specifier analysis means, program generation means, cost definition function selection means, and a cost definition function library, the method comprising:
a specifier analysis step in which, when the program containing a specifier that specifies the area to be optimized, the parameter to be optimized, and the cost definition function is input, the specifier analysis means extracts from the program the code in the area specified by the specifier and the parameter specified by the specifier;
a cost definition function selection step in which the cost definition function selection means selects the cost definition function specified by the specifier from among the cost definition functions included in the cost definition function library, which are polynomials having the parameter as a variable; and
a program generation step in which the program generation means generates a subprogram that includes the code in the area specified by the specifier, or code obtained by applying the unrolling process specified by the specifier to the code in the area specified by the specifier, and that does not include the code outside the area specified by the specifier, and a main program that calls the subprogram and performs optimization by actual measurement on the parameter specified by the specifier,
wherein, in the program generation step, the program generation means creates, so as to be called from the main program or included in the main program, an actual measurement routine that calls the subprogram for each parameter and measures the required time, and an estimation routine that determines, by the least squares method, the coefficients of the cost definition function selected in the cost definition function selection step so as to best approximate the required time measured by the actual measurement routine, and estimates the value of the parameter that minimizes the value of the cost definition function with the determined coefficients.

The calculation method according to claim 5, wherein, in the program generation step, the program generation means generates a new program by replacing the code in the area specified by the specifier in the input program with code for calling the subprogram, and generates a main program that includes, after the code for calling the actual measurement routine and the estimation routine, code for calling the new program.

The calculation method according to claim 5 or 6, wherein, when the cost definition function specified by the specifier is not included in the cost definition function library, in the cost definition function selection step, the cost definition function selection means determines, by the least squares method, the coefficients of each cost definition function included in the cost definition function library so as to best approximate the required time measured by the actual measurement routine, and selects, from among the cost definition functions with the determined coefficients, the cost definition function with the highest approximation accuracy.

A program for causing a computer to operate as each means of the calculation apparatus according to any one of claims 1 to 4.

A computer-readable recording medium on which the program according to claim 8 is recorded.