JP2004318538A

JP2004318538A - Method for monitoring performance

Info

Publication number: JP2004318538A
Application number: JP2003112279A
Authority: JP
Inventors: Satoshi Hirai; 聡平井; Koichi Kumon; 耕一久門
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-04-17
Filing date: 2003-04-17
Publication date: 2004-11-11

Abstract

PROBLEM TO BE SOLVED: To provide a performance monitoring method for a processor having a plurality of privilege modes whereby, when a counter that counts event occurrences overflows, the interrupt handler matching the privilege level in operation at that time can be invoked directly.

SOLUTION: A processor CPU having a plurality of privilege modes is provided with a counter means 1-0 for counting events inside the processor and events of exchange with the outside, and an indicating means 1-2 for indicating an interrupt destination for each privilege level of the processor CPU, so that the interrupt destination taken when the counter means 1-0 overflows can be controlled for each privilege level.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

[0001]
1. Field of the Invention The present invention relates to a performance monitoring method for accurately and efficiently analyzing the operating state of software that runs using the plurality of privilege levels of a processor. In particular, it relates to a method that, in a virtual machine environment implemented in software, supports behavior analysis of the programs in each layer and tuning that identifies and corrects program problems.
[0002]
2. Description of the Related Art Many current processors have performance monitoring counters that count events inside the processor and events of exchange with the outside.
[0003]
For example, the Intel Pentium processor has several counters and is configured so that the events to be counted can be selected from a large set, such as the number of clocks, the number of executed instructions, or the number of cache misses (see, for example, Non-Patent Document 1).
[0004]
This makes it possible to analyze program behavior, such as which parts of a program were executed most often.
[0005]
The PowerPC processor of International Business Machines Corporation has the same configuration, and can select and count from a large number of events.
[0006]
Some of these processors can generate an interrupt to a fixed or specified vector when a counter overflows.
[0007]
By using this function to sample the addresses at which a particular event occurs, it is possible to analyze where in a program that event occurs most often.
[0008]
Using this technique, tuning such as behavior analysis and bottleneck analysis can be performed on operating systems and application programs. Commercially available software that works this way includes Intel's VTune.
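The overflow-based sampling described above can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions: the function name `sample_profile`, the trace format, and the threshold value are hypothetical and do not come from the publication.

```python
from collections import Counter

def sample_profile(trace, threshold):
    """Record one address sample each time the modeled event counter overflows.

    trace: iterable of (address, event_occurred) pairs, one per executed instruction.
    threshold: number of events after which the counter overflows (assumed value).
    """
    samples = Counter()
    count = 0
    for address, event_occurred in trace:
        if not event_occurred:
            continue
        count += 1
        if count >= threshold:       # counter overflow raises the interrupt
            samples[address] += 1    # the handler records the interrupted address
            count = 0                # re-arm the counter
    return samples

# Hypothetical trace: the instruction at address 120 causes the event most often.
trace = [(100, False), (110, True), (120, True), (120, True)] * 50
profile = sample_profile(trace, threshold=4)
```

Addresses with many samples in `profile` are the places where the monitored event is concentrated.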
[0009]
[Non-patent document 1]
"IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide," published by Intel Corporation (Chapter 15, "Debugging and Performance Monitoring"; see in particular Section 15.8, "Performance Monitoring Overview," and the sections that follow). Intel's Web site (http://www.intel.com)
[0010]
In the conventional performance monitoring method, as shown in FIG. 6, in a virtual machine environment made up of hardware 1, a virtual machine monitor 2, operating systems (hereinafter, OS) OS1, OS2, ..., application programs (hereinafter, applications) a1, a2, a3, a4, ..., and so on, an overflow collecting means 10 is provided in the virtual machine monitor 2 to detect overflow of the counter that counts occurrences of the events.
[0011]
The privilege level indicates how far the functions of the processor constituting the hardware are restricted; the lower the number, the higher the privilege. The virtual machine monitor 2, at privilege level 0, can use all hardware resources. At a low privilege level such as level 2, the resources that can be used directly are considerably restricted compared with higher privilege levels.
[0012]
Thus, in the conventional technique, the interrupt destination that could be raised when the counter counting the events overflowed was fixed to a single destination. There was also no function for changing the interrupt destination according to the operating state of the processor, that is, according to the privilege level.
[0013]
For this reason, when multiple pieces of software run in different privilege modes of the processor, an interrupt cannot be delivered directly to the software running in each layer. Instead, the interrupt is first received by the root part, namely the virtual machine monitor 2 that controls the whole system, which then notifies each layer so that the layer can collect the information about its operating state that profiling (the collection of data characterizing program behavior) requires. Because the program continues to run between the interrupt and the information collection, accuracy is lost, details drift so that the analysis becomes inadequate, and overhead and measurement distortion arise.
[0014]
In particular, in a virtual machine environment such as that shown in FIG. 6, interrupt processing is generally performed by the virtual machine monitor 2, so an OS running on it cannot receive interrupts directly. This made it difficult to tune the OSs and applications running on the virtual machine monitor, that is, to identify program problems accurately and correct them.
[0015]
Accordingly, it is an object of the present invention to provide a performance monitoring method capable of profiling with low overhead when the counter overflows.
[0016]
SUMMARY OF THE INVENTION The present invention provides a function for dynamically changing the overflow interrupt destination of an event counter according to the operating state of the processor, and a performance monitoring method that uses this function.
[0017]
A separate interrupt destination register is provided for each of the privilege levels at which the processor operates, and the interrupt destination is changed according to the privilege level at which the processor is currently running. Using this function, each piece of software running at each privilege level is given an event counter overflow interrupt processing means, and the necessary information is collected in each layer.
[0018]
As a result, when multiple pieces of software run in different privilege modes of the processor, and in particular in a virtual machine environment implemented in software, the virtual machine monitor and each operating system running on it can perform profiling concurrently with low overhead.
[0019]
FIG. 1 shows the principle of the present invention. In FIG. 1, 1 is hardware, 2 is a virtual machine monitor, 20 is an event counter overflow interrupt processing means provided in the virtual machine monitor, 21 is an event counter overflow interrupt processing means provided in OS1, 22 is one provided in OS2, 23 is one provided in application a1, 24 is one provided in application a2, and 25 and 26 are those provided in applications a3 and a4, respectively. One of the interrupt processing means 23 and 24, and one of the interrupt processing means 25 and 26, may be omitted. The privilege level of the virtual machine monitor is “0”, that of OS1 and OS2 is “1”, and that of applications a1 to a4 is “2”.
[0020]
The OS1 and OS2 are selectively controlled by the virtual machine monitor 2, the applications a1 and a2 are selectively controlled by the OS1, and the applications a3 and a4 are selectively controlled by the OS2.
[0021]
The object in the present invention can be achieved by the following (1) to (4).
[0022]
(1) A performance monitoring method in which a processor having a plurality of privilege modes is provided with counter means for counting events inside the processor and events of exchange with the outside, and indicating means that specifies an interrupt destination for each privilege level, so that the interrupt destination presented to the processor can be controlled according to the processor's privilege level at the time the counter means overflows.
[0023]
(2) The performance monitoring method according to (1), wherein an interrupt destination position, an interrupt level, and an interrupt mask can be designated for each processor privilege level when the counter means overflows.
[0024]
(3) The performance monitoring method according to (1) or (2), in which multiple pieces of software run in different privilege modes of the processor, and the software at each privilege level can directly receive the overflow interrupt of the counter means that counts the events.
[0025]
(4) The performance monitoring method according to (3), in which multiple pieces of software run at each privilege level of the processor, and the overflow interrupt handlers of the event counter means within the same privilege level are placed at the same logical address, so that profiling of the multiple pieces of software can be performed in parallel.
[0026]
Thereby, the following effects can be obtained.
[0027]
(1) Because interrupt control is performed for each privilege level of the processor when the event measurement counter overflows, the interrupt can be delivered to a separate destination according to the privilege level and the necessary data can be collected immediately. Accurate behavior analysis and bottleneck analysis can therefore be performed without time lag.
[0028]
(2) Since an interrupt destination position, an interrupt level, and an interrupt mask can be specified for each privilege level, more detailed control such as masking a portion deemed unnecessary for performance analysis can be performed.
[0029]
(3) Because multiple pieces of software run in different privilege modes of the processor and the appropriate interrupt handler for the counter overflow interrupt can be called directly, tuning such as accurate behavior analysis and bottleneck analysis can be performed without overhead.
[0030]
(4) Because multiple pieces of software run at each privilege level of the processor and the overflow interrupt handlers of the event counter means within the same privilege level are placed at the same logical address, the vector table does not need to be rewritten even when, for example, the OS is switched. Profiling of OSs and applications can therefore be performed in parallel with little overhead.
[0031]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS One embodiment of the present invention will be described with reference to FIG. 2. In FIG. 2, 1-0 is an event counter, 1-1 is a privilege level checking means, and 1-2 is a vector table register, all of which are provided in the processor CPU in the hardware 1. The processor CPU executes a program composed of instruction 1, instruction 2, instruction 3, ... held in the memory M.
[0032]
The event counter 1-0 is incremented when a predetermined monitoring event, for example a cache miss, occurs during execution of the program. When the count value of the event counter 1-0 exceeds a predetermined value, that is, when a counter overflow occurs, this is reported to the privilege level checking means 1-1 of the processor, which checks the privilege level of the instruction that caused the monitoring event. The interrupt processing set in the matching privilege level entry of the vector table register 1-2 is then carried out by notifying the interrupt processing unit that an interrupt has occurred.
[0033]
The vector table register 1-2 holds, for each of privilege levels 0, 1, and 2, the address of the interrupt handler to be executed for this interrupt processing. Executing the handler basically collects information such as which event (for example, cache misses, number of executed instructions, or number of clocks) caused the counter overflow, at which instruction address, and when.
[0034]
An event counter 1-0 is provided for each event, such as cache misses, number of clocks, or number of executed instructions. Alternatively, a common event counter may be shared among several events and incremented for all of them together.
[0035]
When the event counter 1-0 overflows while instructions 1, 2, 3, ... are being executed in sequence, the privilege level of the instruction that caused the event is checked, and the interrupt processing set in the vector table register 1-2 for the matching privilege level is performed. In FIG. 2, the numbers 100, 110, 120, ... to the left of instructions 1, 2, 3, ... indicate the memory addresses at which those instructions are stored. In this way, the appropriate interrupt handler can be called directly for an overflow interrupt of the event counter, which is a performance monitoring counter.
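The dispatch path of this first embodiment can be sketched in Python as a small software model. This is an illustrative sketch only: the class name `Cpu`, the threshold, and the handler labels are assumptions made for the example, not part of the disclosed hardware.

```python
class Cpu:
    """Toy model of event counter 1-0, privilege check 1-1 and vector table register 1-2."""

    def __init__(self, threshold, handlers):
        self.threshold = threshold                    # overflow threshold (assumed value)
        self.count = 0                                # event counter 1-0
        self.vector_table_register = dict(handlers)   # privilege level -> handler

    def execute(self, address, privilege_level, event_occurred):
        if not event_occurred:
            return
        self.count += 1
        if self.count > self.threshold:               # counter overflow
            self.count = 0
            # Privilege level check 1-1: select the entry that matches the
            # privilege level of the instruction that caused the event.
            handler = self.vector_table_register[privilege_level]
            handler(address)

log = []
cpu = Cpu(threshold=1, handlers={
    0: lambda addr: log.append(("vmm", addr)),   # hypothetical level-0 handler
    1: lambda addr: log.append(("os", addr)),    # hypothetical level-1 handler
    2: lambda addr: log.append(("app", addr)),   # hypothetical level-2 handler
})
cpu.execute(100, 2, True)    # first event, no overflow yet
cpu.execute(110, 1, True)    # second event overflows while at privilege level 1
cpu.execute(120, 0, False)   # no event, nothing happens
```

In this model the overflow at address 110 goes straight to the level-1 handler, without passing through the level-0 handler first.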
[0036]
A second embodiment of the present invention will be described with reference to FIG.
[0037]
The second embodiment controls the interrupt function of the event counter that measures events. For this purpose, as shown in FIG. 3, vector table registers 1-2-0 are provided, one for each privilege level (three levels in this example), and a mask bit, an interrupt level, and an interrupt number can be set in each vector table register 1-2-0.
[0038]
The mask bit indicates whether the interrupt is enabled. For example, “0” means enabled and “1” means disabled.
[0039]
The interrupt level indicates the priority when interrupts occur simultaneously. For example, “0” indicates the highest priority and “1” the next priority. In the example of FIG. 3, when the event counters for privilege levels 0 and 1 overflow at the same time, privilege level 0 is handled first.
[0040]
The interrupt number indicates the entry in the interrupt vector table 1-3 that is referenced at interrupt time. This interrupt number can be assigned freely according to the kind of information to be collected when the event counter overflows. In the example of FIG. 3, when the event counter for privilege level 0 overflows, interrupt processing is performed by the interrupt handler at the interrupt 3 handler address, indicated by interrupt number 3 in the interrupt vector table 1-3. When the event counter for privilege level 1 overflows, interrupt processing is performed by the interrupt handler at the interrupt 2 handler address, indicated by interrupt number 2. When the event counter for privilege level 2 overflows, interrupt processing is performed by the interrupt handler at the interrupt 1 handler address, indicated by interrupt number 1.
[0041]
Thus, when the event counter overflows and the mask bit of the vector table register 1-2-0 corresponding to the privilege level of the instruction that caused the event is set to enabled, an interrupt is generated by referring to the interrupt vector table entry corresponding to the interrupt number.
[0042]
In this way, fine-grained control becomes possible; for example, information collection can be concentrated on a particular privilege level.
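A minimal Python sketch of the register layout in this second embodiment follows. The interrupt numbers follow the FIG. 3 example in the text (privilege levels 0, 1, 2 map to interrupt numbers 3, 2, 1); the mask values, interrupt-level values, handler addresses, and all identifiers are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class VectorTableRegister:
    mask: int              # 0 = interrupt enabled, 1 = interrupt disabled
    interrupt_level: int   # 0 = highest priority
    interrupt_number: int  # entry in the interrupt vector table 1-3

# One register 1-2-0 per privilege level (mask and level values are assumed).
REGISTERS = {
    0: VectorTableRegister(mask=0, interrupt_level=0, interrupt_number=3),
    1: VectorTableRegister(mask=0, interrupt_level=1, interrupt_number=2),
    2: VectorTableRegister(mask=1, interrupt_level=2, interrupt_number=1),
}

# Interrupt vector table 1-3: interrupt number -> handler address (hypothetical).
INTERRUPT_VECTOR_TABLE = {1: 0x1000, 2: 0x2000, 3: 0x3000}

def dispatch(overflowed_levels):
    """Return the handler addresses to run, in priority order, for simultaneous overflows."""
    enabled = [REGISTERS[p] for p in overflowed_levels if REGISTERS[p].mask == 0]
    enabled.sort(key=lambda reg: reg.interrupt_level)
    return [INTERRUPT_VECTOR_TABLE[reg.interrupt_number] for reg in enabled]
```

With these assumed settings, `dispatch([1, 0])` serves privilege level 0 before level 1, and `dispatch([2])` returns an empty list because level 2 is masked, which corresponds to excluding a layer from data collection.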
[0043]
A third embodiment of the present invention will be described with reference to FIGS. 4 and 7.
[0044]
FIG. 7 shows a conventional interrupt processing operation diagram in a virtual machine environment realized by software, and FIG. 4 shows a third embodiment of the present invention.
[0045]
Conventionally, every interrupt handler address in the interrupt vector table IT in the memory M of FIG. 7 holds the interrupt handler located in the virtual machine monitor, which is the normal interrupt processing routine.
[0046]
Therefore, when an overflow interrupt of the event counter CNT occurs, the interrupt handler VIH in the virtual machine monitor is called first, regardless of whether the privilege level of the event is 0 or 1. After that, a predetermined interrupt handler, for example “interrupt 1”, registered in the interrupt vector table T_1 of the OS running on the virtual machine monitor at that time (for example, OS1) is called, and the interrupt processing is performed.
[0047]
In this case, as described above, since the interrupt handler corresponding to the privilege level is called via the interrupt handler in the virtual machine monitor, overhead and distortion of measurement have occurred.
[0048]
To improve on this, in the embodiment shown in FIG. 4, when an overflow interrupt occurs in the event counter IC, the privilege level at that time is checked and the vector table register 1-0 is referenced according to the interrupt level. For the vector table register 1-0, the addresses of the interrupt handlers of the software running at each privilege level are registered in advance in the interrupt vector table in the memory M, which is referenced when an overflow interrupt occurs.
[0049]
As a result, when the event counter overflows while, for example, OS1 at privilege level 1 is running, the privilege 1 entry of the vector table register 1-0 is referenced and the interrupt handler for privilege 1, “interrupt 1”, can be called immediately. The interrupt handler for each privilege level can thus be called directly, without passing through the interrupt handler of the virtual machine monitor.
[0050]
Accordingly, as shown in FIG. 4, the appropriate interrupt handler can be called directly even when multiple pieces of software, for example the virtual machine monitor, OS1, OS2, and applications a1, a2, a3, and a4, are running at the same time in different privilege modes of the processor.
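The difference between the conventional path of FIG. 7 and the direct path of FIG. 4 can be sketched in Python as follows. The function names and the trace lists are hypothetical; only the handler labels "VIH" and "interrupt 1" are taken from the description.

```python
def conventional_dispatch(privilege_level, guest_handlers, trace):
    """FIG. 7 style: every overflow enters the virtual machine monitor handler first."""
    trace.append("VIH")                                  # handler in the virtual machine monitor
    if privilege_level != 0:
        trace.append(guest_handlers[privilege_level])    # then forwarded to the guest handler

def direct_dispatch(privilege_level, vector_table_register, trace):
    """FIG. 4 style: the entry for the current privilege level is called directly."""
    trace.append(vector_table_register[privilege_level])

guest_handlers = {1: "interrupt 1"}
vector_table_register = {0: "VIH", 1: "interrupt 1"}

conventional, direct = [], []
conventional_dispatch(1, guest_handlers, conventional)   # overflow while OS1 runs at level 1
direct_dispatch(1, vector_table_register, direct)
```

In this model the conventional path records two handler invocations for a level-1 overflow, while the direct path records one; the extra hop is the source of the overhead and measurement distortion the description refers to.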
[0051]
A fourth embodiment of the present invention will be described with reference to FIG.
[0052]
Even when the interrupt handlers of the software running at each privilege level are registered in the interrupt vector table, if multiple pieces of software run within the same privilege level, for example OS1 and OS2, the handler address in the interrupt vector table must be changed every time the running software is switched.
[0053]
To improve on this, in the fourth embodiment the event counter overflow interrupt handlers of the software within the same privilege level, for example OS1 and OS2, are placed at the same logical address, as shown in FIG. 5.
[0054]
As the physical memory and logical memory in FIG. 5 show, the areas of OS1 and OS2 are entirely separate even though their logical addresses are the same, and their contents also differ. Different physical addresses are simply mapped to the same logical address. The contents of the event counter interrupt handlers of OS1 and OS2 are in fact different.
[0055]
By placing the event counter overflow interrupt handlers at the same logical address in this way, the interrupt handler that performs the relevant processing can be called directly not only across privilege levels but also when the running software is switched within the same privilege level.
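The shared logical address of this fourth embodiment can be sketched in Python with a toy address translation. All addresses, table names, and handler strings are hypothetical values chosen for illustration; the sketch only models the idea that the register entry stays fixed while the mapping changes with the running OS.

```python
HANDLER_LOGICAL_ADDRESS = 0x4000   # assumed logical address shared by OS1 and OS2

# Per-OS address translation: logical address -> physical address (hypothetical).
PAGE_TABLES = {
    "OS1": {HANDLER_LOGICAL_ADDRESS: 0x10000},
    "OS2": {HANDLER_LOGICAL_ADDRESS: 0x20000},
}

# Physical memory holds a different handler for each OS.
PHYSICAL_MEMORY = {
    0x10000: lambda: "OS1 overflow handler",
    0x20000: lambda: "OS2 overflow handler",
}

# Entry for privilege level 1: written once and never changed on an OS switch.
VECTOR_TABLE_REGISTER = {1: HANDLER_LOGICAL_ADDRESS}

def overflow_at_level_1(running_os):
    """Resolve the fixed logical address through the running OS's mapping and run the handler."""
    logical = VECTOR_TABLE_REGISTER[1]
    physical = PAGE_TABLES[running_os][logical]   # the mapping switches with the OS
    return PHYSICAL_MEMORY[physical]()
```

Calling `overflow_at_level_1("OS1")` and then `overflow_at_level_1("OS2")` reaches two different handlers even though `VECTOR_TABLE_REGISTER` is never modified.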
[0056]
As described above, according to the present invention, when multiple pieces of software run in different privilege modes of the processor, or when multiple pieces of software run at the same time in the same privilege mode, the appropriate interrupt handler can be called directly for the overflow interrupt of the performance monitoring counter.
[0057]
As a result, in a virtual machine environment implemented in software, profiling of the virtual machine monitor and of each operating system and application running on it can be performed concurrently with low overhead, and tuning such as behavior analysis and bottleneck analysis can be performed efficiently and accurately.
[0058]
According to the present invention, the following effects can be obtained.
[0059]
(1) Because interrupt control is performed for each privilege level of the processor when the event measurement counter overflows, the interrupt can be delivered to a separate destination according to the privilege level and the necessary data can be collected immediately. Accurate behavior analysis and bottleneck analysis can therefore be performed without time lag.
[0060]
(2) Since an interrupt destination position, an interrupt level, and an interrupt mask can be specified for each privilege level, more detailed control such as masking a portion deemed unnecessary for performance analysis can be performed.
[0061]
(3) Because multiple pieces of software run in different privilege modes of the processor and the appropriate interrupt handler for the counter overflow interrupt can be called directly, tuning such as accurate behavior analysis and bottleneck analysis can be performed without overhead.
[0062]
(4) Because multiple pieces of software run at each privilege level of the processor and the overflow interrupt handlers of the event counter means within the same privilege level are placed at the same logical address, the vector table does not need to be rewritten even when, for example, the OS is switched. Profiling of OSs and applications can therefore be performed in parallel with little overhead.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the principle of the present invention.
FIG. 2 shows an embodiment of the present invention.
FIG. 3 shows a second embodiment of the present invention.
FIG. 4 shows a third embodiment of the present invention.
FIG. 5 shows a fourth embodiment of the present invention.
FIG. 6 shows a conventional example.
FIG. 7 shows an interrupt processing operation diagram in a virtual machine environment.
[Explanation of symbols]
1 Hardware 2 Virtual machine monitor

Claims

1. A performance monitoring method in which, in a processor having a plurality of privilege modes, there are provided counter means for counting events inside the processor and events of exchange with the outside, and indicating means for indicating an interrupt destination for each privilege level, so that the interrupt destination presented to the processor can be controlled for each privilege level of the processor when the counter means overflows.

2. The performance monitoring method according to claim 1, wherein an interrupt destination position, an interrupt level, and an interrupt mask can be designated for each privilege level of the processor when the counter means overflows.

3. The performance monitoring method according to claim 1, wherein a plurality of pieces of software are run in different privilege modes of the processor, and the software at each privilege level can directly receive the overflow interrupt of the counter means that counts the events.

4. The performance monitoring method according to claim 3, wherein a plurality of pieces of software are run at each privilege level of the processor, and the overflow interrupt handlers of the event counter means within the same privilege level are placed at the same logical address, so that profiling of the plurality of pieces of software can be performed in parallel.