JP2007207117A - Performance monitoring device, performance monitoring method and program

Info

Publication number: JP2007207117A
Application number: JP2006027622A
Authority: JP
Inventors: Yoshifumi Sakai; 良文坂井; Yoshitaka Ikeda; 佳隆池田; Tomokazu Shindo; 朋和進藤; Yuichi Yokoyama; 雄一横山
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 2006-02-03
Filing date: 2006-02-03
Publication date: 2007-08-16
Anticipated expiration: 2026-02-03
Also published as: JP4705484B2

Abstract

[Problem] To make it possible to select and formulate the most appropriate countermeasure for events that occur in various forms.
[Solution] A monitor unit 101 acquires state information on the states of an AC environment and a non-AC environment, and an analysis unit 103 or a model diagnosis unit 106 determines the state of a device in the AC environment based on the acquired state information. A simulation unit 108 refers to a countermeasure list corresponding to the determination result, executes a simulation process for each of at least one countermeasure included in the countermeasure list, and evaluates the effect of each countermeasure.
[Selected drawing] Figure 1

Description

The present invention relates to a performance monitoring device, a performance monitoring method, and a program that are applicable, for example, to so-called autonomous computing, in which a computer manages the state of a target external device.

To reduce the burden of computer management on humans, mechanisms by which computers manage themselves, so-called autonomous computing, are becoming a reality. In autonomous computing, a computer autonomously repairs its own failures based on predetermined operation guidelines (see, for example, Non-Patent Document 1). This self-management is realized by repeating the following procedure.
(1) Monitor the computer system and aggregate the behavior of hardware and software as log data.
(2) Analyze the aggregated data to grasp the situation.
(3) Formulate countermeasures to achieve the objective.
(4) Execute and control the plan.

For example, the computer autonomously monitors the CPU utilization (1), detects that the utilization has risen sharply (2), plans the countermeasure of distributing the load to other resources (3), and actually assigns part of the processing to another machine (4).

The autonomous computing technologies proposed today operate in the cycle of (1) to (4) above, but the planning process in (3) is designed to follow operation guidelines originally set by humans. Therefore, when implementing autonomous computing, the designer prepares various operation guidelines for foreseeable events. The computer determines whether it can continue to operate in compliance with the initially set operation guidelines and takes the necessary action. Hereinafter, autonomous computing in this text is treated as synonymous with autonomic computing (AC) in Non-Patent Document 1.

"Blueprint on Autonomic Computing Architecture" (「オートノミック・コンピューティングアーキテクチャに関するブループリント」), Internet <URL: http://www-6.ibm.com/jp/autonomic/pdf/acbp2_2005-06_v7.pdf>

However, while events that occur routinely in a computer system, such as an increase in load, are easy to anticipate, events that are hard to foresee can also occur, such as a change to the system configuration partway through operation or a failure caused by human error. Moreover, even when no problem is occurring at present, signs of a problem that may occur in the future can be latent. It is also possible that the initially set policy itself is wrong.

Thus, an event may occur to which the operation guidelines are hard to apply; operation may proceed under the normal guidelines because no abnormality has yet appeared; or operation may continue with a latent guideline that really ought to be changed, so that the computer performs autonomous control based on mistaken guidelines. In any of these cases, the original purpose of autonomic computing, "responding to change without human intervention by autonomously performing optimal processing", can no longer be achieved.

Accordingly, an object of the present invention is to make it possible to select and formulate the most appropriate countermeasure for events that occur in various forms or that may occur in the future.

The performance monitoring device of the present invention is a performance monitoring device connected to at least one external device via a communication line, comprising: acquisition means for acquiring state information on the state of the external device; determination means for determining the state of the external device based on the state information acquired by the acquisition means; and simulation means for referring to a countermeasure list corresponding to the determination result of the determination means, executing a simulation process on the state of the external device for each of at least one piece of countermeasure information included in the countermeasure list, and evaluating the effect of the countermeasure indicated by each piece of countermeasure information.
The performance monitoring method of the present invention is a performance monitoring method performed by a performance monitoring device connected to at least one external device via a communication line, comprising: an acquisition step of acquiring state information on the state of the external device; a determination step of determining the state of the external device based on the state information acquired in the acquisition step; and a simulation step of referring to a countermeasure list corresponding to the determination result of the determination step, executing a simulation process on the state of the external device for each of at least one piece of countermeasure information included in the countermeasure list, and evaluating the effect of the countermeasure indicated by each piece of countermeasure information.
The program of the present invention causes a computer to execute the performance monitoring method.

In the present invention, the current and future states of the external device are analyzed and diagnosed (determined) based on the state information of the external device, or on a model created from that state information as described later. A simulation process is then performed for each countermeasure included in the countermeasure list corresponding to the determination result, and the effect of the countermeasure indicated by each piece of countermeasure information in the list is evaluated. That is, whatever kind of event state the external device falls into, the present invention can evaluate the effect of each countermeasure by running simulations based on the countermeasure list corresponding to that state.
Therefore, according to the present invention, the most appropriate countermeasure for events that occur in the external device in various forms can be selected and formulated based on the evaluation results.

Before describing the embodiments of the present invention, the terms used in the following description are defined.
A "policy" is a guideline on the operation of an autonomic computing environment (hereinafter, AC environment) described later. Examples of policies include "a CPU usage rate of 0 to 10% is surplus, 11 to 80% is normal, and 81% or more is overloaded", "if the CPU usage rate is overloaded, run simulations and select the countermeasure that produced the best result", and "if the system does not respond, restart it immediately".
A "countermeasure list" is a set of countermeasures linked to each event that can occur in a device in the AC environment. Events and countermeasures are associated m:n, where m = n or m ≠ n. An example is a countermeasure list consisting of "Countermeasure 1: add one CPU; Countermeasure 2: add two CPUs; Countermeasure 3: distribute the load by adding a server" for the event "the CPU usage rate exceeds the threshold".
A "model" is a set of features extracted for each device in the AC environment, based on monitoring data acquired from the AC environment and from a non-AC environment described later. For example, when monitoring data indicating the CPU usage rate is acquired from an AP server in the AC environment, the following model representing the time-series change in the CPU usage rate can be extracted by obtaining its linear approximation.
f(t) = at + b
f(t): CPU usage rate, t: time, a, b: real values
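As an illustration of how the coefficients a and b of f(t) = at + b might be obtained, the following is a minimal sketch that fits the line by ordinary least squares. The source only says "linear approximation", so the choice of least squares, the function name `fit_linear_model`, and the sample values are assumptions made for this example.

```python
from typing import Sequence, Tuple


def fit_linear_model(times: Sequence[float], usage: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) of the least-squares line f(t) = a*t + b.

    Hypothetical helper: the patent does not name the fitting method.
    """
    n = len(times)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_t = sum(times) / n
    mean_u = sum(usage) / n
    # Sum of squared deviations of t; zero means all samples share one timestamp.
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all time values are identical; slope is undefined")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, usage))
    a = cov / var_t
    b = mean_u - a * mean_t
    return a, b


# Illustrative CPU usage samples (percent) taken at t = 0, 1, 2, 3.
a, b = fit_linear_model([0, 1, 2, 3], [10.0, 12.0, 14.0, 16.0])
```

With these sample values the usage grows by a constant amount per time step, so the fitted line passes through every point.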

Preferred embodiments to which the present invention is applied are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the functional configuration of an AC performance monitoring device 100 according to an embodiment of the present invention. As shown in FIG. 1, the AC performance monitoring device 100 is connected, via a communication line such as a LAN (Local Area Network), to an AC environment, which is an information processing system composed of servers 1001, storage devices 1002, network (NW) devices 1003, and the like, and to a non-AC environment. It can monitor the state of each device via this communication line.

The AC environment is the environment to which the autonomic computing technology of this embodiment is applied; in the example of FIG. 1, it consists of the servers 1001, the storage devices 1002, and the network devices 1003. The non-AC environment, by contrast, is an environment to which the autonomic computing technology of this embodiment is not applied. Monitoring data acquired from the non-AC environment can nevertheless be used for autonomic computing on the AC environment.

The servers 1001 mentioned above are various servers such as Web servers and AP servers, and the storage devices 1002 are devices capable of recording information, such as a DB. The network devices 1003 form a communication network, such as a LAN, that connects the individual servers 1001 and storage devices 1002.

The monitor unit 101 acquires the following monitoring data indicating the state of each device in the AC environment and the non-AC environment. From the Web server, AP server, and DB server in the AC environment, it acquires as monitoring data resource usage data, such as data indicating memory usage and data indicating the CPU usage rate, and log data indicating the processing history of each device in the AC environment. From each communication line (network device) connecting the Web server, AP server, and DB server in the AC environment, the monitor unit 101 acquires as monitoring data transaction data indicating the throughput, process names, and the like of the transactions carried over those lines. The monitor unit 101 converts the acquired monitoring data into a standard format and stores it in the event information storage unit 102 described later. Conversion to a standard format is not strictly required, but it is performed so that a wide variety of information can be analyzed and diagnosed (determined) efficiently. Only an embodiment using CBE (Common Base Event), a representative standard format, is described below, but the format need not be CBE as long as it serves to standardize the data for processing.

The monitor unit 101 also acquires monitoring data from the non-AC environment. For example, it may monitor non-AC devices that access the AC environment and acquire as monitoring data the number of accesses made to each device in the AC environment, or it may acquire temperature data from a thermometer, a non-AC device that measures the temperature inside the AC environment. It can also acquire from a non-AC device, as monitoring data, information on periods in which the number of accesses to devices in the AC environment is predicted to rise sharply. The following description covers autonomic computing that uses only monitoring data acquired from the AC environment, but taking the monitoring data available from the non-AC environment into account as well makes more accurate autonomic computing possible.

The analysis unit 103 analyzes whether there is a problem in the CBE data converted by the monitor unit 101, based on a policy 1041 read from the knowledge information storage unit 104. For example, when the CPU usage rate indicated by the CBE data exceeds 80%, the event that the CPU usage rate is in an overloaded state is identified based on the policy 1041. Examples of the policy 1041 were given above. When the CPU usage rate is analyzed as in this case, the policy that best matches the value indicated by the CBE data under analysis, namely the policy 1041 stating that "the CPU is overloaded when its usage rate exceeds the threshold of 80%", is read from the knowledge information storage unit 104.

As another example, when the CBE data under analysis is memory usage and indicates 90%, the policy 1041 that best matches this data, stating that "memory is excessively consumed when the memory usage rate exceeds the threshold of 85%", is read from the knowledge information storage unit 104. Because the memory usage rate indicated by the CBE data exceeds 85%, the event that memory is excessively consumed is identified.

As a further example, when the CBE data under analysis is transaction data indicating throughput and indicates 120 transactions/second, the policy 1041 that best matches this data, stating that "the service level stays within the predetermined range if the throughput is less than 100 transactions/second, and does not stay within the predetermined range if the throughput is 100 transactions/second or more", is read. Because the throughput indicated by the CBE data exceeds 100 transactions/second, the event that the system is in an overloaded state is identified.
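The threshold policies quoted above can be sketched as simple classification functions. This is only an illustration under stated assumptions: the function names are hypothetical, and because the policy text gives integer bands (0 to 10, 11 to 80, 81 or more), values between the bands are assigned here to the lower band.

```python
def classify_cpu_usage(percent: float) -> str:
    """Classify a CPU usage rate using the example policy bands.

    Assumption: non-integer values such as 10.5 fall into the band whose
    upper limit they have not yet passed ("normal" in that case).
    """
    if not 0 <= percent <= 100:
        raise ValueError("CPU usage must be between 0 and 100 percent")
    if percent <= 10:
        return "surplus"
    if percent <= 80:
        return "normal"
    return "overloaded"


def throughput_within_service_level(transactions_per_second: float,
                                    limit: float = 100.0) -> bool:
    """True while throughput is below the example limit of 100 transactions/second."""
    return transactions_per_second < limit


# The 120 transactions/second reading from the text violates the policy.
overloaded = not throughput_within_service_level(120.0)
```

Keeping the bands in plain functions mirrors the idea that the analysis unit only applies a policy it has read; in a fuller design the thresholds would presumably come from stored policy data rather than being hard-coded.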

The event information storage unit 102 stores the CBE data converted by the monitor unit 101. It also periodically applies statistical processing to the stored CBE data to reduce the volume of data held. Examples of such processing include taking the maximum and minimum values, or the average value, of the CBE data accumulated over a fixed period.
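The periodic reduction described above can be pictured as replacing each fixed-size run of samples with its minimum, maximum, and mean. The sketch below assumes the samples are plain numeric values already extracted from the stored records; the function name `summarize` and the window size are hypothetical.

```python
from typing import Dict, List, Sequence


def summarize(samples: Sequence[float], window: int) -> List[Dict[str, float]]:
    """Collapse each run of `window` samples into min, max, and mean.

    The last window may be shorter when the sample count is not a multiple
    of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append({
            "min": min(chunk),
            "max": max(chunk),
            "mean": sum(chunk) / len(chunk),
        })
    return summaries


# Five raw readings become three summary records.
reduced = summarize([10.0, 20.0, 30.0, 40.0, 50.0], window=2)
```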

In addition to the resource usage data, log data, and transaction data described above, the event information storage unit 102 stores configuration information. The configuration information includes information indicating the configuration of the information processing system to be monitored (for example, that the system consists of six Web servers, two AP servers, and one DB server), information indicating how the devices in the system are connected and what transfer rate the network connecting them provides, and information indicating the hardware and software specifications of each device. For these specifications, it is advisable to register not only the specifications at the time of purchase but also firmware and software versions. The configuration information need not be entered by an operator; the AC performance monitoring device 100 may instead acquire and enter it via the network.

The model extraction unit 105 extracts a model 1042 of the relevant device in the AC environment based on the CBE data stored in the event information storage unit 102. For example, the model extraction unit 105 can successively acquire CBE data indicating the CPU usage rate of a given device in the AC environment and, by fitting a linear approximation to it, extract a model 1042 representing the time-series change in the CPU usage rate.

Similarly, the model extraction unit 105 can successively acquire CBE data indicating the throughput of a given device in the AC environment and, by fitting a linear approximation to it, extract a model 1042 representing the time-series change in throughput.

Furthermore, once the model extraction unit 105 has extracted models 1042 that linearly approximate the time-series changes in the CPU usage rate and the throughput as described above, it can also extract from those models a model 1042 showing the correlation between the CPU usage rate and the throughput. The method of extracting such a model 1042 is described in detail later. Each extracted model 1042 is stored in the knowledge information storage unit 104.

The model diagnosis unit 106 refers to a model 1042 stored in the knowledge information storage unit 104 and to the policy 1041 that applies to that model, and diagnoses the model 1042 based on the policy 1041.

For example, if the referenced model 1042 represents the time-series change in the CPU usage rate, the policy 1041 that applies to it is "a CPU usage rate of 0 to 10% is surplus, 11 to 80% is normal, and 81% or more is overloaded". If the value predicted for some future point in time exceeds the predetermined threshold, the event that a problem with the CPU usage rate may occur in the future is diagnosed.

An example in which the model diagnosis unit 106 diagnoses a problem event is described concretely with reference to FIG. 6. Suppose the model representing the time-series change in the CPU usage rate is f_a(x) = αx + β, and the policy linked to that model is "a CPU usage rate of 0 to 10% is surplus, 11 to 80% is normal, and 81% or more is overloaded". As shown in FIG. 6, the value of the CPU usage rate f_a(x) one month later exceeds 80%. In this case, the model diagnosis unit 106 diagnoses that a problem may occur one month later because the CPU usage rate will be overloaded.
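The FIG. 6 diagnosis amounts to evaluating the linear model at a future point and comparing the result with the policy threshold. A minimal sketch follows, assuming x is measured in days and one month is taken as 30 days; the coefficient values, the function names, and the extra `time_to_threshold` helper are illustrative additions, not part of the source.

```python
from typing import Optional


def predict(alpha: float, beta: float, x: float) -> float:
    """Evaluate the linear model f_a(x) = alpha * x + beta."""
    return alpha * x + beta


def diagnose_future_overload(alpha: float, beta: float, horizon: float,
                             threshold: float = 80.0) -> bool:
    """True if the predicted value at `horizon` exceeds the threshold."""
    return predict(alpha, beta, horizon) > threshold


def time_to_threshold(alpha: float, beta: float,
                      threshold: float = 80.0) -> Optional[float]:
    """Return the x at which the model reaches the threshold.

    Returns 0.0 if the model is already at or above it, and None if a flat
    or falling trend never reaches it.
    """
    if beta >= threshold:
        return 0.0
    if alpha <= 0:
        return None
    return (threshold - beta) / alpha


# Usage grows 0.5 points per day from 70%: 85% after 30 days, so a problem is flagged.
will_overload = diagnose_future_overload(alpha=0.5, beta=70.0, horizon=30)
```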

If the referenced model 1042 represents the time-series change in throughput, the policy 1041 that applies to it is "the service level stays within the predetermined range if the throughput is less than 100 transactions/second, and does not stay within the predetermined range if the throughput is 100 transactions/second or more". If the value predicted for some future point in time exceeds the predetermined threshold, the event that a problem with throughput may occur in the future is diagnosed.

Another example in which the model diagnosis unit 106 diagnoses a problem event is described concretely with reference to FIG. 7. Suppose the models representing the time-series changes in the throughput of process A and process B are f_A(x) = α₁x + β₁ and f_B(x) = α₂x + β₂, respectively, and the policy linked to those models is "the service level stays within the predetermined range if the throughput is less than 100 transactions/second, and does not stay within the predetermined range if the throughput is 100 transactions/second or more". As shown in FIG. 7, the value of the throughput f_A(x) of process A one month later exceeds 100 transactions/second. In this case, the model diagnosis unit 106 diagnoses that a problem with the throughput of process A may occur one month later. The throughput f_B(x) of process B, on the other hand, stays below 100 transactions/second for the whole month, so no possible problem with the throughput of process B within that month is diagnosed.

Furthermore, if the referenced model 1042 shows the correlation between the CPU usage rate and the throughput, the policy 1041 that applies to it is "the correlation between the CPU usage rate and the throughput should stay within an error of 10% from one day to the next". If the correlation between the CPU usage rate and the throughput at some future point in time is predicted not to maintain the prescribed balance, the event that a problem with that correlation may occur in the future is diagnosed.

Yet another example in which the model diagnosis unit 106 diagnoses a problem event is described concretely with reference to FIG. 8. Suppose the models showing the correlation between the CPU usage rate and the throughput of process a are f_TA(x) = ρ₁x + θ₁ and f_TB(x) = ρ₂x + θ₂, where f_TA(x) was created from the data of 2006/01/01 and f_TB(x) from the data of 2006/01/02. If the policy linked to those models is "the correlation between the CPU usage rate and the throughput should stay within an error of 10% from one day to the next", then, as shown in FIG. 8, an error of 10% or more between f_TA(x₁) and f_TB(x₁) leads to the diagnosis that the balance between the CPU usage rate and the throughput has broken down and the system is in an abnormal state.
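The FIG. 8 check compares two daily correlation models at the same point x₁ and flags a difference of 10% or more. The source does not say how the error is computed, so the sketch below assumes a relative error measured against the earlier day's value; the function names and coefficient values are hypothetical.

```python
from typing import Tuple

LinearModel = Tuple[float, float]  # (rho, theta) of f(x) = rho * x + theta


def evaluate_model(model: LinearModel, x: float) -> float:
    """Evaluate a linear correlation model at x."""
    rho, theta = model
    return rho * x + theta


def correlation_drifted(day_a: LinearModel, day_b: LinearModel, x1: float,
                        tolerance: float = 0.10) -> bool:
    """True if the two models differ by `tolerance` or more at x1.

    Assumption: error = |f_TB(x1) - f_TA(x1)| / |f_TA(x1)|.
    """
    reference = evaluate_model(day_a, x1)
    if reference == 0:
        raise ValueError("reference value is zero; relative error is undefined")
    error = abs(evaluate_model(day_b, x1) - reference) / abs(reference)
    return error >= tolerance


# Day-one model gives 60 at x1 = 100; a steeper day-two model gives about 70,
# a difference of roughly 17%, so the pair is flagged as abnormal.
abnormal = correlation_drifted((0.5, 10.0), (0.6, 10.0), x1=100.0)
```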

The planning unit 107 selects from the knowledge information storage unit 104 the countermeasure list 1043 linked to an event that the analysis unit 103 found to be a problem in its analysis of the CBE data, or to an event that the model diagnosis unit 106 diagnosed as possibly causing a problem in the future. It then requests the simulation unit 108, described later, to run a simulation process for each countermeasure included in that countermeasure list 1043.

For example, when the event in question is "the CPU usage rate one month later exceeds 80%", the countermeasure list 1043 linked to that event may be the following (1) to (6).
(1) Add one CPU
(2) Add two CPUs
(3) Load distribution by adding a server (processing distribution pattern A)
(4) Load distribution by adding a server (processing distribution pattern B)
(5) Load distribution by adding a server (processing distribution pattern C)
(6) Load distribution by adding a server (processing distribution pattern D)

As shown in FIG. 5(a), processing distribution pattern A is a pattern in which the two types of processing, process A and process B, that were originally handled by one server are split so that the original server and the added server each handle one of them.

As shown in FIG. 5(b), processing distribution pattern B is a pattern in which the original server, which handled both process A and process B, continues to execute both, while the added server also executes process A, reducing the load of process A on the original server.

As shown in FIG. 5(c), processing distribution pattern C is a pattern in which the original server, which handled both process A and process B, continues to execute both, while the added server also executes process B, reducing the load of process B on the original server.

処理分散パターンＤとは、図５（ｄ）に示すように、本来、２種類の処理Ａと処理Ｂとを１つのサーバで処理していたが、そのサーバには同様に処理Ａと処理Ｂとを実行させるとともに、追加サーバにも処理Ａと処理Ｂとの両方を実行させ、元々処理させていたサーバの処理Ａ及び処理Ｂに関する処理負担を軽減する処理分散パターンである。 As shown in FIG. 5 (d), the process distribution pattern D is originally a process where two types of processes A and B are processed by one server. The processing distribution pattern reduces the processing load related to processing A and processing B of the server that was originally processed by causing the additional server to execute both processing A and processing B.
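
The four patterns can be summarized as a mapping from each server to the set of processes it runs after one server has been added. The following is a minimal sketch in Python; the names `DISTRIBUTION_PATTERNS`, `original`, `added`, and `servers_running` are illustrative and do not appear in the patent. For pattern A the text does not say which server keeps which process, so the assignment below is an assumption.

```python
# Hypothetical encoding of FIG. 5(a)-(d): the processes each server runs
# after one server is added. "original" and "added" are illustrative labels.
DISTRIBUTION_PATTERNS = {
    # Assumption: in pattern A the original server keeps A and the added one takes B.
    "A": {"original": {"A"}, "added": {"B"}},
    "B": {"original": {"A", "B"}, "added": {"A"}},
    "C": {"original": {"A", "B"}, "added": {"B"}},
    "D": {"original": {"A", "B"}, "added": {"A", "B"}},
}


def servers_running(pattern: str, process: str) -> list[str]:
    """Return the servers that execute `process` under the given pattern."""
    assignment = DISTRIBUTION_PATTERNS[pattern]
    return sorted(server for server, procs in assignment.items() if process in procs)
```

Under this encoding, a process that appears on both servers is one whose load is shared, which is the case the simulator has to evaluate for patterns B, C, and D.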

シミュレーション部１０８は、計画部１０７によって選択された対策リスト１０４３を知識情報蓄積部１０４から参照し、その対策リスト１０４３によるシミュレーション処理を実行する。 The simulation unit 108 refers to the countermeasure list 1043 selected by the planning unit 107 from the knowledge information storage unit 104, and executes a simulation process using the countermeasure list 1043.

なお、シミュレーション部１０８は、装置（又は、複数の装置から成るシステム）の構成変更の効果を定量化するためのシミュレータと呼ばれるツールによって構成することができる。シミュレータは、装置（又はシステム）構成や処理の特徴が入力されることによって性能値を予測することができる。ここで、装置（又はシステム）構成として入力される情報としては、例えば、サーバ数、ＣＰＵ数等が挙げられる。処理の特徴として入力される情報としては、例えば、各処理のＣＰＵにおける処理時間、各処理の発生頻度等が挙げられる。性能値として予測される情報としては、ＣＰＵ使用率、各処理に対する応答時間等が挙げられる。これらの入力データは、知識情報蓄積部１０４から読み出したモデルに基づいて算出して得られる情報であるため、モデルをパラメータとしてシミュレータに与えてもよい。 The simulation unit 108 can be configured by a tool called a simulator for quantifying the effect of the configuration change of a device (or a system composed of a plurality of devices). The simulator can predict the performance value by inputting the device (or system) configuration and processing characteristics. Here, examples of information input as the apparatus (or system) configuration include the number of servers, the number of CPUs, and the like. Examples of information input as a feature of the process include a processing time in the CPU of each process, an occurrence frequency of each process, and the like. Information predicted as the performance value includes a CPU usage rate, a response time for each process, and the like. Since these input data are information obtained by calculation based on the model read from the knowledge information storage unit 104, the model may be given to the simulator as a parameter.
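
The kind of calculation such a simulator performs can be illustrated with a deliberately simple model. The sketch below is not the simulator of the embodiment. It assumes the utilization law (CPU usage equals the sum of request rate times CPU time per request, divided by the number of CPUs), and the function name, parameter names, and sample workload figures are all hypothetical.

```python
def cpu_utilization(cpu_count, workloads):
    """Estimate CPU usage as a fraction (0.9 means 90%).

    workloads: iterable of (requests_per_second, cpu_seconds_per_request),
    one pair per process. Assumes load is spread evenly over all CPUs.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    busy = sum(rate * cpu_time for rate, cpu_time in workloads)
    return busy / cpu_count


# Illustrative inputs: process A at 40 req/s x 10 ms, process B at 10 req/s x 50 ms.
workloads = [(40.0, 0.010), (10.0, 0.050)]
one_cpu = cpu_utilization(1, workloads)   # roughly 0.9
two_cpus = cpu_utilization(2, workloads)  # roughly 0.45
```

A real simulator would also model queueing, response time, and infeasible configurations; this sketch covers only the mapping from configuration and processing characteristics to a predicted CPU usage rate.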

例えば、対象となる事象が上述した「１カ月後におけるＣＰＵ使用率が８０％を越える」ような事象の場合、上記の（１）〜（６）の対策を含む対策リスト１０４３についてシミュレーション処理が実行され、以下のように各対策を実施した際の効果が定量化される。
対策（１）の結果：ＣＰＵ使用率８５％
対策（２）の結果：ＣＰＵ使用率なし（実現不可能な構成と判断されたため）
対策（３）の結果：ＣＰＵ使用率４０％
対策（４）の結果：ＣＰＵ使用率５５％
対策（５）の結果：ＣＰＵ使用率５５％
対策（６）の結果：ＣＰＵ使用率６５％ For example, when the target event is the above-mentioned event “the CPU usage rate after one month exceeds 80%”, the simulation process is executed for the countermeasure list 1043 including the countermeasures (1) to (6) above. The effects of implementing each measure are quantified as follows.
Result of measure (1): CPU usage rate 85%
Result of measure (2): No CPU usage rate (because it was determined that the configuration was not feasible)
Result of measure (3): CPU usage rate 40%
Result of measure (4): CPU usage rate 55%
Result of measure (5): CPU usage rate 55%
Result of measure (6): CPU usage rate 65%

また、対象となる事象が、例えば分析部１０３によって現在のＣＰＵ使用率が既に８０％を越えていると分析されたような事象であれば、同じく、その事象に対応する対策リスト１０４３が参照され、シミュレーション処理によって当該対策リスト１０４３内の対策毎に効果が定量化されることになる。 Also, if the target event is an event that has been analyzed by the analysis unit 103, for example, that the current CPU usage rate has already exceeded 80%, the countermeasure list 1043 corresponding to the event is also referred to. The effect is quantified for each countermeasure in the countermeasure list 1043 by the simulation process.

計画部１０７は、当該事象に該当するポリシー１０４１を知識情報蓄積部１０４から参照し、シミュレーション部１０８によるシミュレーション処理の評価結果のうちポリシー１０４１を満たす結果を導いた対策を決定する。例えば、当該事象に該当するポリシー１０４１が「ＣＰＵ使用率が過負荷の場合は、シミュレーションを実行して最適な結果を残した対策を選択する」というポリシー１０４１であれば、上記の例の場合、対策（３）が決定されることになる。計画部１０７は、このように対策を決定すると、例えば、対策（３）を１週間後に実行する等、対策の実行をスケジューリングする。 The planning unit 107 refers to the policy 1041 corresponding to the event in the knowledge information storage unit 104 and, from the evaluation results of the simulation processing by the simulation unit 108, determines the countermeasure whose result satisfies the policy 1041. For example, if the applicable policy 1041 is "when the CPU usage rate indicates overload, run a simulation and select the countermeasure that gives the best result", countermeasure (3) is chosen in the example above. Having determined the countermeasure, the planning unit 107 schedules its execution, for example by scheduling countermeasure (3) to be executed one week later.
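
The selection in the example above can be sketched as follows. The figures are the simulated CPU usage rates quoted for countermeasures (1) to (6). Two points are assumptions rather than statements from the patent: that "satisfying the policy" means a usage rate of at most 80%, and that the "best result" is the lowest usage rate. The names `RESULTS` and `choose_countermeasure` are hypothetical.

```python
from typing import Optional

# Simulated CPU usage (%) per countermeasure, from the example in the text.
# None marks the configuration that was judged infeasible.
RESULTS: dict[int, Optional[float]] = {
    1: 85.0, 2: None, 3: 40.0, 4: 55.0, 5: 55.0, 6: 65.0,
}

OVERLOAD_THRESHOLD = 80.0  # assumption: usage above this counts as overload


def choose_countermeasure(results, threshold=OVERLOAD_THRESHOLD):
    """Return the id of the feasible countermeasure with the lowest usage, or None."""
    acceptable = {
        measure: usage
        for measure, usage in results.items()
        if usage is not None and usage <= threshold
    }
    if not acceptable:
        return None  # nothing satisfies the policy
    return min(acceptable, key=acceptable.get)


selected = choose_countermeasure(RESULTS)
```

With these inputs the function selects countermeasure 3, matching the example. A `None` return corresponds to the case in which no countermeasure in the list satisfies the policy.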

計画実行部１０９は、計画部１０７によって作成されたスケジュールに従って対策を実行する。 The plan execution unit 109 executes countermeasures according to the schedule created by the planning unit 107.

対策探索部１１０は、シミュレーション部１０８によるシミュレーション処理の全ての結果が、当該事象に該当するポリシー１０４１を満たさない場合、知識情報蓄積部１０４に蓄積される対策のうち当該事象に紐付けられていない対策を選択し、選択された対策によるシミュレーション処理をシミュレーション部１０８に対して依頼する。シミュレーション部１０８は、選択された対策を知識情報蓄積部１０４から参照し、各対策についてシミュレーション処理を実行し、各対策を実施した際の効果を定量化する。 If none of the results of the simulation processing by the simulation unit 108 satisfies the policy 1041 corresponding to the event, the countermeasure search unit 110 selects, from the countermeasures stored in the knowledge information storage unit 104, countermeasures that are not linked to the event and requests the simulation unit 108 to simulate them. The simulation unit 108 refers to the selected countermeasures in the knowledge information storage unit 104, executes the simulation processing for each of them, and quantifies the effect of implementing each countermeasure.

対策探索部１１０は、このように当該事象に紐付けられていない対策に対するシミュレーション処理の結果のうち、上記ポリシー１０４１を満たす結果を導いた対策が存在する場合、知識情報蓄積部１０４内においてその対策を当該事象に紐付けられた対策リスト１０４３に追加させるとともに、上記ポリシー１０４１を満たす結果に対応する対策を計画部１０７に渡す。 If, among the simulation results for the countermeasures not linked to the event, there is a countermeasure whose result satisfies the policy 1041, the countermeasure search unit 110 has that countermeasure added to the countermeasure list 1043 linked to the event in the knowledge information storage unit 104 and passes the countermeasure to the planning unit 107.

このように対策探索部１１０によってポリシーを満たす対策が発見され、対策探索部１１０によって当該対策が渡された場合、計画部１０７は、同様に当該対策の実行をスケジューリングする。 When the countermeasure search unit 110 has found a countermeasure that satisfies the policy and passed it on in this way, the planning unit 107 schedules the execution of that countermeasure in the same manner.
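
The fallback search described above can be sketched as a loop over the countermeasures that are not yet linked to the event. This is an illustration only: `search_unlinked`, its parameters, and the way the simulator and the policy are passed in as callables are assumptions, not the structure of the embodiment.

```python
def search_unlinked(event, linked, catalog, simulate, satisfies_policy):
    """Simulate countermeasures not linked to `event`; link and return those that pass.

    linked: dict mapping an event to its list of linked countermeasures.
    catalog: all countermeasures held in the knowledge store.
    simulate / satisfies_policy: hypothetical callables standing in for the
    simulation unit and the policy check.
    """
    found = []
    for measure in catalog:
        if measure in linked[event]:
            continue  # already covered by the event's own countermeasure list
        if satisfies_policy(simulate(measure)):
            found.append(measure)
    # Link the discoveries so the next occurrence of the event finds them directly.
    linked[event].extend(found)
    return found
```

Extending the linked list in place mirrors the point that a countermeasure found by the search becomes part of the countermeasure list for later occurrences of the same event.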

図２は、ＡＣ性能監視装置１００のハードウェア構成を示すブロック図である。ＣＰＵ２０１は、システムバスに接続される各デバイスやコントローラを統括的に制御する。ＲＯＭ２０３又はＨＤ２０７には、ＣＰＵ２０１の制御プログラムであるＢＩＯＳ（Basic Input/Output System）やオペレーティングシステムプログラムや、ＡＣ性能監視装置１００が実行する例えば図３−１及び図３−２に示す処理のプログラム等が記憶されている。 FIG. 2 is a block diagram showing the hardware configuration of the AC performance monitoring apparatus 100. The CPU 201 performs overall control of the devices and controllers connected to the system bus. The ROM 203 or the HD 207 stores the BIOS (Basic Input/Output System), which is the control program of the CPU 201, an operating system program, and the programs executed by the AC performance monitoring apparatus 100, such as the programs for the processing shown in FIGS. 3-1 and 3-2.

なお、図２の例では、ハードディスク（ＨＤ）２０７はＡＣ性能監視装置１００の内部に配置された構成としているが、他の実施形態としてＨＤ２０７に相当する構成がＡＣ性能監視装置外部に配置された構成としてもよい。また、本実施形態に係る例えば図３−１及び図３−２に示す処理を行なうためのプログラムは、フレキシブルディスク（ＦＤ）２０６やＣＤ−ＲＯＭ等、コンピュータ読み取り可能な記録媒体に記録され、それらの記録媒体から供給される構成としてもよいし、インターネット等の通信媒体を介して供給される構成としてもよい。 In the example of FIG. 2, the hard disk (HD) 207 is located inside the AC performance monitoring apparatus 100; in another embodiment, a component corresponding to the HD 207 may be located outside the AC performance monitoring apparatus. The program for performing the processing according to the present embodiment, for example the processing shown in FIGS. 3-1 and 3-2, may be recorded on a computer-readable recording medium such as a flexible disk (FD) 206 or a CD-ROM and supplied from that medium, or it may be supplied via a communication medium such as the Internet.

ＲＡＭ２０２は、ＣＰＵ２０１の主メモリ、ワークエリア等として機能する。ＣＰＵ２０１は、処理の実行に際して必要なプログラム等をＲＡＭ２０２にロードして、プログラムを実行することで各種動作を実現するものである。 The RAM 202 functions as a main memory, work area, and the like for the CPU 201. The CPU 201 implements various operations by loading a program necessary for execution of processing into the RAM 202 and executing the program.

ディスクコントローラ２０５は、ＨＤ２０７やＦＤ２０６等の外部メモリへのアクセスを制御する。通信ＩＦコントローラ２０４は、インターネットやＬＡＮと接続し、例えばＴＣＰ／ＩＰによって外部との通信を制御するものである。 The disk controller 205 controls access to external memories such as the HD 207 and the FD 206. The communication IF controller 204 is connected to the Internet or a LAN, and controls communication with the outside by, for example, TCP / IP.

ディスプレイコントローラ２０８は、ディスプレイ２０９における画像表示を制御する。 The display controller 208 controls image display on the display 209.

ＫＢ（キーボード）コントローラ２１０は、キーボード（ＫＢ）２１１からの操作入力を受け付け、ＣＰＵ２０１に対して送信する。なお、図示していないが、キーボード２１１の他に、マウス等のポインティングデバイスもユーザの操作手段として本実施形態に係るＡＣ性能監視装置１００に適用可能である。 The KB (keyboard) controller 210 receives an operation input from the keyboard (KB) 211 and transmits it to the CPU 201. Although not shown, in addition to the keyboard 211, a pointing device such as a mouse can also be applied to the AC performance monitoring apparatus 100 according to the present embodiment as a user operation means.

モニタ部１０１、分析部１０３、モデル抽出部１０５、モデル診断部１０６、計画部１０７、シミュレーション部１０８、計画実行部１０７及び対策探索部１１０は、例えばＨＤ２０７内に記憶され、必要に応じてＲＡＭ２０２にロードされるプログラム及びそれを実行するＣＰＵ２０１に相当する構成である。 The monitor unit 101, the analysis unit 103, the model extraction unit 105, the model diagnosis unit 106, the planning unit 107, the simulation unit 108, the plan execution unit 109, and the countermeasure search unit 110 each correspond to a program that is stored, for example, in the HD 207 and loaded into the RAM 202 as necessary, together with the CPU 201 that executes that program.

また、知識情報蓄積部１０４及びイベント情報蓄積部１０２は、例えばＨＤ２０７又はＲＡＭ２０２内の一部記憶領域に相当する構成である。なお、知識情報蓄積部１０４及びイベント情報蓄積部１０２は、ＡＣ性能監視装置１００内部に備える構成の他、外部に備えた構成としてもよい。 The knowledge information storage unit 104 and the event information storage unit 102 have a configuration corresponding to a partial storage area in the HD 207 or the RAM 202, for example. The knowledge information storage unit 104 and the event information storage unit 102 may be configured externally in addition to the configuration provided inside the AC performance monitoring apparatus 100.

次に、本実施形態に係るＡＣ性能監視装置１００の動作を、図３−１、図３−２、図１０及び図１１のフローチャートを参照しながら説明する。 Next, the operation of the AC performance monitoring apparatus 100 according to the present embodiment will be described with reference to the flowcharts of FIGS. 3-1, 3-2, 10 and 11.

先ず、図１０を用いてモニタ部１０１による監視データの取得処理からイベント情報蓄積部１０２へのＣＢＥデータの蓄積処理について説明する。図１０において、モニタ部１０１は、ＡＣ環境及び非ＡＣ環境の各装置から監視データを取得し、取得した監視データをＣＢＥデータに変換する（ステップＳ１００１、Ｓ１００２）。次に、モニタ部１０１は、当該ＣＢＥデータをイベント情報蓄積部１０２に蓄積させる（ステップＳ１００３）。このようにモニタ部１０１は、ステップＳ１００１〜ステップＳ１００３の処理を繰り返し実行してＣＢＥデータをイベント情報蓄積部１０２に対して蓄積していく。なお、イベント情報蓄積部１０２内では、例えば所定期間毎に、蓄積されるＣＢＥデータの最大／最小値や平均値等を算出し、その値のみを保持するようにすることで蓄積するデータ量の削減が図られる。 First, the processing from the acquisition of monitoring data by the monitor unit 101 to the accumulation of CBE data in the event information storage unit 102 is described with reference to FIG. 10. In FIG. 10, the monitor unit 101 acquires monitoring data from each device in the AC environment and the non-AC environment and converts the acquired monitoring data into CBE data (steps S1001 and S1002). Next, the monitor unit 101 stores the CBE data in the event information storage unit 102 (step S1003). The monitor unit 101 repeats steps S1001 to S1003 and thereby accumulates CBE data in the event information storage unit 102. Within the event information storage unit 102, the amount of stored data is reduced by, for example, calculating the maximum, minimum, and average values of the accumulated CBE data for each predetermined period and retaining only those values.
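
The data reduction mentioned at the end of this paragraph can be sketched as follows. The sketch assumes that the monitored values arrive as a plain list of numbers and that a "predetermined period" corresponds to a fixed number of consecutive samples; both are simplifications, and the function names are hypothetical.

```python
from statistics import mean


def summarize(samples):
    """Collapse one period's samples into their minimum, maximum, and mean."""
    if not samples:
        raise ValueError("no samples in this period")
    return {"min": min(samples), "max": max(samples), "mean": mean(samples)}


def compact(samples, period):
    """Summarize consecutive chunks of `period` samples each."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [summarize(samples[i:i + period]) for i in range(0, len(samples), period)]
```

For example, `compact([10, 20, 30, 40], 2)` keeps two summary records instead of four raw samples.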

次に、図３−１及び図３−２を用いて、イベント情報蓄積部１０２に蓄積されたＣＢＥデータに基づくモデルによってＡＣ環境の診断処理を行い、診断結果に問題がある場合には対策を実行するまでの処理について説明する。図３−１において、モデル抽出部１０５は、ＡＣ環境及び非ＡＣ環境の各装置に対応するＣＢＥデータをイベント情報蓄積部１０２から取得し、モデル１０４２を抽出する（ステップＳ３０１、Ｓ３０２）。 Next, with reference to FIGS. 3-1 and 3-2, the processing is described in which the AC environment is diagnosed using a model based on the CBE data stored in the event information storage unit 102 and, if the diagnosis reveals a problem, a countermeasure is executed. In FIG. 3-1, the model extraction unit 105 acquires the CBE data corresponding to each device in the AC environment and the non-AC environment from the event information storage unit 102 and extracts a model 1042 (steps S301 and S302).

ここで、モデル１０４２の抽出方法を、図４を参照しながら具体的に説明する。
先ず、図４（ａ）において、前回モデル１０４２を抽出した時点（時間２）から所定時間が経過し、モデル抽出部１０５は、時間１及び時間２の監視データとともに、新たに時間３の監視データを今回取得する。ここで取得する監視データは、図４（ａ）に示すように、ＣＰＵ使用率を示す監視データとスループットを示す監視データとであるものとする。 Here, the method of extracting the model 1042 is described concretely with reference to FIG. 4.
First, in FIG. 4(a), a predetermined time has elapsed since the model 1042 was last extracted (time 2), and the model extraction unit 105 now acquires the monitoring data for time 3 in addition to the monitoring data for time 1 and time 2. As shown in FIG. 4(a), the monitoring data acquired here consists of monitoring data indicating the CPU usage rate and monitoring data indicating the throughput.

次に、モデル抽出部１０５は、時間に対するＣＰＵの使用率の関係を表す座標系において、時間１〜時間３の監視データをプロットし、プロットした各監視データの線形近似式（ｆ_a（ｘ）＝αｘ＋β）を求めることによって、ＣＰＵ使用率の時系的変化を表すモデル１０４２を抽出する。モデル抽出部１０５は、抽出したモデル１０４２を知識情報蓄積部１０４に対して蓄積する。 Next, the model extraction unit 105 plots the monitoring data for time 1 to time 3 in a coordinate system representing the CPU usage rate against time and obtains the linear approximation $f_a(x) = \alpha x + \beta$ of the plotted data, thereby extracting a model 1042 that represents the change in the CPU usage rate over time. The model extraction unit 105 stores the extracted model 1042 in the knowledge information storage unit 104.
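
The text does not say how the linear approximation is computed. One common choice, assumed here purely for illustration, is an ordinary least-squares fit; the function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha * x + beta; returns (alpha, beta)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Illustrative CPU usage samples (%) at time 1, time 2, and time 3.
alpha, beta = fit_line([1, 2, 3], [50, 55, 60])
```

The same routine could be applied to the throughput series of each process, or to pairs of CPU usage and throughput values, to obtain the other linear models.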

また、モデル抽出部１０５は、図４（ｂ）に示すように、時間に対するスループットの関係を表す座標系において、処理Ａ及び処理Ｂ夫々に関するスループットを示す時間１〜時間３の監視データをプロットし、処理Ａと処理Ｂとの夫々について各監視データの線形近似式（ｆ_A（ｘ）＝α₁ｘ＋β₁、ｆ_B（ｘ）＝α₂ｘ＋β₂）を求めることによって、スループットの時系的変化を表すモデル１０４２を抽出する。モデル抽出部１０５は、抽出したモデル１０４２を知識情報蓄積部１０４に対して蓄積する。 Further, as shown in FIG. 4(b), the model extraction unit 105 plots, in a coordinate system representing throughput against time, the monitoring data for time 1 to time 3 indicating the throughput of process A and of process B, and obtains a linear approximation for each process, $f_A(x) = \alpha_1 x + \beta_1$ and $f_B(x) = \alpha_2 x + \beta_2$, thereby extracting a model 1042 that represents the change in throughput over time. The model extraction unit 105 stores the extracted model 1042 in the knowledge information storage unit 104.

次に、モデル抽出部１０５は、これらの２つのモデル１０４２に対して相関分析及び多変量解析を行うことで、図４（ｃ）に示すように、処理Ａと処理Ｂとの夫々について、ＣＰＵ使用率とスループットとの相関を表す線形近似式（ｆ_TA（ｘ）＝ρ₁ｘ＋θ₁、ｆ_TB（ｘ）＝ρ₂ｘ＋θ₂）を求め、ＣＰＵ使用率とスループットとの相関を示すモデル１０４２を抽出する（ステップＳ３０３）。モデル抽出部１０５は、抽出したモデル１０４２を知識情報蓄積部１０４に対して蓄積する。 Next, the model extraction unit 105 performs correlation analysis and multivariate analysis on these two models 1042 and, as shown in FIG. 4(c), obtains for each of process A and process B a linear approximation representing the correlation between the CPU usage rate and the throughput, $f_{TA}(x) = \rho_1 x + \theta_1$ and $f_{TB}(x) = \rho_2 x + \theta_2$, thereby extracting a model 1042 that indicates the correlation between the CPU usage rate and the throughput (step S303). The model extraction unit 105 stores the extracted model 1042 in the knowledge information storage unit 104.

続いて、モデル診断部１０６は、知識情報蓄積部１０４に蓄積される複数のモデル１０４２と各モデル１０４２に該当するポリシー１０４１を夫々参照し、各モデル１０４２に対して該当するポリシー１０４１に基づく診断を実行する（ステップＳ３０４）。例えば、ＣＰＵ使用率の時系列変化を表すモデル１０４２に対しては、「ＣＰＵ使用率が０〜１０％であれば余剰である、ＣＰＵ使用率が１１〜８０％であれば正常である、ＣＰＵ使用率が８１％以上であれば過負荷である」というポリシー１０４１が適用される。そして、今回抽出したモデル１０４２から将来のＣＰＵ使用率を予測することもできる。今回抽出したモデル１０４２の傾向でＣＰＵ使用率が増加していき、例えば１カ月後のＣＰＵ使用率が８０％を越えることが予測される場合には、ＣＰＵ使用率に将来（１カ月後）に問題が生じる可能性があると診断する。 Subsequently, the model diagnosis unit 106 refers to the models 1042 stored in the knowledge information storage unit 104 and to the policy 1041 corresponding to each model 1042, and diagnoses each model 1042 on the basis of its policy 1041 (step S304). For example, the following policy 1041 is applied to the model 1042 representing the time-series change in the CPU usage rate: "a CPU usage rate of 0 to 10% is surplus, a CPU usage rate of 11 to 80% is normal, and a CPU usage rate of 81% or more is overload". The future CPU usage rate can also be predicted from the model 1042 extracted this time. If the CPU usage rate keeps increasing along the trend of that model 1042 and is predicted, for example, to exceed 80% one month from now, the model diagnosis unit 106 diagnoses that a problem with the CPU usage rate may arise in the future (one month from now).
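
The diagnosis of the CPU usage model can be sketched as an extrapolation of the linear model followed by a policy lookup. The sketch assumes that x is measured in days from the latest sample and that non-integer values between the stated bands (for example 10.5%) fall into the next band up; the policy text specifies neither. The function names are hypothetical.

```python
def classify(cpu_usage):
    """Apply the example policy: surplus up to 10%, normal up to 80%, else overload."""
    if cpu_usage <= 10:
        return "surplus"
    if cpu_usage <= 80:
        return "normal"
    return "overload"


def diagnose_future(alpha, beta, days_ahead):
    """Extrapolate f(x) = alpha * x + beta and classify the predicted usage."""
    predicted = alpha * days_ahead + beta
    return predicted, classify(predicted)


# Illustrative model: usage grows 0.5 points per day from 70% today.
predicted, verdict = diagnose_future(0.5, 70, 30)
```

With these illustrative coefficients the predicted usage 30 days ahead is 85%, which the policy classifies as overload, so the event "CPU usage will exceed 80% in one month" would be raised.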

また、例えば、スループットの時系列変化を表すモデル１０４２に対しては、「スループットが１００トランザクション/秒以上であればサービスレベルが所定の範囲に収まる、スループットが１００トランザクション/秒以上であればサービスレベルが所定の範囲内に収まらない」というポリシー１０４１が適用される。同じく今回抽出したモデル１０４２からスループットを予測することもできる。今回抽出したモデル１０４２の傾向でスループットが増加していき、例えば３週間後にスループットが１００トランザクション/秒を越えることが予測される場合には、スループットに将来（３週間後）に問題が生じる可能性があると診断する。 Further, for the model 1042 representing the time-series change in throughput, for example, the following policy 1041 is applied: "if the throughput is below 100 transactions/second, the service level stays within the predetermined range; if the throughput is 100 transactions/second or more, the service level falls outside the predetermined range". (The original states "100 transactions/second or more" in both clauses; the first is read here as "below", which is the reading consistent with the diagnosis that follows.) The throughput can likewise be predicted from the model 1042 extracted this time. If the throughput keeps increasing along the trend of that model 1042 and is predicted, for example, to exceed 100 transactions/second three weeks from now, the model diagnosis unit 106 diagnoses that a problem with the throughput may arise in the future (three weeks from now).

また、例えば、ＣＰＵ使用率とスループットとの相関を表すモデル１０４２に対しては、「ＣＰＵ使用率とスループットとの相関関係が前後１日において誤差１０％以内に収めるべきである」というポリシー１０４１が適用される。このモデル１０４２からはＣＰＵ使用率に対するスループットの傾向を判定することができるため、例えば、上記ポリシー１４０１に基づきＣＰＵ使用率に対してスループットが１年前と比較して１０％以上低い（又は、高い）と判定される場合には、問題があると診断される。 Further, for the model 1042 representing the correlation between the CPU usage rate and the throughput, for example, a policy 1041 stating that "the correlation between the CPU usage rate and the throughput should stay within an error of 10% over the preceding and following day" is applied. Because this model 1042 makes it possible to determine the trend of the throughput relative to the CPU usage rate, a problem is diagnosed when, for example, it is determined under the above policy 1041 that the throughput relative to the CPU usage rate is at least 10% lower (or higher) than it was one year earlier.
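
A check of this kind can be sketched by evaluating two correlation models at the same CPU usage rate and comparing the resulting throughputs. The sketch assumes that each model is a pair (ρ, θ) with the CPU usage rate as x and the throughput as the function value; the text does not state which quantity is on which axis. The function names and the sample coefficients are hypothetical.

```python
def throughput_drift(current, reference, cpu_usage):
    """Relative difference in predicted throughput between two linear models."""
    rho_c, theta_c = current
    rho_r, theta_r = reference
    now = rho_c * cpu_usage + theta_c
    before = rho_r * cpu_usage + theta_r
    if before == 0:
        raise ValueError("reference throughput is zero at this CPU usage")
    return (now - before) / before


def has_problem(current, reference, cpu_usage, tolerance=0.10):
    """True when throughput is at least `tolerance` lower or higher than before."""
    return abs(throughput_drift(current, reference, cpu_usage)) >= tolerance


# Illustrative models: throughput per CPU-usage point fell from 2.0 to 1.7.
degraded = has_problem((1.7, 0.0), (2.0, 0.0), cpu_usage=50)
```

Here the throughput at 50% CPU usage drops by 15% relative to the reference model, so the check reports a problem.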

なお、ここでは、ＣＰＵ使用率の時系的変化を示すモデル１０４２とスループットの時系的変化を示すモデル１０４２とを抽出した後、ＣＰＵ使用率とスループットとの相関を示すモデル１０４２を抽出する流れのみについて説明しているが、ＣＰＵ使用率の時系列変化を示すモデル１０４２、スループットの時系的変化を示すモデル１０４２、ＣＰＵ使用率とスループットとの相関を示すモデル１０４２の抽出処理は夫々独立して行なうことができる。つまり、本実施形態におけるモデル抽出処理は、図３−１に示す流れには限られず、それぞれのモデルの抽出処理は任意のタイミングで行なわれる。また、ＣＰＵ使用率の時系列変化を示すモデル１０４２、スループットの時系的変化を示すモデル１０４２及びＣＰＵ使用率とスループットとの相関を示すモデル１０４２の全てを抽出せずに、そのうちの一部のモデルを抽出することもできる。即ち、ＣＰＵ使用率とスループットとの相関を示すモデル１０４２は抽出せずにＣＰＵ使用率の時系列変化を示すモデル１０４２及びスループットの時系的変化を示すモデル１０４２の２つのモデルだけを抽出することもできるし、ＣＰＵ使用率の時系列変化を示すモデル１０４２とスループットの時系的変化を示すモデル１０４２との何れか一方の１つのモデルのみを抽出することもできる。 Here, after extracting the model 1042 indicating the temporal change of the CPU usage rate and the model 1042 indicating the temporal change of the throughput, the flow of extracting the model 1042 indicating the correlation between the CPU usage rate and the throughput is extracted. However, the model 1042 showing the time series change of the CPU usage rate, the model 1042 showing the time series change of the throughput, and the model 1042 showing the correlation between the CPU usage rate and the throughput are independent of each other. Can be done. That is, the model extraction process in the present embodiment is not limited to the flow shown in FIG. 3A, and each model extraction process is performed at an arbitrary timing. Further, without extracting all of the model 1042 indicating the time series change of the CPU usage rate, the model 1042 indicating the time-based change of the throughput, and the model 1042 indicating the correlation between the CPU usage rate and the throughput, a part of them is extracted. A model can also be extracted. That is, without extracting the model 1042 indicating the correlation between the CPU usage rate and the throughput, only the two models, that is, the model 1042 indicating the time series change in the CPU usage rate and the model 1042 indicating the time series change in the throughput are extracted. 
It is also possible to extract only one of the model 1042 indicating the time-series change in the CPU usage rate and the model 1042 indicating the time-series change in the throughput.

続いて、計画部１０７は、モデル診断部１０６によりモデル１０４２に問題があると診断された場合、知識情報蓄積部１０４においてその問題の事象に紐付けられる対策リスト１０４３を選択する（ステップＳ３０５／ＹＥＳ、Ｓ３０６）。例えば、対象となる事象が１カ月後におけるＣＰＵ使用率が８０％を超過するという事象の場合、上述した（１）〜（６）の対策を含む対策リスト１０４３が選択されることになる。 Subsequently, when the model diagnosis unit 106 diagnoses that the model 1042 has a problem, the planning unit 107 selects a countermeasure list 1043 associated with the problem event in the knowledge information storage unit 104 (step S305 / YES). , S306). For example, when the target event is an event in which the CPU usage rate after one month exceeds 80%, the countermeasure list 1043 including the countermeasures (1) to (6) described above is selected.

ここで、計画部１０７は、当該ポリシー１０４１に基づいてシミュレーション部１０８にシミュレーション処理を依頼するか否かを判断する（ステップＳ３０７）。例えば、当該ポリシー１０４１が「システムの応答がない場合は、即座に再起動する」である場合には、シミュレーション部１０８に対してシミュレーション処理を依頼せず、即座に対策の実行をスケジューリングする（ステップＳ３０７／ＮＯ、Ｓ３１２）。また、対象となる事象が緊急の対処を要するものであるとして予めポリシー１０４１において定められている場合には、その問題がある事象の内容と当該事象に紐付けられている対策リスト１０４３をユーザに対して報知してもよい。これによって、ユーザは報知された対策リスト１０４３のうちから所望の対策を選択し、対策の実行を行うことができる。 Here, the planning unit 107 determines, on the basis of the policy 1041, whether to request simulation processing from the simulation unit 108 (step S307). For example, if the policy 1041 is "if the system does not respond, restart it immediately", the planning unit 107 does not request simulation processing from the simulation unit 108 and immediately schedules the execution of the countermeasure (steps S307/NO, S312). If the policy 1041 defines the event in advance as one that requires urgent action, the user may be notified of the details of the problem event and of the countermeasure list 1043 linked to it. The user can then select a desired countermeasure from the notified countermeasure list 1043 and execute it.

一方、例えば、当該ポリシー１０４１が「ＣＰＵ使用率が過負荷の場合は、シミュレーションを実行して最適な結果を残した対策を選択する」である場合、計画部１０７は、対策リスト１０４３に含まれる各対策のシミュレーション処理をシミュレーション部１０８に対して依頼する（ステップＳ３０７／ＹＥＳ、Ｓ３０８）。シミュレーション部１０８は、計画部１０７によって選択された対策リスト１０４３を参照し、その対策リスト１０４３に含まれる各対策のシミュレーション処理を実行する（ステップＳ３０９）。 On the other hand, for example, when the policy 1041 is “if the CPU usage rate is overloaded, a simulation is executed to select a countermeasure that leaves an optimal result”, the planning unit 107 is included in the countermeasure list 1043. A simulation process for each countermeasure is requested to the simulation unit 108 (steps S307 / YES, S308). The simulation unit 108 refers to the countermeasure list 1043 selected by the planning unit 107, and executes a simulation process for each countermeasure included in the countermeasure list 1043 (step S309).

続いて、計画部１０７は、当該事象に該当するポリシー１０４１を知識情報蓄積部１０４から参照し、シミュレーション部１０８によるシミュレーション処理の結果のうち、参照したポリシー１０４１を満たす結果を導いた対策が存在するか否かを判定する（ステップＳ３１０）。 Subsequently, the planning unit 107 refers to the policy 1041 corresponding to the event from the knowledge information storage unit 104, and there is a countermeasure that has led to a result satisfying the referenced policy 1041 among the simulation processing results by the simulation unit 108. It is determined whether or not (step S310).

ポリシー１０４１を満たす結果を導いた対策が一つのみ存在する場合、計画部１０７は、その対策の実行を決定し、当該対策の実行をスケジューリングする（ステップＳ３１０／ＹＥＳ、Ｓ３１１、Ｓ３１２）。また、ポリシー１０４１を満たす結果を導いた対策が複数存在する場合には、計画部１０７は、ポリシー１０４１「ＣＰＵ使用率が過負荷の場合は、シミュレーションを実行して最適な結果を残した対策を選択する」に基づいてその複数の対策のうち最適な結果を導いた対策の実行を決定し、当該対策の実行をスケジューリングする（ステップＳ３１０／ＹＥＳ、Ｓ３１１、Ｓ３１２）。 If exactly one countermeasure has led to a result satisfying the policy 1041, the planning unit 107 decides to execute that countermeasure and schedules its execution (steps S310/YES, S311, S312). If several countermeasures have led to results satisfying the policy 1041, the planning unit 107 decides, in accordance with the policy 1041 "when the CPU usage rate indicates overload, run a simulation and select the countermeasure that gives the best result", to execute the countermeasure that led to the best result among them and schedules its execution (steps S310/YES, S311, S312).

計画実行部１０９は、計画部１０７によって作成されたスケジュールに従って対策を実行する（ステップＳ３１３）。計画部１０７によって例えば「１カ月後に１つＣＰＵを追加する」という対策のスケジュールが作成された場合、計画実行部１０９は、計画部１０７によって上記計画が作成された日から１カ月後に対象となるＡＣ環境の装置に対してＣＰＵを１つ追加するように制御する。 The plan execution unit 109 executes countermeasures according to the schedule created by the planning unit 107 (step S313). For example, when the planning unit 107 creates a countermeasure schedule “add one CPU after one month”, the plan execution unit 109 becomes a target one month after the date when the plan is created by the planning unit 107. Control is performed so that one CPU is added to an AC environment device.

一方、シミュレーション部１０８によるシミュレーション処理の結果のうち、ポリシー１０４１を満たす結果を導いた対策が存在しないと判定された場合（当該事象に紐つけられる対策リスト１０４３にポリシー１０４１を満たす結果を導く対策が含まれない場合）、対策探索部１１０は、当該対策リスト１０４３以外の対策を知識情報蓄積部１０４から参照し、参照した対策に対するシミュレーション処理をシミュレーション処理部１０８に順次依頼する（ステップＳ３１０／ＮＯ、Ｓ３１４）。 On the other hand, if it is determined that none of the results of the simulation processing by the simulation unit 108 satisfies the policy 1041 (that is, the countermeasure list 1043 linked to the event contains no countermeasure whose result satisfies the policy 1041), the countermeasure search unit 110 refers to the countermeasures outside that countermeasure list 1043 in the knowledge information storage unit 104 and requests the simulation unit 108 to simulate them one after another (steps S310/NO, S314).

シミュレーション処理部１０８は、対策探索部１１０によって依頼された対策のシミュレーション処理を実行する（ステップＳ３１５）。 The simulation unit 108 executes the simulation processing for the countermeasures requested by the countermeasure search unit 110 (step S315).

続いて、対策探索部１１０は、当該対策リスト１０４３以外の対策の全てについてのシミュレーション処理を依頼すると、シミュレーション部１０８による各対策に対するシミュレーション処理の結果と上記ポリシー１０４１とを照らし合わせ、ポリシー１０４１を満たす結果を導いた対策が存在するか否かを判断する（ステップＳ３１６／ＮＯ、Ｓ３１７）。なお、本実施形態では、対策探索部１１０は、知識情報蓄積部１０４内に蓄積される上記対策リスト以外の対策全てを探索する全探索手法を用いているが、他の実施形態として、上記対策リスト以外の対策をランダムに探索するランダム探索手法や一定のポリシー（条件）を満たす対策が発見された時点で探索を止める最適化方法論等を利用することもできる。 Subsequently, when the countermeasure search unit 110 requests a simulation process for all the countermeasures other than the countermeasure list 1043, the countermeasure search unit 110 compares the result of the simulation process for each countermeasure by the simulation unit 108 with the policy 1041, and satisfies the policy 1041. It is determined whether there is a countermeasure that has led to the result (steps S316 / NO, S317). In the present embodiment, the countermeasure search unit 110 uses a full search method for searching all countermeasures other than the countermeasure list stored in the knowledge information storage unit 104. However, as another embodiment, the countermeasure search unit 110 It is also possible to use a random search method for randomly searching for measures other than the list, an optimization methodology for stopping the search when a measure satisfying a certain policy (condition) is found, or the like.

ポリシー１０４１を満たす結果を導いた対策が一つのみ存在する場合、対策探索部１１０は、その対策の実行を決定し、当該対策の実行のスケジューリングを計画部１０７に対して依頼する（ステップＳ３１７／ＹＥＳ、Ｓ３１８）。また、ポリシー１０４１を満たす結果を導いた対策が複数存在する場合、対策探索部１１０は、その複数の対策のうち最適な結果を導いた対策の実行を決定し、当該探索の実行のスケジューリングを計画部１０７に対して依頼する（ステップＳ３１７／ＹＥＳ、Ｓ３１８）。一方、ポリシー１０４１を満たす結果を導いた対策が存在しない場合（ステップＳ３１７／ＮＯ）、ステップＳ３０１の処理に戻る。 If exactly one countermeasure has led to a result satisfying the policy 1041, the countermeasure search unit 110 decides to execute that countermeasure and requests the planning unit 107 to schedule its execution (steps S317/YES, S318). If several countermeasures have led to results satisfying the policy 1041, the countermeasure search unit 110 decides to execute the countermeasure that led to the best result among them and requests the planning unit 107 to schedule the execution of that countermeasure (steps S317/YES, S318). If no countermeasure has led to a result satisfying the policy 1041 (step S317/NO), the processing returns to step S301.

続いて、対策探索部１１０は、計画部１０７に対してスケジューリングを依頼した対策を、当該事象の対策リスト１０４３に追加して紐つける（ステップＳ３１９）。このように対策探索部１１０によって今回探索された対策が対策リスト１０４３に追加される。従って、次回、同じ事象が分析部１０３によって分析、又は、モデル診断部１０６によって診断された場合、ステップＳ３１４〜ステップＳ３１９を行うことなく、今回探索された対策についてのシミュレーション処理を行うことが可能となる。 Subsequently, the countermeasure search unit 110 adds the countermeasure whose scheduling it requested from the planning unit 107 to the countermeasure list 1043 of the event and links it to the event (step S319). The countermeasure found by the countermeasure search unit 110 this time is thus added to the countermeasure list 1043. Consequently, the next time the same event is detected by the analysis unit 103 or diagnosed by the model diagnosis unit 106, the countermeasure found this time can be simulated without performing steps S314 to S319.

続いて、計画部１０７は、対策探索部１１０から依頼された対策の実行をスケジューリングする（ステップＳ３１２）。 Subsequently, the planning unit 107 schedules execution of the countermeasure requested from the countermeasure searching unit 110 (step S312).

計画実行部１０９は、計画部１０７によって作成されたスケジュールに従って対策を実行する（ステップＳ３１３）。 The plan execution unit 109 executes countermeasures according to the schedule created by the planning unit 107 (step S313).

次に、図１０及び図３−２を用いて、モニタ部１０１から直接得られるＣＢＥデータを分析し、分析結果に問題がある場合には対策を実行するまでの処理について説明する。なお、図３−２は、上述したように、ＡＣ環境の診断処理を含む流れを説明する上でも用いている。以下に説明する分析処理を含む流れにおいても図３−２と同様の処理が行なわれるため、図３−２に該当する処理については適宜説明を省略する。 Next, with reference to FIG. 10 and FIG. 3-2, the processing is described in which the CBE data obtained directly from the monitor unit 101 is analyzed and, if the analysis reveals a problem, a countermeasure is executed. As noted above, FIG. 3-2 is also used to explain the flow that includes the diagnosis of the AC environment. The flow including the analysis processing described below performs the same processing as in FIG. 3-2, so the description of the processing corresponding to FIG. 3-2 is omitted where appropriate.

図１１において、分析部１０３は、モニタ部１０１からＣＢＥデータを取得し、当該ＣＢＥデータに該当するポリシー１０４１を知識情報蓄積部１０４から参照し、参照したポリシー１０４１に基づいて当該ＣＢＥデータに問題がないかを分析する（ステップＳ１１０１、Ｓ１１０２）。上述したように、ＣＢＥデータがＣＰＵの使用率を示すデータであって、且つ、「ＣＰＵ使用率が０〜１０％であれば余剰である、ＣＰＵ使用率が１１〜８０％であれば正常である、ＣＰＵ使用率が８１％以上であれば過負荷である」というポリシー１０４１であれば、ＣＢＥデータにより示されるＣＰＵの使用率が８０％を越えていたらＣＢＥデータに問題があると分析され、反対にＣＢＥデータにより示されるＣＰＵの使用率が８０％未満である場合には、ＣＢＥデータには問題がないと分析される。 In FIG. 11, the analysis unit 103 acquires CBE data from the monitor unit 101, refers to the policy 1041 corresponding to that CBE data in the knowledge information storage unit 104, and analyzes, on the basis of that policy 1041, whether the CBE data indicates a problem (steps S1101 and S1102). As described above, if the CBE data indicates the CPU usage rate and the policy 1041 is "a CPU usage rate of 0 to 10% is surplus, a CPU usage rate of 11 to 80% is normal, and a CPU usage rate of 81% or more is overload", the CBE data is analyzed as indicating a problem when the CPU usage rate it shows exceeds 80%, and as indicating no problem when the CPU usage rate it shows is below 80%.

Subsequently, when the analysis unit 103 finds a problem in the CBE data, the planning unit 107 selects the countermeasure list 1043 associated with the problem event in the knowledge information storage unit 104 (steps S1102/YES, S1103).

Next, the planning unit 107 determines, based on the policy 1041, whether to request a simulation process from the simulation unit 108. For example, if the policy 1041 is "if the system does not respond, restart it immediately," the planning unit 107 does not request a simulation process from the simulation unit 108 and immediately schedules execution of the countermeasure (steps S1104/NO, S312). In addition, if the policy 1041 specifies in advance that the event requires urgent handling, the content of the problem event and the countermeasure list 1043 associated with that event may be reported to the user. The user can then select a desired countermeasure from the reported countermeasure list 1043 and execute it. The processing from step S312 onward is the same as in the flow including the AC environment diagnosis process, so its description is omitted.

On the other hand, if the policy 1041 is, for example, "if the CPU usage rate indicates an overload, run simulations and select the countermeasure that produced the best result," the planning unit 107 requests the simulation unit 108 to simulate each countermeasure included in the countermeasure list 1043 (steps S1104/YES, S308). The simulation unit 108 refers to the countermeasure list 1043 selected by the planning unit 107 and executes a simulation process for each countermeasure included in that list (step S309). The processing from step S310 onward is the same as in the flow including the AC environment diagnosis process, so its description is omitted.
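The branch at step S1104 can be sketched as follows. This is an illustration under stated assumptions, not the implementation described in the specification: the function `plan_countermeasure`, the boolean flag, and the numeric score returned by `simulate` (higher is better) are all hypothetical, and picking the first listed countermeasure when no simulation is requested is an assumption made only to keep the example small.

```python
from typing import Callable, List, Optional


def plan_countermeasure(
    policy_requires_simulation: bool,
    countermeasures: List[str],
    simulate: Callable[[str], float],
) -> Optional[str]:
    """Return the countermeasure to schedule, or None if the list is empty."""
    if not countermeasures:
        return None
    if not policy_requires_simulation:
        # S1104/NO: schedule immediately without simulation.
        # Assumption: the first listed countermeasure is the one to run.
        return countermeasures[0]
    # S1104/YES: simulate every countermeasure and keep the best score.
    scores = {name: simulate(name) for name in countermeasures}
    return max(scores, key=scores.get)
```

The `simulate` callable stands in for the simulation unit 108; how an effect is actually scored is not specified in this passage.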

As described above, in this embodiment, the current problem event is analyzed (determined) from the policy corresponding to the monitoring data (CBE data), or a model is extracted from the history of the monitoring data (CBE data) and current and future problem events are diagnosed (determined) from that model and its corresponding policy. Based on the determination result, a simulation process is performed using the countermeasure list corresponding to the event, and the effect of each countermeasure is evaluated. That is, even when devices in the AC environment encounter various events, this embodiment can evaluate the effect of each countermeasure through a simulation process using the countermeasure list corresponding to the event.

Therefore, according to this embodiment, the most appropriate countermeasure for the various events affecting each device in the AC environment can be selected and formulated based on the evaluation results for the effect of each countermeasure.

In addition, in this embodiment, the planning unit 107 determines the countermeasure that produced the best effect and schedules its execution, and the plan execution unit 109 can automatically execute the countermeasure according to that schedule.

Furthermore, in this embodiment, even if no optimal countermeasure is found in the countermeasure list corresponding to an event, searching for other countermeasures makes it possible to extend the range of countermeasures applicable to that event beyond the predefined countermeasure list.
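The fallback search can be sketched as follows. This is a hypothetical illustration: the acceptance threshold `min_score`, the function name `choose_with_fallback`, and the dictionary holding the countermeasure lists are assumptions, since this passage does not state how an "optimal" countermeasure is recognized. Adding a successful candidate to the list for the event is modelled on the association step described in the claims.

```python
from typing import Callable, Dict, Iterable, List, Optional


def choose_with_fallback(
    event: str,
    countermeasure_lists: Dict[str, List[str]],
    other_candidates: Iterable[str],
    simulate: Callable[[str], float],
    min_score: float,
) -> Optional[str]:
    """Pick a listed countermeasure, or search candidates outside the list."""
    listed = countermeasure_lists.get(event, [])
    scores = {name: simulate(name) for name in listed}
    acceptable = {n: s for n, s in scores.items() if s >= min_score}
    if acceptable:
        return max(acceptable, key=acceptable.get)
    # No listed countermeasure is good enough: try candidates outside the list.
    for candidate in other_candidates:
        if candidate in listed:
            continue
        if simulate(candidate) >= min_score:
            # Associate the new countermeasure with this event for next time.
            countermeasure_lists.setdefault(event, []).append(candidate)
            return candidate
    return None
```

Because the found countermeasure is appended to the list, a later occurrence of the same event can select it without a new search.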

The description above covered the case of extracting models that represent the time-series change of the CPU usage rate, the time-series change of the throughput, and the correlation between the CPU usage rate and the throughput. Other models are also possible. For example, as shown in FIG. 9, monitoring data on the throughput of process A and the throughput of process B can be acquired for the preceding day and the following day, and models representing the correlation between the two throughputs, $f_{TAB1}(x) = \rho_{AB1} x + \theta_{AB1}$ and $f_{TAB2}(x) = \rho_{AB2} x + \theta_{AB2}$, can be extracted from that data to diagnose a problem event. That is, if the policy associated with these models is "the correlation between the throughput of process A and the throughput of process B should stay within a 10% error between the preceding and following days," then, as shown in FIG. 9, an error of 10% or more between $f_{TAB1}(x_1)$ and $f_{TAB2}(x_1)$ is analyzed or diagnosed as a possible loss of balance between the throughput of process A and the throughput of process B. Thereafter, in the same manner, a simulation process is executed using the countermeasure list corresponding to this problem event, and the countermeasure that produced the best result is executed.
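The comparison of the two linear models can be sketched as follows. This is a minimal illustration under assumptions: the text does not define how the 10% error is computed, so the sketch measures it relative to the first day's model value, and the names `LinearModel` and `correlation_drifted` are hypothetical. The fields `slope` and `intercept` correspond to ρ and θ in the models above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    slope: float      # rho in the text
    intercept: float  # theta in the text

    def __call__(self, x: float) -> float:
        return self.slope * x + self.intercept


def correlation_drifted(
    day1: LinearModel, day2: LinearModel, x1: float, tolerance: float = 0.10
) -> bool:
    """True if the models differ by `tolerance` or more at x1, relative to day1."""
    reference = day1(x1)
    if reference == 0:
        raise ValueError("relative error is undefined for a zero reference value")
    return abs(day2(x1) - reference) / abs(reference) >= tolerance
```

For example, with a first-day model of slope 2.0 and intercept 10.0, a second-day model of slope 2.5 and the same intercept differs by about 23% at x = 50 and would be flagged, while a slope of 2.1 differs by under 5% and would not.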

Needless to say, the present invention can extract various models other than those described above, based on monitoring data obtainable from the AC environment. It is also possible to obtain monitoring data not only from a single device but also from a plurality of different devices, and to extract, for example, a model representing the correlation of the monitoring data between those devices.

Block diagram showing the functional configuration of the AC performance monitoring device according to an embodiment of the present invention.
Block diagram showing the hardware configuration of the AC performance monitoring device.
Flowchart showing the operation of the AC performance monitoring device according to the embodiment of the present invention.
Flowchart showing the operation of the AC performance monitoring device according to the embodiment of the present invention.
Diagram illustrating the model extraction method in detail.
Diagram illustrating a plurality of processing distribution patterns.
Diagram illustrating in detail an example in which a problem event is analyzed by the analysis unit or diagnosed by the model diagnosis unit.
Diagram illustrating in detail an example in which a problem event is analyzed by the analysis unit or diagnosed by the model diagnosis unit.
Diagram illustrating in detail an example in which a problem event is analyzed by the analysis unit or diagnosed by the model diagnosis unit.
Diagram illustrating another model extraction example and an example of analyzing or diagnosing a problem event based on that model.
Flowchart showing the operation of the AC performance monitoring device according to the embodiment of the present invention.
Flowchart showing the operation of the AC performance monitoring device according to the embodiment of the present invention.

Explanation of symbols

100: AC performance monitoring device
101: Monitor unit
102: Event information storage unit
103: Analysis unit
104: Knowledge information storage unit
105: Model extraction unit
106: Model diagnosis unit
107: Planning unit
108: Simulation unit
109: Plan execution unit
110: Countermeasure search unit
1001: Servers
1002: Storage devices
1003: Network devices
1004: Non-AC environment
1041: Policy
1042: Model
1043: Countermeasure list

Claims

1. A performance monitoring device connected to at least one external device via a communication line, comprising:
acquisition means for acquiring state information relating to the state of the external device;
determination means for determining the state of the external device based on the state information acquired by the acquisition means; and
simulation means for referring to a countermeasure list corresponding to the determination result of the determination means, executing a simulation process relating to the state of the external device for each of at least one piece of countermeasure information included in the countermeasure list, and evaluating the effect of the countermeasure indicated by each piece of countermeasure information.

2. The performance monitoring device according to claim 1, further comprising:
storage means for storing the state information acquired by the acquisition means in an external or internal recording medium; and
model extraction means for extracting model information representing the state of the external device based on the history of the state information stored in the recording medium,
wherein the determination means determines the state of the external device based on the model information.

3. The performance monitoring device according to claim 2, wherein the determination means determines the state of the external device based on the model information and on policy information relating to the operation of the external device according to the type of the state information.

4. The performance monitoring device according to claim 1, further comprising countermeasure determination means for determining one piece of countermeasure information from the countermeasure list based on the evaluation result of the effect of each countermeasure by the simulation means.

5. The performance monitoring device according to claim 4, further comprising search means for searching for other countermeasure information not included in the countermeasure list when the countermeasure determination means cannot determine one piece of countermeasure information from the countermeasure list based on the evaluation result,
wherein the simulation means executes a simulation process relating to the state of the external device based on the other countermeasure information and evaluates the effect of the countermeasure indicated by the other countermeasure information.

6. The performance monitoring device according to claim 5, wherein, when the countermeasure determination means determines the other countermeasure information based on the evaluation result of the effect of the countermeasure indicated by the other countermeasure information, the search means associates the other countermeasure information with the determination result.

7. A performance monitoring method performed by a performance monitoring device connected to at least one external device via a communication line, comprising:
an acquisition step of acquiring state information relating to the state of the external device;
a determination step of determining the state of the external device based on the state information acquired in the acquisition step; and
a simulation step of referring to a countermeasure list corresponding to the determination result of the determination step, executing a simulation process relating to the state of the external device for each of at least one piece of countermeasure information included in the countermeasure list, and evaluating the effect of the countermeasure indicated by each piece of countermeasure information.

8. A program for causing a computer to execute the performance monitoring method according to claim 7.