JP6752889B2

JP6752889B2 - Management method and management computer

Info

Publication number: JP6752889B2
Application number: JP2018533329A
Authority: JP
Inventors: 鈴木　克典; 金子　聡; 江丸　裕教
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-08-09
Filing date: 2016-08-09
Publication date: 2020-09-09
Anticipated expiration: 2036-08-09
Also published as: JPWO2018029767A1; WO2018029767A1

Description

The present invention relates to a management method.

In the operation management of a computer system, the performance of system components such as servers and storage systems is monitored. When performance degradation is detected, countermeasures such as configuration changes must be taken so that the degradation does not recur in the future, regardless of whether the performance has recovered on its own. However, the number of combinations of possible configuration change means and their parameters is very large. To reduce the burden on the administrator, it is therefore important to provide support that proposes countermeasure plans for preventing the recurrence of performance degradation.

According to Patent Document 1, when the I/O amount for a logical volume in a storage system changes and a predetermined required performance is no longer satisfied, a configuration change proposal that brings the logical volume into a state satisfying the required performance at a future point in time can be generated based on the configuration information, the performance information, and the capacity consumption trend at the time the required performance ceased to be satisfied, and the most appropriate configuration change proposal can be displayed. In addition, based on information on scheduled future configuration changes, the future capacity consumption and configuration changes of the storage system can be predicted, and only the configuration change proposals that can be implemented at the scheduled implementation time can be displayed.

Patent Document 1: International Publication No. 2014/073045

The above-mentioned Patent Document 1 generates a configuration change proposal that can satisfy the required performance, based on the configuration information and the performance information at two points in time: the time at which the predetermined required performance of the logical volume ceased to be satisfied, and the time at which the configuration change is completed, taking into account the time required for the configuration change.

However, whereas the capacity consumption trend changes gradually, performance changes greatly from moment to moment according to the workload of the application. Therefore, with a countermeasure plan generated based on Patent Document 1 using information from only a specific point in time, if the load at the time of the performance degradation occurs again, recurrence may be prevented at the point in time of interest, but an appropriate performance improvement effect cannot be expected at other points in time, and performance degradation may occur there.

On the other hand, because performance changes from moment to moment as described above, even if countermeasure plans are generated based on Patent Document 1 for a plurality of past points in time at which recurrence of performance degradation is to be prevented, the generated countermeasure plan may differ depending on the point in time of interest. Attempting to generate an appropriate countermeasure plan for every point in time requires an enormous amount of calculation time.

In order to solve the above problems, a management method according to one aspect of the present invention comprises: acquiring, for a bottleneck resource that is the cause of performance degradation of a computer system during a past period among a plurality of resources in the computer system, bottleneck load data that is time-series data of the load of the bottleneck resource during the period; acquiring, for a related resource that affects the load of the bottleneck resource among the plurality of resources, related load data that is time-series data of the load of the related resource during the period; selecting, based on a trend of the bottleneck load data, a countermeasure policy for the performance degradation from among a plurality of countermeasure policy candidates each indicating a policy for determining parameters necessary for an operation of the computer system; selecting, based on the countermeasure policy and a trend of the related load data, a countermeasure means for the performance degradation from among a plurality of countermeasure means candidates each indicating an operation of the computer system; and determining a parameter of the countermeasure means based on the countermeasure policy.

A countermeasure plan that is appropriate over an entire period can be generated quickly.

The drawings show the following:
An overview of the operation of the management computer according to the first embodiment.
The hardware configuration of the computer system according to the first embodiment.
The hardware configuration of the management computer according to the first embodiment.
The logical configuration of the computer system according to the first embodiment.
The VM management table 1101.
The host management table 1102.
The storage volume management table 1103.
The threshold management table 1104.
The performance management table 1105.
The countermeasure means management table 1106.
The metric management table 1107.
The countermeasure means application management table 1108.
The selection means holding table 1109.
A flowchart of the performance monitoring process.
A flowchart of the countermeasure plan generation process.
A flowchart of Script1.
A flowchart of Script2.
A flowchart of Script3.
A flowchart of Script4.
A flowchart of the countermeasure policy determination process (S2020).
A flowchart of the countermeasure means selection process (S2030).
A flowchart of the countermeasure means selection process (S2030) according to the second embodiment.
A flowchart of the countermeasure means combination generation process according to the second embodiment.

Several embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations thereof described in the embodiments are necessarily essential to the solution of the invention.

In the following description, various kinds of information may be described using the expression "aaa table", but the information may be expressed in a data structure other than a table. To show that the information does not depend on the data structure, an "aaa table" can also be called "aaa information". Further, an information element consisting of the values of the columns of a table is called a field or an entry, and an entry of the "aaa table" is called an "aaa table entry" for the purpose of explanation.

Further, in the following description, processing may be described simply with the management computer or the server as the subject. This indicates that the processing is executed by a processor (for example, a CPU (Central Processing Unit)) of a control device included in the computer. Similarly, when processing is described simply with the storage device as the subject, this indicates that the processing is executed by the controller of the storage device. At least one of the control device and the controller may be the processor itself, or may include a hardware circuit that performs part or all of the processing performed by the control device or the controller.

A program may be installed on each computer or storage device from a program source. The program source may be, for example, a program distribution server or a storage medium.

In the following description, operations such as a configuration change operation (for example, moving a VM or changing ownership) and a setting change operation (for example, setting an upper limit on the I/O amount) are referred to as countermeasure means, and a setting value that must be input when a countermeasure means is implemented is referred to as a parameter. A combination of a countermeasure means and its parameters is referred to as a countermeasure plan or simply a plan.
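The terminology above can be illustrated with a small data model. The following is a minimal sketch only; the class names, the `kind` labels, and the parameter keys (`target`, `max_iops`) are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

ParamValue = Union[int, float, str]


@dataclass(frozen=True)
class CountermeasureMeans:
    """An operation on the computer system (hypothetical model)."""
    name: str
    kind: str  # "configuration_change" or "setting_change" (labels assumed here)


@dataclass
class CountermeasurePlan:
    """A countermeasure means combined with the parameters needed to run it."""
    means: CountermeasureMeans
    parameters: Dict[str, ParamValue] = field(default_factory=dict)

    def describe(self) -> str:
        # Render the plan in a "Method / Detail" form.
        details = ", ".join(f"{k}={v}" for k, v in sorted(self.parameters.items()))
        return f"Method: {self.means.name}; Detail: {details}"


# Example: a setting change operation with illustrative parameter values.
io_limit_means = CountermeasureMeans("limit I/O amount", "setting_change")
plan = CountermeasurePlan(io_limit_means, {"target": "VM1", "max_iops": 500})
print(plan.describe())
```

Keeping the means and its parameters as separate objects mirrors the distinction drawn in the text: the same means can yield different plans depending on the parameters chosen.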

(1-1) Overview of the present embodiment

FIG. 1 shows an overview of the operation of the management computer according to the first embodiment.

This figure shows an example of a method for instructing the generation of a performance degradation countermeasure plan for the computer system 10, an example of presenting the countermeasure plan, and the flow of countermeasure plan generation.

The management computer 100 is connected to the computer system 10 and the display device 400. The computer system 10 includes a server 200 and a storage device 300. A VM 700 may be running on the server 200. These are described in detail later. The display device 400 may include an input device or may be a terminal device. The computer system 10 may be either one of the server 200 and the storage device 300, or may be a single computer that provides both VMs and volumes.

As an example for outlining this embodiment, FIG. 1 shows a case in which performance degradation has occurred for volume 1 in the storage device 300, resulting in a state that falls below the required performance. The flow of processing in this embodiment is described below.

First, when the management computer 100 detects performance degradation of the computer system, it issues an alert to the administrator. Upon receiving the alert, the administrator specifies an arbitrary past period as a specified period and starts the generation of a countermeasure plan whose goal is that performance degradation does not recur even if the load of the specified period occurs again. The specified period may be specified, for example, by entering its start time and end time on a screen displayed on the display device 400, by pointing at a point on a graph showing the load change, or by moving a line segment representing a time on the graph.

Alternatively, when the management computer 100 detects performance degradation, it may set the specified period itself and automatically start generating a countermeasure plan.

The management computer 100 includes a countermeasure policy determination module, a countermeasure means selection module, and a countermeasure plan generation module. When a specified period is specified from the display device 400 and the management computer 100 is instructed to take a countermeasure against the performance degradation, the countermeasure policy determination module refers to the load information, over the specified period, of the component of the computer system in which the performance degradation occurred, and analyzes the load trend, that is, the tendency of the load. Hereinafter, a component is called a resource, and in particular a resource that is the cause of performance degradation is called a bottleneck resource. Then, based on the analysis result, the countermeasure policy determination module determines a policy for generating a countermeasure plan (hereinafter simply called a countermeasure policy) that can efficiently eliminate the performance degradation of the bottleneck resource on the assumption that it recurs. In the example of this figure, the bottleneck resource is volume 1 of the storage device 300, and the load of the bottleneck resource is expressed as an operating rate.

Next, the countermeasure means selection module refers to the load information, over the specified period, of the components of the computer system that are increasing the load of the bottleneck resource, that is, that affect the load of the bottleneck resource (hereinafter simply called related resources), and analyzes the load trend of each related resource. Then, based on the analysis result, it selects only the countermeasure means that are effective in eliminating the performance degradation under the previously determined countermeasure policy. In the example of this figure, the related resources are VM1 and VM2 of the server 200, and the load of a related resource is expressed as an I/O amount.

After that, the countermeasure plan generation module determines the method for calculating the parameters of the previously selected countermeasure means, and then determines the parameters (the setting values of the countermeasure means) based on that calculation method.

In the example of this figure, the countermeasure policy determination module selects, based on the load trend of the bottleneck resource, uniformly pushing down the load at each point in time as the countermeasure policy. The countermeasure means selection module selects a storage-side I/O amount limit as the countermeasure means, based on the countermeasure policy and the load trends of the related resources. The countermeasure plan generation module determines the I/O amount to be used as the limit.
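The three-stage flow in this example (policy determination, means selection, parameter determination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the determination logic of the embodiment: the decision rules, the second policy name `cut_peaks`, the alternative means `move_vm`, the proportional scaling rule, and all sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def choose_policy(bottleneck_load: List[float], threshold: float) -> str:
    """Pick a policy from the bottleneck load trend (hypothetical rule)."""
    over = sum(1 for x in bottleneck_load if x > threshold)
    # Load above the threshold for at least half of the period: lower it everywhere.
    return "reduce_uniformly" if 2 * over >= len(bottleneck_load) else "cut_peaks"


def choose_means(policy: str, related_load: Dict[str, List[float]]) -> Tuple[str, str]:
    """Pick a means and a target related resource (hypothetical rule)."""
    heaviest = max(related_load, key=lambda name: mean(related_load[name]))
    means = "storage_side_io_limit" if policy == "reduce_uniformly" else "move_vm"
    return means, heaviest


def io_limit(bottleneck_load: List[float], target_load: List[float],
             threshold: float) -> float:
    """Scale the target's peak I/O by the ratio that brings the worst
    bottleneck load down to the threshold (assumed proportional model)."""
    scale = min(threshold / max(bottleneck_load), 1.0)
    return max(target_load) * scale


# Illustrative time series over the specified period.
volume1_rate = [80.0, 90.0, 100.0, 85.0]           # operating rate of volume 1 (%)
vm_io = {"VM1": [400.0, 450.0, 500.0, 420.0],      # I/O amount of each related VM
         "VM2": [100.0, 120.0, 110.0, 90.0]}
threshold = 75.0

policy = choose_policy(volume1_rate, threshold)
means, target = choose_means(policy, vm_io)
limit = io_limit(volume1_rate, vm_io[target], threshold)
print(policy, means, target, limit)
```

The point of the staging is that each step narrows the search: the policy is fixed from the bottleneck trend alone, the means is then chosen only among those compatible with that policy, and a parameter is computed only for the selected means.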

The management computer 100 causes the display device 400 to display the determined countermeasure plan. For example, the countermeasure means (Method) and its parameters (Detail) may be displayed as text. The displayed content may include elements other than the countermeasure means and parameters, for example additional information such as a summary of the countermeasure plan, and a plurality of countermeasure plans may be displayed.

The countermeasure plan need not only be displayed on the display device 400; it may also be saved as electronic data such as a file.

(1-2) Hardware configuration of the computer system

FIG. 2 shows the hardware configuration of the computer system according to the first embodiment.

The server 200 is connected to the management computer 100, the storage device 300, and the display device 400 via a first communication network 500 such as a LAN (Local Area Network). The server 200 is also connected to the storage device 300 via a second communication network 600 such as a SAN (Storage Area Network). The first communication network 500 and the second communication network 600 may be formed as a single network.

The management computer 100 includes an I/F 130 such as a NIC (Network Interface Card). Details are given later in the description of FIG. 3.

The server 200 is a physical computer that includes a processor 210, a storage resource 220, a management I/F (hereinafter, M-I/F) 230, and a communication I/F (hereinafter, C-I/F) 240.

The M-I/F 230 is a communication interface device for communicating using a first protocol, and is, for example, a NIC. The C-I/F 240 is an interface device for communicating using a second protocol, and is, for example, an HBA (Host Bus Adapter).

The storage resource 220 is, for example, a memory, and may include an auxiliary storage device such as an HDD (Hard Disk Drive). The storage resource 220 stores, for example, an OS (Operating System) and application programs such as business applications, and the processor 210 executes the application programs and the OS.

The storage device 300 has a storage device group 360 and a controller 390 connected to the storage device group 360. The storage device group 360 consists of one or more storage devices. A storage device in the group is, for example, an SSD (Solid State Drive), and may be an HDD or another non-volatile memory such as FeRAM. Thus, the storage device 300 may contain a mixture of storage devices with different performance. The storage device group 360 may be provided from outside the storage device 300.

The controller 390 has a memory 320, an M-I/F 330, a C-I/F 340, a device I/F (D-I/F) 350, a cache memory 380, and a processor 310 connected to them.

The D-I/F 350 is a communication interface device for communicating with the storage devices using a third protocol. A D-I/F 350 may be provided for each type of storage device.

The memory 320 stores computer programs executed by the processor 310 and various kinds of information. The cache memory 380 temporarily stores data written according to write commands received from the server 200 and data read from the storage devices in response to read commands received from the server 200. The memory 320 and the cache memory 380 may be a single device.

The display device 400 is, for example, a device including input/output devices such as a display, a keyboard, and a pointing device, but may include other devices. As an alternative to these input/output devices, a serial interface or an Ethernet interface may be used as the input/output device. In that case, a display computer having a display, a keyboard, or a pointing device is connected to the interface, display information is transmitted to the display computer, and input information is received from the display computer, so that display and input on the display computer substitute for those on the input/output devices. The display device 400 may be formed integrally with the management computer 100 or the server 200.

The above is the hardware configuration of the computer system according to the first embodiment. To speed up processing and improve reliability, a plurality of management computers 100, servers 200, and storage devices 300 may be connected.

FIG. 3 shows the hardware configuration of the management computer according to the first embodiment.

The management computer 100 includes a storage resource 110, a communication interface (I/F) 130, and a processor 120 connected to them.

The storage resource 110 is, for example, a memory, and may include an auxiliary storage device such as an HDD. The storage resource 110 stores various kinds of information and the programs executed by the processor 120. Specifically, the stored information comprises the VM management table 1101, the host management table 1102, the storage volume management table 1103, the threshold management table 1104, the performance management table 1105, the countermeasure means management table 1106, the metric management table 1107, the countermeasure means application management table 1108, and the selection means holding table 1109. The stored programs comprise the monitoring program 1121, the countermeasure plan generation program 1122, the countermeasure policy determination program 1123, the countermeasure means selection program 1124, and the parameter calculation script group 113.

(1-3) Logical configuration of the computer system

FIG. 4 shows the logical configuration of the computer system according to the first embodiment.

The storage device 300 has a plurality of volumes 370. A volume may be, for example, a logical volume, a virtual volume, or a dynamic virtual volume, each of which is described below.

A logical volume is a volume composed of physical storage areas (called physical pages) of storage devices. Inside the storage device 300, a group of storage devices combined using RAID (Redundant Arrays of Independent Disks) technology is called a RAID group (not shown). A logical volume is an LU (Logical Unit) obtained by carving out part of the physical area of a RAID group and making it logically accessible from a computer as a single storage resource.

A virtual volume is a virtual volume realized by the thin provisioning mechanism of the storage device 300. The thin provisioning mechanism configures a pool (not shown) from the logical storage areas (called logical pages) of a plurality of logical volumes. When it receives I/O from the server 200 for a virtual volume, the thin provisioning mechanism allocates a logical page from the pool to the virtual volume. In this way, the thin provisioning mechanism can expand the capacity of the virtual volume as needed.
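The allocate-on-demand behavior of a virtual volume can be sketched as follows. This is a simplified illustration: the class names, the page size, and the choice to allocate on the first write to a page are assumptions, since the text above states only that a logical page is allocated from the pool when I/O is received.

```python
from typing import Dict, List, Tuple

LogicalPage = Tuple[str, int]  # (logical volume ID, page number)


class ThinPool:
    """Pool of free logical pages gathered from several logical volumes."""

    def __init__(self, pages: List[LogicalPage]) -> None:
        self.free: List[LogicalPage] = list(pages)

    def take(self) -> LogicalPage:
        if not self.free:
            raise RuntimeError("pool exhausted")
        return self.free.pop(0)


class VirtualVolume:
    """Virtual volume whose pages are backed only once they are used."""

    def __init__(self, pool: ThinPool, page_size: int) -> None:
        self.pool = pool
        self.page_size = page_size
        self.mapping: Dict[int, LogicalPage] = {}  # virtual page -> logical page

    def write(self, offset: int) -> LogicalPage:
        # Allocate a logical page on the first write to a virtual page.
        vpage = offset // self.page_size
        if vpage not in self.mapping:
            self.mapping[vpage] = self.pool.take()
        return self.mapping[vpage]

    @property
    def allocated_bytes(self) -> int:
        return len(self.mapping) * self.page_size


pool = ThinPool([("LV1", 0), ("LV1", 1), ("LV2", 0)])
volume = VirtualVolume(pool, page_size=4096)
first = volume.write(0)       # allocates a page
again = volume.write(100)     # same virtual page, no new allocation
third = volume.write(8192)    # different virtual page, new allocation
print(first, again, third, volume.allocated_bytes)
```

Because pages are consumed only when touched, the capacity actually backing the volume grows with use rather than being reserved up front.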

A dynamic virtual volume is a virtual volume in which the physical page corresponding to a logical page can be dynamically changed to a physical page of another storage device with different I/O performance or reliability, in response to, for example, changes in the I/O amount for the virtual volume.

The server 200 is an example of a computer that accesses the storage device 300. The server 200 may have an OS capable of executing application programs, or a hypervisor 800 capable of logically generating and running VMs 700. The hypervisor 800 can control a plurality of VMs 700 at a time. Each VM 700 can execute application programs as if it were a stand-alone physical computer.

A volume 370 is provided to either a server 200 or a VM 700. For example, a volume 370 provided to a server 200 running a general-purpose OS is recognized by the OS as a single storage resource. A volume 370 provided to a server 200 having a hypervisor 800 is recognized as a data store, in which the data of a plurality of VMs 700 is stored. The hypervisor 800 provides a VM 700 by executing, on the processor 210, the VM data stored in the data store. Consequently, when a setting change is made on a VM 700, I/O to the volume 370 occurs. A volume 370 may also be provided directly to a VM 700, in which case the volume 370 is recognized from the VM 700 as a single storage resource.

The hypervisor 800 may configure a VM 700 using a plurality of volumes 370. In this case, pieces of information such as the control information of the VM 700, the OS image, and snapshots are each stored in a separate volume 370.

The hypervisor 800 may have a VM migration function that moves a VM 700 running on one server 200 to another server 200. It may also have a VM storage migration function that moves the data of a VM 700 to another volume 370.

(1-4) Contents of the tables

FIG. 5 shows the VM management table 1101.

The VM management table 1101 stores various kinds of information about the VMs 700. For each VM 700, the VM management table 1101 shows the server 200 that runs it, the data store that stores its data, its various setting information, and the servers and data stores that can be selected as its destination. The management computer 100 periodically communicates with the servers 200, acquires information about each VM 700 running on them, and stores it in the VM management table 1101.

Specifically, the VM management table 1101 has the following information in an entry for each VM. The VM ID 11010 is an ID for identifying the VM. The server ID 11011 is an ID for identifying the server 200 on which the VM is running. The data store name 11012 is the name of the data store in which the data of the VM is stored. The number of CPU cores 11013 is the number of CPU cores assigned to the VM. The memory amount 11014 is the maximum amount of memory that the VM can use. The number of CPU cores 11013 and the memory amount 11014 are examples of setting values of a VM 700; the VM management table 1101 may include other setting values.

移動可能サーバＩＤ１１０１５は、当該ＶＭの移動先となることができるサーバ２００のＩＤである。当該ＶＭの移動先となることができるサーバ２００が無い場合は、“―”が格納される。 The movable server ID 11015 is the ID of the server 200 that can be the destination of the VM. If there is no server 200 that can be the destination of the VM, "-" is stored.

移動可能データストア名１１０１６は、当該ＶＭのデータの移動先となることができるデータストアの名前である。当該ＶＭのデータの移動先となることができるデータストアが無い場合は、“―”が格納される。 The movable data store name 11016 is the name of a data store that can be the destination of the data of the VM. If there is no data store that can be the destination of the VM data, "-" is stored.
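As an illustration of the entry layout described above, the following is a minimal sketch of one VM management table entry. The class name, field names, the memory unit, and the helper `can_migrate_to` are assumptions made for this example and do not appear in the original description; an empty list stands in for the "-" value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VmEntry:
    """One hypothetical row of the VM management table 1101."""
    vm_id: str            # VM ID 11010
    server_id: str        # server ID 11011
    datastore_name: str   # data store name 11012
    cpu_cores: int        # number of CPU cores 11013
    memory_gb: int        # memory amount 11014 (the unit is an assumption)
    # movable server ID 11015; an empty list plays the role of "-"
    movable_server_ids: List[str] = field(default_factory=list)
    # movable data store name 11016; an empty list plays the role of "-"
    movable_datastore_names: List[str] = field(default_factory=list)


def can_migrate_to(entry: VmEntry, server_id: str) -> bool:
    """Return True if the given server is listed as a migration destination."""
    return server_id in entry.movable_server_ids
```

A lookup such as `can_migrate_to(vm, "SV02")` then answers whether a given server is an allowed destination for that VM.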

図６は、ホスト管理テーブル１１０２を示す。 FIG. 6 shows the host management table 1102.

ホスト管理テーブル１１０２は、ストレージ装置３００にアクセスする計算機の各種情報を格納する。ホスト管理テーブル１１０２によって、各計算機が物理計算機かＶＭか、汎用のＯＳが稼動しているかハイパバイザが稼動しているか、の判別ができる。また、サーバ２００の各種構成情報や設定情報が分かる。管理計算機１００は、定期的にサーバ２００やＶＭ７００と通信し、各種情報を取得してホスト管理テーブル１１０２に格納する。 The host management table 1102 stores various information of the computer that accesses the storage device 300. From the host management table 1102, it is possible to determine whether each computer is a physical computer or a VM, and whether a general-purpose OS is running or a hypervisor is running. In addition, various configuration information and setting information of the server 200 can be known. The management computer 100 periodically communicates with the server 200 and the VM 700, acquires various information, and stores it in the host management table 1102.

具体的にホスト管理テーブル１１０２は、計算機毎のエントリに以下の情報を有する。サーバＩＤ１１０２０は、当該計算機を識別するためのＩＤである。仮想化フラグ１１０２１は、当該計算機がサーバ２００の場合には“Ｎｏ”が格納され、ＶＭ７００の場合は“Ｙｅｓ”が格納される。ＶＭＩＤ１１０２２は、当該計算機がＶＭの場合にＶＭＩＤ１１０１０を格納する。当該計算機がＶＭで無い場合は、“―”が格納される。ＯＳ種類１１０２３は、当該計算機に適用されているＯＳの種類を格納する。当該計算機でハイパバイザが稼動している場合は、値として“ハイパバイザ”が格納される。また、当該計算機で汎用ＯＳが稼動している場合は、ＯＳの種類が判別できる値が格納される。例えば、ＯＳ名が格納される。 Specifically, the host management table 1102 has the following information in the entry for each computer. The server ID 11020 is an ID for identifying the computer. The virtualization flag 11021 stores "No" when the computer is the server 200, and stores "Yes" when the computer is the VM700. The VMID11022 stores the VMID11010 when the computer is a VM. If the computer is not a VM, "-" is stored. The OS type 11023 stores the type of OS applied to the computer. When the hypervisor is running on the computer, the "hypervisor" is stored as a value. When a general-purpose OS is running on the computer, a value that can determine the type of OS is stored. For example, the OS name is stored.

ＣＰＵコア数１１０２４は、当該計算機がサーバ２００の場合はプロセッサ２１０のコア数が格納され、当該計算機がＶＭ７００の場合は割り当てられているコア数１１０１３の値が格納される。メモリ量１１０２５は、当該計算機がサーバ２００の場合は記憶資源２２０のうちメモリの容量が格納され、計算機がＶＭ７００の場合は割り当てられているメモリ量１１０１４の値が格納される。なお、ＣＰＵコア数１１０２４とメモリ量１１０２５は計算機の設定値の一例である。ホスト管理テーブル１１０２はその他の情報、例えばクラスタリングの構成情報や、電源管理の設定など他の設定値を含んでもよい。ボリュームＩＤ１１０２６は、当該計算機に対して提供されているボリュームの識別ＩＤが格納される。 The number of CPU cores 11024 stores the number of cores of the processor 210 when the computer is the server 200, and stores the value of the number of assigned cores 11013 when the computer is the VM700. When the computer is the server 200, the memory capacity of the storage resources 220 is stored in the memory amount 11025, and when the computer is the VM700, the value of the allocated memory amount 11014 is stored. The number of CPU cores 11024 and the amount of memory 11025 are examples of computer setting values. The host management table 1102 may include other information such as clustering configuration information and other setting values such as power management settings. The volume ID 11026 stores the identification ID of the volume provided to the computer.

図７は、ストレージボリューム管理テーブル１１０３を示す。 FIG. 7 shows the storage volume management table 1103.

ストレージボリューム管理テーブル１１０３は、ストレージ装置３００が提供するボリューム３７０の各種情報を格納する。ストレージボリューム管理テーブル１１０３によって、ストレージ装置３００内部におけるボリューム３７０の構成情報と設定情報とが分かる。例えば、ボリューム３７０とＲＡＩＤグループやプールの関連が分かる。また、ボリューム３７０に対して発行されるＩ／Ｏを制御するためのＣＰＵの割り当て設定などが分かる。さらに、ストレージボリューム管理テーブル１１０３によって、ボリューム３７０の提供を受ける計算機が分かる。管理計算機１００は定期的にサーバ２００、ストレージ装置３００と通信し、ボリューム３７０の各種情報を取得し、ストレージボリューム管理テーブル１１０３に格納する。 The storage volume management table 1103 stores various information of the volume 370 provided by the storage device 300. From the storage volume management table 1103, the configuration information and the setting information of the volume 370 inside the storage device 300 can be known. For example, you can see the relationship between volume 370 and RAID groups and pools. In addition, the CPU allocation setting for controlling the I / O issued to the volume 370 can be known. Further, the storage volume management table 1103 indicates the computer to which the volume 370 is provided. The management computer 100 periodically communicates with the server 200 and the storage device 300, acquires various information of the volume 370, and stores it in the storage volume management table 1103.

具体的にはストレージボリューム管理テーブル１１０３は、ボリューム毎のエントリに、以下の情報を有する。ストレージＩＤ１１０３０は、当該ボリュームを有するストレージ装置３００を識別するためのＩＤである。ボリュームＩＤ１１０３１は、当該ボリュームを識別するためのＩＤである。ＲＡＩＤグループＩＤ１１０３２は、当該ボリュームが論理ボリュームである場合に、当該ボリュームが所属するＲＡＩＤグループを識別するためのＩＤである。当該ボリュームが論理ボリュームで無い場合は、“―”の値が格納される。プールＩＤ１１０３３は、当該ボリュームが仮想ボリュームである場合に、当該ボリュームが所属するプールを識別するためのＩＤである。当該ボリュームが仮想ボリュームで無い場合は、“―”の値が格納される。 Specifically, the storage volume management table 1103 has the following information in the entry for each volume. The storage ID 11030 is an ID for identifying the storage device 300 having the volume. The volume ID 11031 is an ID for identifying the volume. The RAID group ID 11032 is an ID for identifying the RAID group to which the volume belongs when the volume is a logical volume. If the volume is not a logical volume, the value of "-" is stored. The pool ID 11033 is an ID for identifying the pool to which the volume belongs when the volume is a virtual volume. If the volume is not a virtual volume, the value of "-" is stored.

The responsible CPU ID 11034 is an ID for identifying the processor 310 that handles I/O processing for the volume and other storage-internal processing related to the volume, such as replication. The cache memory ID 11035 is an ID for identifying the cache memory 380 used for I/O processing for the volume and other data processing related to the volume, such as replication. The cache memory capacity 11036 is the maximum capacity that can be used as cache in that processing. For example, the storage device 300 may have a function of logically dividing the area of the cache memory 380 and allocating a usable area to each volume 370. When the area of the cache memory 380 is logically divided, the cache memory capacity 11036 stores the capacity of the cache memory area available to the volume, and its value is updated when, for example, the allocation of the volume to a cache memory area changes. The responsible CPU ID 11034, the cache memory ID 11035, and the cache memory capacity 11036 are part of the setting information of the volume. The storage volume management table 1103 may include other information, for example settings related to replication of the volume 370 or, in the case of a dynamic virtual volume, settings such as the storage device on which logical pages are placed.

割当サーバＩＤ１１０３７は、当該ボリュームの提供を受けるサーバ２００またはＶＭ７００を識別するＩＤである。データストア名１１０３８は、当該ボリュームがハイパバイザを有するサーバ２００に提供されている場合に、ハイパバイザ上で当該ボリュームに設定されているデータストアの名前である。 The allocation server ID 11037 is an ID that identifies the server 200 or VM700 that receives the provision of the volume. The data store name 11038 is the name of the data store set for the volume on the hypervisor when the volume is provided to the server 200 having the hypervisor.

図８は、しきい値管理テーブル１１０４を示す。 FIG. 8 shows the threshold management table 1104.

しきい値管理テーブル１１０４は、計算機システムを構成する各構成要素について、解消するべき性能低下が発生したと判断するためのしきい値を保持する。しきい値管理テーブル１１０４は、しきい値として警告しきい値と絶対しきい値の２種類のしきい値を保持する。絶対しきい値は、ＳＬＡ（ＳｅｒｖｉｃｅＬｅｖｅｌＡｇｒｅｅｍｅｎｔ）を守るために、維持する必要のあるしきい値である。警告しきい値は、特定のリソースの特定の性能メトリックが絶対しきい値に近づかないようにするためのしきい値であり、絶対しきい値よりも程度の低い値が設定される。性能メトリックは、リソースの負荷を示す値であり、計算機システムにより記録される。 The threshold value management table 1104 holds a threshold value for determining that a performance deterioration to be resolved has occurred for each component constituting the computer system. The threshold value management table 1104 holds two types of threshold values, a warning threshold value and an absolute threshold value. The absolute threshold is a threshold that needs to be maintained in order to protect the SLA (Service Level Agreement). The warning threshold is a threshold for preventing a specific performance metric of a specific resource from approaching the absolute threshold, and is set to a value lower than the absolute threshold. The performance metric is a value indicating the load of the resource and is recorded by the computer system.

Specifically, the threshold management table 1104 has the following values in the entry for each resource. The resource ID 11040 is an ID that identifies a resource, that is, a component of the computer system. The components may include physical components of the computer system such as the server 200, the storage device 300, and network switches, as well as physical components inside them, for example a RAID group inside the storage device 300. The components may also include logical components that have performance metrics, such as the VM 700. The resource type 11041 indicates the type of the resource. For example, for a pool inside the storage device 300, the value "Storage.Pool" is stored. The performance metric type 11042 indicates the type of performance metric for which the thresholds are set. For example, "operating rate", the proportion of time in use per unit time, or "IOPS", the number of I/Os per unit time, is stored. The warning threshold 11043 stores the value of the warning threshold, which is equal to or less than the absolute threshold 11044. The absolute threshold 11044 stores a threshold calculated from an SLA or SLO (Service Level Objective). The warning threshold excess permissible time 11045 stores the length of time for which exceeding the warning threshold 11043 is permitted.
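The constraint that the warning threshold does not exceed the absolute threshold can be sketched as a validated record. This is only an illustration: the class name, field names, the use of seconds for the permissible time, and the use of `None` for an unset threshold are assumptions, not part of the original description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdEntry:
    """One hypothetical row of the threshold management table 1104."""
    resource_id: str                    # resource ID 11040
    resource_type: str                  # resource type 11041, e.g. "Storage.Pool"
    metric_type: str                    # performance metric type 11042
    warning: Optional[float]            # warning threshold 11043 (None = not set)
    absolute: Optional[float]           # absolute threshold 11044 (None = not set)
    warning_grace_sec: Optional[float]  # permissible time 11045 (unit assumed)

    def __post_init__(self):
        # The warning threshold must be equal to or less than the absolute one.
        if (self.warning is not None and self.absolute is not None
                and self.warning > self.absolute):
            raise ValueError("warning threshold exceeds absolute threshold")
```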

図９は、性能管理テーブル１１０５を示す。 FIG. 9 shows the performance management table 1105.

性能管理テーブル１１０５は、計算機システムから取得される、各リソースの性能メトリックの時系列データである性能データを保持する。ここでの性能管理テーブル１１０５の例は、ストレージ装置３００内部のプールに関する性能データを保持する。なお、管理計算機１００は、リソース種別毎に、同様の構造の性能管理テーブルを格納していてもよい。この図は、それらの一例として記載するものである。性能管理テーブル１１０５によって、過去の特定時刻における各リソースの性能データが分かる。管理計算機１００は、定期的に監視プログラム１１２１を実行し、取得した性能データを性能管理テーブル１１０５に保存する。 The performance management table 1105 holds performance data which is time series data of performance metrics of each resource acquired from the computer system. The example of the performance management table 1105 here holds performance data related to the pool inside the storage device 300. The management computer 100 may store a performance management table having a similar structure for each resource type. This figure is shown as an example of them. The performance management table 1105 shows the performance data of each resource at a specific time in the past. The management computer 100 periodically executes the monitoring program 1121 and saves the acquired performance data in the performance management table 1105.

Specifically, the performance management table 1105 has the following information in the entry for each resource and each time. The resource ID 11050 is an ID for identifying the resource. The time 11051 is the time at which one value in the performance data of the resource was acquired. The operating rate 11052 is the operating rate of the resource (a pool in FIG. 9) at that time. The IOPS 11053 is the IOPS of the resource (a pool in FIG. 9) at that time. The operating rate 11052 and the IOPS 11053 are examples of performance metrics. Depending on the resource type, they may be replaced with whatever performance metrics can be acquired, and the number of performance metric types in the performance management table 1105 is not limited to two.
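The per-resource, per-time layout can be sketched as a small in-memory time-series store. The class and method names are hypothetical, and times are represented as plain numbers (for example seconds) purely for illustration.

```python
class PerformanceTable:
    """Hypothetical in-memory stand-in for the performance management table 1105."""

    def __init__(self):
        # resource ID -> list of (time, {metric name: value})
        self._rows = {}

    def record(self, resource_id, time, **metrics):
        """Store one sample (one table entry) for a resource at a given time."""
        self._rows.setdefault(resource_id, []).append((time, dict(metrics)))

    def series(self, resource_id, metric, start, end):
        """Return [(time, value)] for one metric within [start, end], by time."""
        rows = sorted(self._rows.get(resource_id, []), key=lambda r: r[0])
        return [(t, m[metric]) for t, m in rows
                if start <= t <= end and metric in m]
```

Because metrics are stored by name, a resource type that exposes different metrics can reuse the same structure.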

図１０は、対策手段管理テーブル１１０６を示す。 FIG. 10 shows a countermeasure means management table 1106.

The countermeasure means management table 1106 holds, for each combination of bottleneck resource type and type of performance metric violating its threshold, the countermeasure means that can improve that performance metric. From the countermeasure means management table 1106, it is possible to determine which countermeasure means are effective when performance degradation occurs, the type of parameter resource (the resource that serves as a parameter of the countermeasure means), and which countermeasure means can be combined.

Specifically, the countermeasure means management table 1106 has the following information in the entry for each countermeasure means. The bottleneck resource type 11060 is the type of bottleneck resource targeted by the countermeasure means; it can hold the same values as the resource type 11041 of the threshold management table 1104. The bottleneck performance metric type 11061 is the type of performance metric that violates the threshold; it can hold the same values as the performance metric type 11042 of the threshold management table 1104. The countermeasure means 11062 is the countermeasure means that can improve the performance data of the bottleneck performance metric type 11061 for the bottleneck resource type 11060. For example, a value that identifies the countermeasure means, such as "VM storage migration" or "I/O amount upper limit control", is stored. The parameter resource type 11063 is the type of resource that the countermeasure means takes as a parameter. For example, the value "VM" indicates a countermeasure means that targets VMs, and the value "Storage.Volume" indicates one that targets the volumes 370 inside the storage device 300. The combinable countermeasure means 11064 indicates other countermeasure means that can be implemented together with this countermeasure means.
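A lookup against this table might look like the following sketch. The row contents are illustrative only (the actual contents of FIG. 10 are not reproduced here), and the dictionary keys and the function name `find_measures` are assumptions.

```python
# Illustrative rows only; which means pairs with which parameter resource
# type is an assumption for this example.
COUNTERMEASURE_TABLE = [
    {"bottleneck_type": "Storage.Pool", "metric": "operating rate",
     "measure": "VM storage migration", "param_type": "VM",
     "combinable": ["I/O amount upper limit control"]},
    {"bottleneck_type": "Storage.Pool", "metric": "operating rate",
     "measure": "I/O amount upper limit control",
     "param_type": "Storage.Volume",
     "combinable": ["VM storage migration"]},
]


def find_measures(table, bottleneck_type, metric):
    """Return the entries whose bottleneck type and metric type both match."""
    return [row for row in table
            if row["bottleneck_type"] == bottleneck_type
            and row["metric"] == metric]
```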

図１１は、メトリック管理テーブル１１０７を示す。 FIG. 11 shows the metric management table 1107.

メトリック管理テーブル１１０７は、ボトルネックリソースの性能メトリックに関連して値が変化する、別のリソースの性能メトリック種別が格納される。 The metric management table 1107 stores the performance metric type of another resource whose value changes in relation to the performance metric of the bottleneck resource.

具体的にはメトリック管理テーブル１１０７は、性能メトリックの組み合わせ毎のエントリに、以下の情報を有する。ボトルネックリソース種別１１０７０は、性能低下の原因であるボトルネックリソースの種別を示す。ボトルネック性能メトリック種別１１０７１は、しきい値を違反しているボトルネックリソースの性能メトリックの種別を示す。パラメータリソース種別１１０７２は、ボトルネックリソース種別１１０７０のボトルネック性能メトリック種別１１０７１に関連するリソースの種別である。パラメータ性能メトリック種別１１０７３は、ボトルネック性能メトリック種別１１０７１に関連している性能メトリックの種別である。 Specifically, the metric management table 1107 has the following information in the entry for each combination of performance metrics. Bottleneck resource type 11070 indicates the type of bottleneck resource that is the cause of performance degradation. Bottleneck performance metric type 11071 indicates the type of performance metric of the bottleneck resource that violates the threshold value. The parameter resource type 11072 is a resource type related to the bottleneck performance metric type 11071 of the bottleneck resource type 11070. The parameter performance metric type 11073 is a type of performance metric associated with the bottleneck performance metric type 11071.

この図に示す一例は、ＶＭ７００やボリューム３７０のＩＯＰＳが増加した結果、ストレージ装置３００の内部のプールの稼働率が上昇している（悪化している）ことを示している。 An example shown in this figure shows that the operating rate of the pool inside the storage device 300 is increasing (deteriorating) as a result of the increase in IOPS of the VM700 and the volume 370.
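The relationship in this example (VM and volume IOPS driving pool operating rate) can be sketched as a lookup from a bottleneck metric to its related metrics. The tuple layout and the function name are assumptions made for illustration.

```python
# (bottleneck resource type 11070, bottleneck metric type 11071,
#  parameter resource type 11072, parameter metric type 11073)
METRIC_TABLE = [
    ("Storage.Pool", "operating rate", "VM", "IOPS"),
    ("Storage.Pool", "operating rate", "Storage.Volume", "IOPS"),
]


def related_metrics(table, bottleneck_type, bottleneck_metric):
    """Return (resource type, metric type) pairs tied to the bottleneck metric."""
    return [(p_type, p_metric)
            for b_type, b_metric, p_type, p_metric in table
            if b_type == bottleneck_type and b_metric == bottleneck_metric]
```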

図１２は、対策手段適用管理テーブル１１０８を示す。 FIG. 12 shows a countermeasure means application management table 1108.

対策手段適用管理テーブル１１０８によって、ボトルネックリソースや関連リソースの負荷傾向に基づき対策手段が有効か、無効かを判断することができる。 The countermeasure measure application management table 1108 can be used to determine whether the countermeasure means are effective or invalid based on the load tendency of the bottleneck resource and the related resource.

具体的には対策手段適用管理テーブル１１０８は、対策手段毎のエントリに以下の情報を有する。ボトルネックリソース種別１１０８０は、性能低下の原因であるボトルネックリソースの種別を示す。性能メトリック種別１１０８１は、しきい値を違反しているボトルネックリソースの性能メトリックの種別を示す。対策手段１１０８２は、ＶＭストレージマイグレーションなどの対策手段を表す値が格納される。 Specifically, the countermeasure means application management table 1108 has the following information in the entry for each countermeasure means. The bottleneck resource type 11080 indicates the type of bottleneck resource that causes the performance deterioration. The performance metric type 11081 indicates the type of the performance metric of the bottleneck resource that violates the threshold value. The countermeasure means 11082 stores a value representing a countermeasure means such as VM storage migration.

対策方針１１０８３は、ボトルネックリソースの性能低下を解消するための当該対策手段のパラメータ決定の方針を示す。例えば、指定期間内の各時点の性能データの値を平均的に低下させる方針（この図中には“平均下げ”と記載）や、指定期間内においてピーク値となる時点の性能データの値を優先的に低下させる方針（この図中には“ピーク下げ”と記載）などが格納される。なお、上記２つの方針以外の方針が記載されてもよい。 Countermeasure policy 11083 indicates a policy for determining parameters of the countermeasure means for eliminating the performance deterioration of the bottleneck resource. For example, a policy to reduce the value of performance data at each time point within the specified period on average (described as "average reduction" in this figure), or the value of performance data at the time of peak value within the specified period. The policy of preferentially lowering (described as "peak lowering" in this figure) is stored. In addition, a policy other than the above two policies may be described.

The related resource tendency 11084 indicates the load tendency, over the specified period, of the related resources associated with the bottleneck resource. The load tendency is expressed as one of several patterns. For example, a value is stored indicating a pattern in which the performance data varies little between time points in the specified period (shown as "stable" in this figure) or one in which the performance data is large only at particular time points (shown as "unstable" in this figure). Values representing tendencies other than these two patterns may also be stored.

The applicability 11085 indicates whether applying the countermeasure means 11082 under the countermeasure policy 11083 is generally effective or ineffective for the resource type given by the bottleneck resource type 11080 when the load tendency of the related resources matches the pattern in the related resource tendency 11084. "Yes" is stored if it is effective and "No" if it is not. An applicability 11085 of "No" indicates that, given the load tendencies of the bottleneck resource and the related resources, implementing the countermeasure means is unlikely to improve performance.

優先度１１０８６は、ボトルネックリソース種別１１０８０のリソース種別に対し、関連リソースの負荷傾向が関連リソース傾向１１０８４のパタンである場合において、各対策手段を用いる対策案の優先順位を示す。例えば優先度が１である対策案は、優先度が２である対策案に比べ、優先的に実施すべきであることを示す。優先度１１０８６は、例えば各リソースの性能に対する影響の度合いや、信頼性への影響の度合いなどによって決定されてよい。 The priority 11086 indicates the priority of the countermeasure plan using each countermeasure means when the load tendency of the related resource is the pattern of the related resource tendency 11084 with respect to the resource type of the bottleneck resource type 11080. For example, it indicates that a countermeasure plan having a priority of 1 should be implemented with priority over a countermeasure plan having a priority of 2. The priority 11086 may be determined, for example, by the degree of influence on the performance of each resource, the degree of influence on reliability, and the like.

スクリプトＩＤ１１０８７は、対策手段１１０８２のパラメータを計算する方法を記載したスクリプトを識別するＩＤである。スクリプトの具体的な例は後述する。 The script ID 11087 is an ID that identifies a script that describes a method for calculating the parameters of the countermeasure means 11082. A specific example of the script will be described later.
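One way to read the countermeasure means application management table 1108 is as a filter followed by a sort: keep the entries that match the bottleneck and the observed load tendency and are marked applicable, then order them by priority. The sketch below assumes dictionary rows with hypothetical key names and illustrative values; it is not the selection logic of the countermeasure means selection program itself.

```python
def select_applicable(rows, bottleneck_type, metric, trend):
    """Return applicable entries for the given situation, best priority first."""
    matches = [r for r in rows
               if r["bottleneck_type"] == bottleneck_type
               and r["metric"] == metric
               and r["trend"] == trend
               and r["applicable"] == "Yes"]
    # A smaller priority number means the plan should be implemented first.
    return sorted(matches, key=lambda r: r["priority"])


# Illustrative rows; policies, priorities, and script IDs are made up.
APPLICATION_TABLE = [
    {"bottleneck_type": "Storage.Pool", "metric": "operating rate",
     "measure": "I/O amount upper limit control", "policy": "peak reduction",
     "trend": "unstable", "applicable": "Yes", "priority": 2, "script_id": "S2"},
    {"bottleneck_type": "Storage.Pool", "metric": "operating rate",
     "measure": "VM storage migration", "policy": "average reduction",
     "trend": "unstable", "applicable": "Yes", "priority": 1, "script_id": "S1"},
    {"bottleneck_type": "Storage.Pool", "metric": "operating rate",
     "measure": "VM storage migration", "policy": "peak reduction",
     "trend": "stable", "applicable": "No", "priority": 3, "script_id": "S3"},
]
```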

図１３は、選択手段保持テーブル１１０９を示す。 FIG. 13 shows the selection means holding table 1109.

選択手段保持テーブル１１０９は、対策手段選択プログラム１１２４によって決定された対策手段と、対策手段のパラメータ候補となるリソースであるパラメータ候補リソースとを保持する。パラメータ候補リソースは、ボトルネックリソースに対する関連リソースの少なくとも一部である。 The selection means holding table 1109 holds the countermeasure means determined by the countermeasure means selection program 1124 and the parameter candidate resource which is a resource that is a parameter candidate of the countermeasure means. Parameter candidate resources are at least part of the associated resource for the bottleneck resource.

Specifically, the selection means holding table 1109 has the following information in the entry for each plan. A plan is a set of countermeasure means and parameter candidate resources. The plan ID 11090 is an ID that identifies the plan for a given specified period, bottleneck resource, and performance metric. When multiple countermeasure means are possible for the specified period, bottleneck resource, and performance metric, a single plan stores the parameter candidate resources of each countermeasure means and the application priority of each countermeasure means for when they are combined.

The bottleneck resource type 11091 indicates the type of bottleneck resource that is causing the performance degradation. The bottleneck resource ID 11092 is the ID identifying that bottleneck resource. The performance metric type 11093 indicates the type of the bottleneck resource's performance metric that violates the threshold. The specified period 11094 is the period for which generation of a countermeasure plan was requested. The countermeasure policy 11095 indicates the policy for determining the parameters of the countermeasure means used to resolve the performance degradation of the bottleneck resource; it holds the same kind of value as the countermeasure policy 11083. The related resource tendency 11096 indicates the load tendency pattern of the resources related to the bottleneck resource; it holds the same kind of value as the related resource tendency 11084. The countermeasure means 11097 stores a value representing a countermeasure means, such as VM storage migration; it holds the same kind of value as the countermeasure means 11082. The priority 11098 indicates the order in which the countermeasure means are applied when multiple countermeasure means 11097 under the same plan ID 11090 are combined to generate a countermeasure plan. The parameter candidate resource 11099 stores the IDs identifying the parameter candidate resources of the countermeasure means 11097. The value "All" means that every resource that can be a parameter of the countermeasure means 11097 is a candidate.

（１−５）各装置の動作の詳細 (1-5) Details of operation of each device

次に、管理計算機１００の性能監視処理について説明する。性能監視処理は、管理計算機１００に記憶されている監視プログラム１１２１を、プロセッサ１２０で実行する事によって実施される処理である。 Next, the performance monitoring process of the management computer 100 will be described. The performance monitoring process is a process executed by executing the monitoring program 1121 stored in the management computer 100 on the processor 120.

図１４は、性能監視処理のフローチャートである。 FIG. 14 is a flowchart of the performance monitoring process.

性能監視処理は定期的に実行されても良いし、ユーザからの指示によって実行されてもよい。本実施例の性能監視処理は、予め設定された監視間隔で定期的に実行される。性能監視処理は、計算機システムを構成する各種構成要素から性能データを取得し、しきい値管理テーブル１１０４に登録されたしきい値情報に照らして性能対策の実施が必要か否かを判断する。もしも、性能対策が必要な場合、性能監視処理は、対策案生成プログラム１１２２を実行する。 The performance monitoring process may be executed periodically or may be executed according to an instruction from the user. The performance monitoring process of this embodiment is periodically executed at preset monitoring intervals. The performance monitoring process acquires performance data from various components constituting the computer system, and determines whether or not it is necessary to implement performance measures in light of the threshold information registered in the threshold management table 1104. If performance countermeasures are required, the performance monitoring process executes the countermeasure plan generation program 1122.

In this embodiment, two cases in which a performance countermeasure is judged necessary are described: the case where a certain time has elapsed since the warning threshold was exceeded, and the case where the absolute threshold is likely to be exceeded if generation of a countermeasure plan is deferred until after the next run of the performance monitoring process.

まず、監視プログラム１１２１は、ＶＭ管理テーブル１１０１や、ホスト管理テーブル１１０２や、ストレージボリューム管理テーブル１１０３など、計算機システムを構成するリソースの構成情報を管理するテーブルを参照し、管理下にあるリソース毎に現在時刻の性能データを取得して、性能管理テーブル１１０５に格納する（Ｓ１０００、Ｓ１０１０）。監視プログラム１１２１は、各リソースに対してＳ１０００〜Ｓ１１１０の処理を行う。 First, the monitoring program 1121 refers to a table that manages configuration information of resources constituting the computer system, such as the VM management table 1101, the host management table 1102, and the storage volume management table 1103, and for each resource under management. The performance data of the current time is acquired and stored in the performance management table 1105 (S1000, S1010). The monitoring program 1121 performs the processes of S1000 to S1110 for each resource.

Next, the monitoring program 1121 refers to the threshold management table 1104 and, based on the resource type and resource ID, acquires the warning threshold 11043, the absolute threshold 11044, and the warning threshold excess permissible time 11045 for each performance metric type 11042 (S1020). If the warning threshold 11043 is not set, no further performance violation checks are needed, so the monitoring program 1121 returns to S1000 and processes the next resource (S1030).

もしも、警告しきい値１１０４３が設定されていれば、監視プログラム１１２１は、Ｓ１０１０で取得した現在時刻の性能データと警告しきい値１１０４３を比較し、警告しきい値１１０４３を違反していなければ、Ｓ１０００に戻り次のリソースについて処理を実施する（Ｓ１０４０）。 If the warning threshold 11043 is set, the monitoring program 1121 compares the performance data of the current time acquired in S1010 with the warning threshold 11043, and if the warning threshold 11043 is not violated, Return to S1000 and perform processing for the next resource (S1040).

If the comparison in S1040 shows that the warning threshold 11043 is violated, the monitoring program 1121 determines whether it would be acceptable for the warning threshold 11043 to remain exceeded until the next run of the performance monitoring process. To do so, the monitoring program 1121 first acquires the warning threshold excess permissible time 11045 from the threshold management table 1104 (S1050). Next, the monitoring program 1121 estimates the expected time needed to generate a countermeasure plan (S1060). One possible method is to record how long past countermeasure plans took to generate and, for the resource in question, use the recorded time of a plan generated from related resources of the same type and number as the expected generation time.
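The estimation method mentioned for S1060 can be sketched as a small history keyed by resource type and number of related resources. The class name, the key layout, and the fallback default used when no matching history exists are assumptions; the description does not say what happens in that case.

```python
class PlanGenerationHistory:
    """Hypothetical record of how long past countermeasure plans took to build."""

    def __init__(self, default_sec):
        self._last = {}              # (resource type, related count) -> seconds
        self._default = default_sec  # assumed fallback when no history exists

    def record(self, resource_type, related_count, elapsed_sec):
        """Remember the generation time of the most recent matching plan."""
        self._last[(resource_type, related_count)] = elapsed_sec

    def estimate(self, resource_type, related_count):
        """Return the expected generation time for a similar plan (S1060)."""
        return self._last.get((resource_type, related_count), self._default)
```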

Next, the monitoring program 1121 determines whether the warning threshold excess permissible time 11045 is larger or smaller than the sum of the expected countermeasure plan generation time estimated in S1060 and the execution interval of the monitoring program 1121 (S1070). If the warning threshold excess permissible time 11045 is smaller (the result of S1070 is Yes), then, should the warning threshold remain exceeded until the next run of the monitoring program 1121, the warning threshold excess permissible time 11045 would be exceeded. The monitoring program 1121 therefore determines that a performance abnormality has occurred in that performance metric of the resource (S1080).

If the result of S1070 is No, the monitoring program 1121 then determines whether an absolute threshold 11044 is set for the resource (S1090). If an absolute threshold is set (Yes), the monitoring program 1121 estimates future performance changes from the past load trend of the resource and calculates the expected time until the absolute threshold is reached (S1100). To predict future performance changes, the monitoring program 1121 may, for example, fit an approximate expression to the performance data of a fixed past period by the least squares method and, assuming that the past trend continues, use that expression to calculate the time at which the absolute threshold 11044 is reached.
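The least-squares estimate mentioned for S1100 can be sketched as follows. This is only an illustration under stated assumptions: the trend is taken to be a straight line, the function name `estimate_time_to_threshold` and its arguments are hypothetical, and the result is returned as the time remaining after the last sample.

```python
def estimate_time_to_threshold(times, values, threshold):
    """Fit values ~ slope * t + intercept by ordinary least squares and
    return the time remaining (measured from the last sample) until the
    fitted line reaches `threshold`.

    Returns None when no rising trend can be fitted. A linear trend is an
    assumption of this sketch; the description only asks for "an
    approximate expression".
    """
    n = len(times)
    if n < 2:
        return None
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return None  # all samples share one timestamp, no slope defined
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None  # load is flat or falling, threshold is never reached
    crossing_time = (threshold - intercept) / slope
    # A crossing in the past means the threshold is already reached.
    return max(0.0, crossing_time - times[-1])
```

Returning `None` for a flat or falling trend is a design choice of the sketch; a caller could treat it the same way as the case where the absolute threshold will not be reached in time to matter.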

Next, the monitoring program 1121 determines whether the expected time to reach the absolute threshold estimated in S1100 is larger or smaller than the sum of the expected countermeasure plan generation time estimated in S1060 and the execution interval of the monitoring program 1121 (S1110). If the expected time to reach the absolute threshold is smaller (the result of S1110 is No), the absolute threshold 11044 is likely to be exceeded by the time the monitoring program 1121 is next executed. The monitoring program 1121 therefore determines that a performance abnormality has occurred in that performance metric of that resource (S1080).

If the result of S1090 is No, or if the result of S1110 is Yes, the monitoring program 1121 in this embodiment determines that no performance abnormality has occurred. In this embodiment, a performance abnormality is determined to have occurred in two cases: when a certain time has elapsed since the warning threshold was exceeded, and when the absolute threshold is likely to be exceeded if the countermeasure plan were generated only after the next execution of the performance monitoring process. However, the monitoring program 1121 may also determine that a performance abnormality has occurred in other cases, for example when threshold violations have occurred a certain number of times or more in the past. To take such additional cases into account, the monitoring program 1121 may execute further abnormality determination processing when the result of S1090 is No and when the result of S1110 is Yes.
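The decision flow from S1070 to S1110 can be condensed into a small function. The sketch below is illustrative only: the function and parameter names are hypothetical, all times are assumed to share one unit, and the use of strict comparisons is an assumption, since the description does not say how equality is treated.

```python
from typing import Optional


def is_performance_anomaly(
    allowed_excess_time: float,
    plan_generation_estimate: float,
    monitoring_interval: float,
    time_to_absolute: Optional[float],
) -> bool:
    """Decide whether a metric that already violates the warning threshold
    (S1040) counts as a performance abnormality.

    `time_to_absolute` is the S1100 estimate, or None when no absolute
    threshold is set for the resource (S1090 is No).
    """
    # Time that passes before a plan generated after the next run is ready.
    budget = plan_generation_estimate + monitoring_interval

    # S1070: the allowable excess time would run out before then.
    if allowed_excess_time < budget:
        return True  # S1080

    # S1090: no absolute threshold configured, so nothing more to check.
    if time_to_absolute is None:
        return False

    # S1110: the absolute threshold is expected to be reached too soon.
    return time_to_absolute < budget
```

For example, with an allowable excess time of 60, a generation estimate of 20, and an interval of 30, the function reports an abnormality only when the absolute threshold is expected to be reached in less than 50 time units.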

After the processing from S1000 to S1110 has been performed for all resources, if no performance abnormality has occurred, there is no need to generate a countermeasure plan, so the monitoring program 1121 ends (S1120).

If a performance abnormality has occurred in one or more resources (the result of S1120 is Yes), the monitoring program 1121 performs the subsequent processing and executes the countermeasure plan generation program 1122.

Because the resources that make up the computer system are related to one another, a performance drop in one resource may cause secondary performance drops in other resources. In S1130, the monitoring program 1121 therefore determines whether the performance drop of each resource is caused by a performance change in another resource, and extracts the bottleneck resource and performance metric that are the ultimate cause. A publicly known RCA (Root Cause Analysis) technique may be used for this cause analysis, for example.

Next, for the performance metric of the bottleneck resource extracted in S1130, the monitoring program 1121 sets the start time to the earliest time at which any of the resources determined to have a performance abnormality exceeded the warning threshold, and sets the end time to the current time. It then executes the countermeasure plan generation program 1122 with the period from the start time to the end time as the designated period, thereby generating a countermeasure plan for the designated period. The monitoring program 1121 then displays an alert on the display device 400 together with the generated countermeasure plan.

The longer the monitoring interval and the designated period, the more accurate the countermeasure plan becomes, but the more likely it is that a threshold will be exceeded. This performance monitoring process makes it possible to prevent the performance metric of the bottleneck resource from exceeding the threshold while also improving the accuracy of the countermeasure plan.

With this performance monitoring process, the management computer 100 can periodically monitor the load on the resources, set the designated period so that the time during which the load exceeds the warning threshold does not exceed the warning threshold excess allowable time and so that the load does not exceed the absolute threshold, and generate a countermeasure plan for the designated period.

Next, the countermeasure plan generation process of the management computer 100 will be described. The countermeasure plan generation process is carried out by the processor 120 executing the countermeasure plan generation program 1122 stored in the management computer 100.

FIG. 15 is a flowchart of the countermeasure plan generation process.

The countermeasure plan generation process may be executed in response to an instruction from the user, or may be executed by the countermeasure plan generation program 1122 when a performance abnormality is detected.

The countermeasure plan generation program 1122 first receives the type of the bottleneck resource, the identification ID of the bottleneck resource, the designated period for which a countermeasure plan is to be generated, and the type of the performance metric that violates the warning threshold 11043 (S2000). This information may be specified by the user through the display device 400 or by the monitoring program 1121. These are only examples in this embodiment and do not preclude the countermeasure plan generation program 1122 from being called by other programs.

Next, the countermeasure plan generation program 1122 performs the countermeasure policy determination process (S2020). This process analyzes the load trend of the bottleneck resource over the designated period and determines a countermeasure policy for efficiently generating an accurate countermeasure plan for the performance degradation. The detailed flowchart is described later.

Next, the countermeasure plan generation program 1122 performs the countermeasure means selection process (S2030). This process analyzes the load trends, over the designated period, of the group of related resources associated with the bottleneck resource and selects countermeasure means appropriate to the countermeasure policy. The detailed flowchart is described later.

The countermeasure plan generation program 1122 acquires the result of the countermeasure means selection process from the selection means holding table 1109 (S2040). Because there may be more than one countermeasure means that resolves the performance degradation of the bottleneck resource, the selection means holding table 1109 may hold a plurality of countermeasure plans (plans) corresponding to the specified bottleneck resource, performance metric, and designated period. The countermeasure plan generation program 1122 therefore performs the subsequent processing (S2070 to S2110) for each plan.

A plan 11090 may be a combination of a plurality of countermeasure means. In that case, the countermeasure plan generation program 1122 refers to the priority 11098 of each countermeasure means 11097 constituting the plan 11090 and performs the subsequent processing (S2070 to S2110) on the countermeasure means 11097 in descending order of priority.

First, in S2070, the countermeasure plan generation program 1122 selects a countermeasure means 11097 from the selection means holding table 1109 and acquires the corresponding countermeasure policy 11095, related resource tendency 11096, and parameter candidate resource 11099.

Next, in S2080, the countermeasure plan generation program 1122 refers to the countermeasure means application management table 1108 on the basis of the information acquired in S2070 and acquires the script ID 11087 of the script that describes how to calculate the parameters of the countermeasure means.

In S2090, the countermeasure plan generation program 1122 identifies the script corresponding to the script ID 11087 acquired in S2080 from the parameter calculation script group 113 stored in the storage resource 110. The countermeasure plan generation program 1122 then calculates the parameters using the parameter candidate resource acquired in S2070 and the identified script.

The countermeasure plan generation program 1122 estimates how the value of the performance metric that caused the threshold violation of the bottleneck resource would change if the countermeasure plan calculated by the script were implemented (S2100). The estimation method may be the same as that of S1100, or the estimate may be obtained by applying the countermeasure plan to the result estimated in S1100. If the estimate shows that the warning threshold 11043 is not violated at any time point within the designated period (the result of S2110 is Yes), there is no need to apply countermeasure means with a lower priority 11098, so the countermeasure plan generation program 1122 starts the countermeasure plan generation processing (S2070 to S2110) for the next plan 11090. If the result of S2110 is No, the countermeasure plan generation program 1122 performs the countermeasure plan generation processing (S2070 to S2110) for the countermeasure means with the next lower priority 11098.
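The priority-ordered loop over the countermeasure means of one plan (S2070 to S2110) might look like the sketch below. Everything here is hypothetical scaffolding: `build_plan`, the dictionary keys, and the `estimate_after` callback stand in for the table lookups and the S2100 estimate, and a smaller priority number is assumed to mean a higher priority.

```python
def build_plan(measures, estimate_after, warning_threshold):
    """Apply countermeasure means in priority order until the projected
    metric stays within the warning threshold over the whole period.

    measures: list of dicts with "name" and "priority" keys
              (smaller number = higher priority, an assumption).
    estimate_after: callable taking the list of applied measure names and
              returning the projected metric value at each time point
              (a stand-in for the S2100 estimate).
    """
    applied = []
    for measure in sorted(measures, key=lambda m: m["priority"]):
        applied.append(measure["name"])
        projected = estimate_after(applied)  # S2100
        # S2110: no time point in the designated period violates the threshold.
        if all(value <= warning_threshold for value in projected):
            break  # lower-priority means are not needed
    return applied


# Toy estimator: each applied measure lowers a flat 90% load by 15 points.
def toy_estimate(applied):
    return [90 - 15 * len(applied)] * 3


plan = build_plan(
    [{"name": "io_limit", "priority": 2}, {"name": "vm_migration", "priority": 1}],
    toy_estimate,
    warning_threshold=70,
)
```

In the toy run, one measure alone leaves the load at 75, so the second measure is added as well. If every measure has been applied and the threshold is still violated, the sketch simply returns the full list; how the real program reports that situation is not stated in the description.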

The countermeasure plan generation program 1122 then displays the generated countermeasure means on the display device 400 for each plan (S2120). The countermeasure means may be expressed in a standard format such as XML. In that case, they may not only be displayed on the display device 400 but also be saved electronically, for example as a file.

The management computer 100 may also represent the estimate of the effect of implementing the generated countermeasure plan as a graph and have the display device 400 present it together with the countermeasure plan. This makes it possible to grasp intuitively the change that executing the countermeasure plan would bring about.

The generated countermeasure plan may also be expressed in the form of CLI commands corresponding to the countermeasure means. This embodiment does not preclude displaying content other than the countermeasure means and parameters. For example, the countermeasure plan may be expressed in the form of a program, provided that the content of the countermeasure plan is included.

With this countermeasure plan generation process, the management computer 100 can determine the parameters using an appropriate script and generate a countermeasure plan for the designated period.

The scripts in this embodiment are Script1 to Script4. Script1 to Script4 are scripts for generating a countermeasure plan that lowers the utilization rate of a pool in the storage device 300 to the warning threshold 11043 when a rise in that utilization rate is the cause of the performance problem. The parameter candidate resources here are VMs. Script1 and Script2 describe calculation methods for the parameters of VM storage migration. Script3 and Script4 describe calculation methods for the parameters of a storage setting change that limits the upper limit of the I/O amount of a volume.

FIG. 16 is a flowchart of Script1.

Script1 determines the VMs 700 to be subjected to VM storage migration and the destination datastores. Specifically, among the combinations of VMs 700 that can lower the average utilization rate of the bottleneck resource over the designated period to a value smaller than the warning threshold 11043, Script1 finds the combination for which the average utilization rate over the designated period of each pool or RAID group, after the data of the VMs 700 is moved, is smallest. This may, for example, be modeled as an integer programming problem and solved with a known integer programming solver. This is possible because the countermeasure policy determination process converts the optimization problem into a simpler one while preserving the effect of resolving the performance degradation. If the number of resources is still so large that an optimal solution cannot be generated in a realistic time, the integer programming problem may, for example, be linearly relaxed and an approximate solution obtained with a known algorithm.

When the utilization rate of the pool exceeds the threshold throughout the designated period because of the combined load of a plurality of VMs, Script1 makes it possible to move VMs so that the load is not concentrated on a particular resource.

FIG. 17 is a flowchart of Script2.

Script2 determines the VMs 700 to be subjected to VM storage migration and the destination datastores. Specifically, among the combinations of VMs 700 that can lower the maximum utilization rate of the bottleneck resource over the designated period to a value smaller than the warning threshold 11043, Script2 finds the combination for which the maximum utilization rate over the designated period of each pool or RAID group, after the data of the VMs 700 is moved, is smallest. This may, for example, be modeled as an integer programming problem and solved with a known integer programming solver. This is possible because the countermeasure policy determination process converts the optimization problem into a simpler one while preserving the effect of resolving the performance degradation. If the number of resources is still so large that an optimal solution cannot be generated in a realistic time, the integer programming problem may, for example, be linearly relaxed and an approximate solution obtained with a known algorithm.

When the utilization rate of the pool exceeds the threshold at a particular time point within the designated period, Script2 makes it possible to move VMs so as to minimize the peak value at the destination resource.
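The difference between Script1 and Script2 comes down to which statistic (average or maximum) is constrained and minimized. The sketch below illustrates this with an exhaustive search over a tiny input instead of the integer programming solver mentioned in the description. All names are hypothetical, VM loads are assumed to add linearly to pool utilization, and taking the objective to be the largest per-pool statistic is one reading of the description, not a confirmed one.

```python
from itertools import product
from statistics import mean


def choose_migrations(vm_loads, bottleneck, pool_base, threshold, stat):
    """Exhaustively search VM placements (a stand-in for an integer
    programming solver; only feasible for very small inputs).

    vm_loads:  {vm: [utilization points per time point]} for the VMs that
               currently sit on the bottleneck pool.
    pool_base: {pool: [utilization per time point]} without those VMs;
               must include the bottleneck pool itself.
    stat:      `mean` for a Script1-like search, `max` for Script2-like.
    Returns (placement, score) or None when no placement is feasible.
    """
    vms = sorted(vm_loads)
    pools = sorted(pool_base)
    best = None
    for placement in product(pools, repeat=len(vms)):
        usage = {pool: list(series) for pool, series in pool_base.items()}
        for vm, pool in zip(vms, placement):
            usage[pool] = [u + load for u, load in zip(usage[pool], vm_loads[vm])]
        # Constraint: the bottleneck statistic must drop below the threshold.
        if stat(usage[bottleneck]) >= threshold:
            continue
        # Objective (assumed): minimize the worst statistic over all pools.
        score = max(stat(series) for series in usage.values())
        if best is None or score < best[1]:
            best = (dict(zip(vms, placement)), score)
    return best


vm_loads = {"vm1": [45, 45, 45], "vm2": [30, 30, 60]}
pool_base = {"pool_a": [10, 10, 10], "pool_b": [5, 5, 5]}
by_average = choose_migrations(vm_loads, "pool_a", pool_base, 70, mean)
by_peak = choose_migrations(vm_loads, "pool_a", pool_base, 70, max)
```

With this toy data the average-based search moves the steady `vm1`, while the peak-based search moves the spiky `vm2`, which mirrors the intent of the two scripts.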

FIG. 18 is a flowchart of Script3.

Script3 determines the upper limit of I/O for the volumes 370 that place load on the bottleneck resource. Script3 first calculates, for the bottleneck resource (a pool in the storage device 300), the upper limit of the IOPS value of the pool at which its utilization rate becomes equal to or less than the warning threshold 11043 (S11330).

In practice, the servers 200 and the VMs 700 issue I/O to the volumes 370, and the amount of I/O issued to a pool is the total of the I/O issued to the volumes 370 created from that pool. Script3 therefore determines the IOPS upper limit of each volume 370 so that the sum of the IOPS upper limits of the volumes 370 equals the IOPS upper limit of the pool calculated in S11330. For example, Script3 calculates the average IOPS of each volume 370 over the designated period and distributes the IOPS upper limit of the pool in proportion to those averages to obtain the IOPS upper limit of each volume 370.

When the volume load is stable throughout the designated period and the utilization rate of the pool exceeds the threshold throughout the designated period, Script3 makes it possible to determine the IOPS upper limit of each volume 370 so that the volume load is suppressed throughout the designated period.
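The proportional split used by Script3 is simple arithmetic and can be sketched as follows. The function name and the input layout are hypothetical, and the equal split used when every volume was idle is an assumption that the description does not cover.

```python
def distribute_iops_limits(pool_iops_limit, volume_iops_history):
    """Split the pool IOPS upper limit across volumes in proportion to each
    volume's average IOPS over the designated period.

    volume_iops_history: {volume_id: [IOPS sample per time point]},
                         assumed non-empty for every volume.
    """
    averages = {
        volume: sum(samples) / len(samples)
        for volume, samples in volume_iops_history.items()
    }
    total = sum(averages.values())
    if total == 0:
        # Assumption: with no observed load, share the limit equally.
        share = pool_iops_limit / len(averages)
        return {volume: share for volume in averages}
    return {
        volume: pool_iops_limit * average / total
        for volume, average in averages.items()
    }
```

For a pool limit of 1000 IOPS and two volumes averaging 400 and 100 IOPS, the limits come out as 800 and 200, so their sum equals the pool limit as the description requires.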

FIG. 19 is a flowchart of Script4.

Script4 determines the upper limit of I/O for the volumes 370 that place load on the bottleneck resource. Script4 calculates the upper limit of the IOPS value of the pool that is the bottleneck resource by the same processing as S11330 (S11340).

Next, the countermeasure plan generation program 1122 sets the IOPS value calculated in S11340 as the upper limit for the volumes 370 associated with the pool that is the bottleneck resource (S11341).

When the utilization rate of the pool exceeds the threshold because the volume load becomes high at a particular time point within the designated period, Script4 makes it possible to determine the IOPS upper limit so that the load of the particular volume is suppressed.

The scripts described above are examples, and other scripts may be included depending on the bottleneck resource, performance metric, countermeasure means, and countermeasure policy. Although the scripts are expressed as flowcharts in this embodiment, they may be expressed in any other format capable of representing the processing content and processing order, for example XML (Extensible Markup Language). The management computer 100 may also have a function of dynamically loading XML data contained in an electronic medium such as a file into the storage resource.

FIG. 20 is a flowchart showing the countermeasure policy determination process (S2020).

The countermeasure policy determination process is carried out by the processor 120 executing the countermeasure policy determination program 1123.

First, the countermeasure policy determination program 1123 receives the type of the bottleneck resource, the ID of the bottleneck resource, the designated period, and the type of the performance metric that violates the warning threshold 11043 (S3000).

Next, the countermeasure policy determination program 1123 refers to the performance management table 1105 corresponding to the bottleneck resource and acquires the specified bottleneck resource ID, the performance metric type, and the performance data at each time point included in the designated period (S3010).

Next, the countermeasure policy determination program 1123 classifies the trend of the bottleneck resource's performance data acquired in S3010 into one of a plurality of patterns. As an example, this embodiment describes classification into two patterns: a "stable" pattern, in which the performance data values vary little from one time point to another, and an "unstable" pattern, in which the value at a particular time point deviates greatly from the values at other time points.

To determine whether the performance data acquired in S3010 follows the stable pattern or the unstable pattern, the countermeasure policy determination program 1123 calculates the average value of the performance data over the designated period and a bottleneck load deviation threshold. For example, the countermeasure policy determination program 1123 calculates the standard deviation of the performance data over the designated period and multiplies it by a preset coefficient to obtain the bottleneck load deviation threshold. The countermeasure policy determination program 1123 then determines whether there is any time point at which the deviation of the performance data value from the average value is equal to or greater than the bottleneck load deviation threshold (S3020, S3030). If the result of S3030 is Yes, the countermeasure policy determination program 1123 determines that the pattern is unstable; if the result is No, it determines that the pattern is stable. The standard-deviation-based calculation used in S3030 is one example of an outlier detection method, and other outlier detection methods may be used.

If the data is classified as the unstable pattern as a result of S3030, the performance abnormality is likely to have been caused by a large threshold violation at a particular time point in the designated period. The countermeasure policy determination program 1123 therefore decides to adopt a countermeasure policy that preferentially improves the performance data value at the time point of the peak in the designated period (S3040). This policy is referred to below as "peak reduction".

If, on the other hand, the data is classified as the stable pattern as a result of S3030, the threshold is likely to have been violated steadily throughout the designated period. The countermeasure policy determination program 1123 therefore decides to adopt a countermeasure policy that improves the performance data values at all time points in the designated period uniformly (S3050). This policy is referred to below as "average reduction".

With this countermeasure policy determination process, it is possible to determine whether the load trend of the bottleneck resource is stable or unstable and to decide on an appropriate countermeasure policy on the basis of that determination.
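The stable/unstable classification (S3020, S3030) and the resulting policy choice (S3040, S3050) can be sketched as follows. The function names, the default coefficient of 2.0, the use of the population standard deviation, and the use of the absolute deviation are all assumptions of this sketch; the description only says that the standard deviation is multiplied by a preset coefficient.

```python
from statistics import mean, pstdev


def classify_load_pattern(values, coefficient=2.0):
    """Return "unstable" if any sample deviates from the average by at
    least coefficient * standard deviation, otherwise "stable"."""
    average = mean(values)
    deviation_threshold = pstdev(values) * coefficient
    if deviation_threshold == 0:
        # Assumption: perfectly constant data counts as stable, although a
        # literal "deviation >= threshold" test would flag every point.
        return "stable"
    has_outlier = any(abs(v - average) >= deviation_threshold for v in values)
    return "unstable" if has_outlier else "stable"


def choose_policy(values, coefficient=2.0):
    """Map the pattern to the policy names used in the description."""
    pattern = classify_load_pattern(values, coefficient)
    return "peak reduction" if pattern == "unstable" else "average reduction"
```

A series such as nine samples at 50 followed by one at 95 is classified as unstable and leads to "peak reduction", while a series hovering between 48 and 52 is classified as stable and leads to "average reduction".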

FIG. 21 is a flowchart showing the countermeasure means selection process (S2030).

The countermeasure means selection process is carried out by the processor 120 executing the countermeasure means selection program 1124.

First, the countermeasure means selection program 1124 receives the type of the bottleneck resource, the ID of the bottleneck resource, the designated period, the type of the performance metric that violates the warning threshold 11043, and the countermeasure policy (S4000).

Next, the countermeasure means selection program 1124 extracts the countermeasure means that are effective in improving the performance data for the performance metric of the specified bottleneck resource type (S4010). Specifically, the countermeasure means selection program 1124 refers to the countermeasure means management table 1106 and extracts the countermeasure means 11062, the parameter resource type 11063, and the combinable countermeasure means 11064 that correspond to the bottleneck resource type 11060 received in S4000 and to the bottleneck performance metric type 11061 in which the warning threshold violation has occurred.

More than one countermeasure means 11062 may be applicable to the bottleneck resource, and the parameter resource type differs from one countermeasure means to another. The countermeasure means selection program 1124 therefore performs the subsequent processing (S4030 to S4120) for each countermeasure means.

対策手段選択プログラム１１２４は、Ｓ４０３０において、Ｓ４０１０で取得した対策手段のパラメータリソース種別について、ＶＭ管理テーブル１１０１、ホスト管理テーブル１１０２、ストレージボリューム管理テーブル１１０３などの構成情報を管理する各種テーブルから、該当するリソースの情報を抽出する（Ｓ４０３０）。例えば、ボトルネックリソースがストレージ装置３００内部のプールであった場合、対策手段としてＶＭストレージマイグレーションとＩ／Ｏ量上限制御の２つの対策手段１１０６２がとり得ることを示している。そして、ＶＭストレージマイグレーションを実行するためにはパラメータとしてＶＭ７００を設定する必要があり、Ｉ／Ｏ量上限制御を実行するためにはパラメータとしてボリューム３７０を設定する必要があることを示している。 The countermeasure means selection program 1124 corresponds to the parameter resource type of the countermeasure means acquired in S4010 from various tables that manage configuration information such as the VM management table 1101, the host management table 1102, and the storage volume management table 1103 in S4030. Extract resource information (S4030). For example, when the bottleneck resource is the pool inside the storage device 300, it is shown that two countermeasure measures 11062, VM storage migration and I / O amount upper limit control, can be taken as countermeasure measures. It is shown that the VM 700 needs to be set as a parameter in order to execute the VM storage migration, and the volume 370 needs to be set as a parameter in order to execute the I / O amount upper limit control.

If the tables that manage the configuration information contain no information on related resources of the relevant parameter resource type (the result of S4040 is No), the countermeasure means selection program 1124 cannot generate a countermeasure plan, so it returns to S4020 and performs the same processing for the next countermeasure means.

If the result of S4040 is Yes, the countermeasure means selection program 1124 classifies the load tendency of each related resource into a pattern in the subsequent processing. First, to identify the performance metric of interest for the related resources, the countermeasure means selection program 1124 refers to the metric management table 1107 and extracts the parameter performance metric type 11073 that corresponds to the bottleneck resource type 11070, the bottleneck performance metric type 11071, and the parameter resource type 11072 (S4050). For example, the table indicates that the utilization rate, which is a performance metric of a pool in the storage device 300, is related to IOPS, which is a performance metric of the VM 700, or to IOPS, which is a performance metric of the volume 370.

Next, the countermeasure means selection program 1124 analyzes the load tendency of each related resource. As with the countermeasure policy determination program 1123, this embodiment describes the case in which each related resource is classified into either a stable pattern or an unstable pattern. Specifically, the countermeasure means selection program 1124 performs S4070 and S4080 for each related resource.

In S4070, the countermeasure means selection program 1124 refers to the performance management table 1105 and extracts the performance data of the parameter performance metric type 11073 of the related resource for the specified period. Then, in S4080, it performs outlier determination on the performance data extracted in S4070 to determine whether the related resource has a stable pattern. This processing may be the same as in S3020 (S4080). For example, the countermeasure means selection program 1124 calculates the average value of the performance data over the specified period and a related load deviation threshold, and determines whether there is any time point at which the deviation of the performance data from the average value is equal to or greater than the related load deviation threshold. The related load deviation threshold is, for example, the standard deviation multiplied by a preset coefficient.
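The outlier determination of S4080 could be sketched as follows. The sketch assumes the population standard deviation, a coefficient of 2.0, and that a completely flat series counts as stable; none of these choices is specified in the embodiment, and `is_stable` is a hypothetical name.

```python
from statistics import mean, pstdev


def is_stable(samples, coefficient=2.0):
    """Return True when no sample deviates from the mean by the threshold or more.

    The threshold is the standard deviation multiplied by a preset coefficient.
    """
    avg = mean(samples)
    threshold = coefficient * pstdev(samples)
    if threshold == 0:
        # Assumption: a series with no variation at all is treated as stable.
        return True
    # A deviation equal to or greater than the threshold marks an outlier.
    return all(abs(value - avg) < threshold for value in samples)
```

A series such as `[100, 102, 98, 101, 99]` stays within the threshold and is classified as stable, whereas nine samples of 100 followed by a single 500 contain an outlier and are classified as unstable.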

After classifying all of the related resources extracted in S4030 into patterns, the countermeasure means selection program 1124 determines whether the number of related resources classified into the stable pattern is greater than the number classified into the unstable pattern (S4090). If the result is Yes, the countermeasure means selection program 1124 determines that, for the specified bottleneck resource, the group of related resources serving as parameters of the countermeasure means has a stable pattern (S4110). If the result is No, it determines that the related resource group has an unstable pattern (S4100).
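The comparison of S4090 amounts to a strict majority test, which might be written as below. The function name and the string labels are assumptions for illustration. Because S4090 asks whether the stable count is greater than the unstable count, a tie falls through to the unstable pattern.

```python
def group_pattern(stable_flags):
    """S4090: 'stable' only if stable resources strictly outnumber unstable ones."""
    stable = sum(1 for flag in stable_flags if flag)
    unstable = len(stable_flags) - stable
    # Yes branch corresponds to S4110, No branch to S4100.
    return "stable" if stable > unstable else "unstable"
```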

Next, the countermeasure means selection program 1124 compares the countermeasure policy received in S4000 with the pattern of the related resources determined in S4100 or S4110, and determines whether the countermeasure means is effective in resolving the performance degradation of the bottleneck resource. Specifically, it refers to the countermeasure means application management table 1108 and acquires the value of the applicability 11085 that matches the bottleneck resource type 11080, the performance metric type 11081, the countermeasure means 11082, the countermeasure policy 11083, and the related resource tendency 11084. If the value is Yes, the countermeasure means selection program 1124 determines that the countermeasure means is effective; if the value is No, it determines that the countermeasure means is not effective (S4120).

For example, when the countermeasure policy 11083 is average reduction and the related resource tendency 11084 is the unstable pattern, the load on each related resource is considered to rise at a different time point, so that the aggregate of the related resources appears as a stable load tendency throughout the specified period. In this case, if the countermeasure means 11082 is VM storage migration, moving the data of a specific VM 700 to a different pool is likely to cause a warning threshold violation in the destination pool when the load of that VM 700 rises. The applicability 11085 is therefore determined to be No.

Likewise, when the countermeasure policy 11083 is peak reduction and the related resource tendency 11084 is the unstable pattern, the load on each related resource is considered to rise at the same time point, so that the aggregate of the related resources sharply raises the load on the bottleneck resource at a specific time point. In this case, if the countermeasure means 11082 is I/O amount upper limit control, the I/O amount would have to be limited on every volume 370 in the same way, which would significantly affect the applications that use each volume 370 at the peak. The applicability is therefore determined to be No.
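The determination of S4120 can be pictured as a keyed lookup in a structure shaped like the countermeasure means application management table 1108. In the sketch below, the two entries with the value No mirror the two examples given above; the two entries with the value Yes, the key strings, and the rule that unknown combinations are treated as not applicable are assumptions added only so that the sketch is complete.

```python
# Key: (bottleneck resource type 11080, performance metric type 11081,
#       countermeasure means 11082, countermeasure policy 11083,
#       related resource tendency 11084) -> applicability 11085
APPLICABILITY_1108 = {
    ("pool", "utilization", "vm_storage_migration", "lower_average", "unstable"): False,
    ("pool", "utilization", "io_limit", "lower_peak", "unstable"): False,
    # Assumed entries, not taken from the description:
    ("pool", "utilization", "vm_storage_migration", "lower_average", "stable"): True,
    ("pool", "utilization", "io_limit", "lower_peak", "stable"): True,
}


def is_applicable(table, resource_type, metric_type, countermeasure, policy, trend):
    """S4120: look up the applicability for one countermeasure means.

    Assumption: a combination missing from the table is treated as No.
    """
    key = (resource_type, metric_type, countermeasure, policy, trend)
    return table.get(key, False)
```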

With this countermeasure means selection process, it is possible to determine whether the load tendency of the related resources is stable or unstable, and to decide on an appropriate countermeasure means based on that determination result and the countermeasure policy.

As described above, according to this embodiment, the management computer 100 can determine a policy for generating countermeasure plans that corresponds to the tendency of the load change of the bottleneck resource during the specified period.

Also, according to this embodiment, the management computer 100 can determine a single load tendency pattern that represents the characteristics of the related resource group. The management computer 100 can thereby exclude countermeasure means whose parameter calculation would require an increased amount of computation, as well as countermeasure means whose large side effects would have a large impact on applications.

In addition, by calculating the parameters of the countermeasure means based on the countermeasure policy derived from the load tendency of the bottleneck resource, the management computer 100 can obtain just the parameters needed to resolve the performance degradation with a minimum of calculation.

(2-1) Overview of this embodiment

In this embodiment, the related resources of the bottleneck resource are classified by pattern, and the countermeasure means applicable to each pattern are derived, which makes it possible to derive a countermeasure plan that combines multiple countermeasure means. As a result, in the second embodiment, even when related resources of multiple patterns are mixed, the management computer 100 can generate a countermeasure plan that has small side effects on applications and on each VM while reducing the amount of parameter calculation.

The configuration of the computer system according to the second embodiment and the information managed by it are the same as those of the computer system described in the first embodiment, so their illustration is omitted. Among the programs executed in the computer system according to the second embodiment, the programs other than the countermeasure means selection program 1124 operate as described in the first embodiment. The following description therefore focuses on how the operation of the countermeasure means selection program 1124 differs from that described in the first embodiment.

(2-2) Details of the operation of each device

FIG. 22 is a flowchart showing the countermeasure means selection process (S2030) according to the second embodiment.

First, the countermeasure means selection program 1124 performs the processing up to S4010 of the first embodiment. Then, for each countermeasure means, it performs the same processing as S4030 to S4050 of the first embodiment. After that, for each related resource, it performs the same processing as S4070 to S4080 of the first embodiment.

Then, in this embodiment, the countermeasure means selection program 1124 determines from the result of the outlier determination in S4080 whether the related resource has a stable pattern or an unstable pattern. This embodiment assumes that outliers are determined by the same method as in the first embodiment (S3020, S4080) (S5000). If the result of S5000 is Yes, the countermeasure means selection program 1124 determines that the related resource has a stable pattern and stores it in the storage resource as a member of the stable-pattern related resource group (S5010). If the result of S5000 is No, the countermeasure means selection program 1124 determines that the related resource has an unstable pattern and stores it in the storage resource as a member of the unstable-pattern related resource group (S5020).

After performing the processing up to S5020 for each related resource, the countermeasure means selection program 1124 selects, from the related resource groups corresponding to the patterns classified in S5010 and S5020, the group that contains the largest number of related resources (S5030).
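A minimal sketch of the grouping in S5010 and S5020 and of the selection in S5030 is given below. The function names and resource IDs are hypothetical. The description does not say how a tie between groups is broken; this sketch simply keeps the group encountered first.

```python
from collections import defaultdict


def group_by_pattern(patterns):
    """S5010/S5020: map each pattern to the resource IDs classified into it."""
    groups = defaultdict(list)
    for resource_id, pattern in patterns.items():
        groups[pattern].append(resource_id)
    return dict(groups)


def largest_group(groups):
    """S5030: return (pattern, resources) for the group with the most resources."""
    return max(groups.items(), key=lambda item: len(item[1]))
```

For instance, with one stable VM and two unstable VMs, `largest_group` returns the unstable group, and the stable group is left for the combination generation process.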

The countermeasure means selection program 1124 then refers to the countermeasure means application management table 1108 to determine whether the countermeasure means is effective in resolving the performance degradation of the bottleneck resource given the tendency of the related resource group selected in S5030 (S5040). The content of this determination and the specific examples are the same as for S4120.

If the result determined in S5040 is Yes, the countermeasure means can be applied to the most numerous related resources among the resources involved in the performance degradation of the bottleneck resource. When countermeasure means are combined, this countermeasure means is therefore considered to contribute most to resolving the performance degradation of the bottleneck resource. The countermeasure means selection program 1124 accordingly stores the countermeasure means in the storage resource as the countermeasure means with the highest priority (priority 1) and uses it in the countermeasure means combination generation process described later (S5060).

If the result determined in S5050 is No, the countermeasure means cannot be applied to the most numerous related resources among the related resources of the bottleneck resource. The countermeasure means selection program 1124 therefore determines that the performance degradation cannot be resolved even if this countermeasure means is combined with other countermeasure means, and ends the processing for this countermeasure means.

Next, using the countermeasure means combination generation process, the countermeasure means selection program 1124 determines which other countermeasure means to combine with this countermeasure means, based on the patterns of the related resource groups other than the largest one (S5070). The countermeasure means combination generation process is described in detail later.

Finally, the countermeasure means selected in S5060 and the countermeasure means determined in S5080 are registered in the selection means holding table 1109. When a countermeasure means 11097 is registered, the priority selected in S5060 or the priority selected in the countermeasure means combination generation process is set together with it as the corresponding priority 11098. In addition, the related resources included in the related resource group corresponding to the countermeasure means are set as the corresponding parameter candidate resources 11099.

FIG. 23 is a flowchart showing the countermeasure means combination generation process according to the second embodiment.

The countermeasure means combination generation process is a subprogram called from the countermeasure means selection program 1124.

The countermeasure means combination generation process first receives the type and ID of the bottleneck resource, the performance metric type that is causing the warning threshold violation, the countermeasure policy, the countermeasure means already selected, and the related resource groups classified in S5010 or S5020 (S6000).

Next, from the related resource groups received in S6000, excluding the group with the largest number of resources, the countermeasure means combination generation process selects the related resource groups in descending order of the number of resources and performs the following processing for each of them (S6010).

First, the countermeasure means combination generation process refers to the countermeasure means management table 1106 and acquires the combinable countermeasure means 11064 for the countermeasure means received in S6000 and for every countermeasure means already selected in S6040. It then extracts the countermeasure means that are common to all of the acquired combinable countermeasure means 11064 (S6020).
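The extraction in S6020 is a set intersection over the combinable countermeasure means 11064 of every countermeasure means chosen so far. A sketch under that reading follows; the function name, the dictionary layout, and the sample countermeasure names are assumptions for illustration.

```python
def common_combinable(combinable_by_measure, selected):
    """S6020: countermeasures listed as combinable with every selected countermeasure."""
    sets = [set(combinable_by_measure.get(measure, ())) for measure in selected]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical contents of column 11064, keyed by countermeasure means 11062.
COMBINABLE = {
    "vm_storage_migration": {"io_limit", "cache_expansion"},
    "io_limit": {"vm_storage_migration", "cache_expansion"},
    "cache_expansion": {"vm_storage_migration", "io_limit"},
}
```

With these sample values, once both VM storage migration and I/O amount upper limit control have been selected, only `cache_expansion` remains as a candidate that can be combined with both.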

Then, for each countermeasure means extracted in S6020, the countermeasure means combination generation process refers to the countermeasure means application management table 1108 using the bottleneck resource type 11080, the performance metric type 11081, and the countermeasure policy 11083 received in S6000 together with the related resource tendency 11084 of the related resource group selected in S6010, and extracts the countermeasure means that are effective in resolving the performance degradation (S6030).

Next, the countermeasure means combination generation process crosses each countermeasure means extracted in S6030 with the combinations of countermeasure means generated by the previous executions of S6040 to generate new combinations, and stores them (S6040).

Once the processing from S6010 to S6040 has been performed for all of the related resource groups, the countermeasure means combination generation process ends and returns the generated combinations of countermeasure means to its caller. Otherwise, it returns to S6010 and performs S6020 to S6040 for the related resource group with the next largest number of resources.
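The loop from S6010 to S6040 might be sketched as follows. This is one possible reading, not the definitive procedure: the combinable check of S6020 is applied per combination, the applicability check of S6030 is passed in as a callable, and when no countermeasure means fits a group the existing combinations are kept unchanged. All names and sample values are hypothetical.

```python
def generate_combinations(first_measure, groups, combinable, applicable):
    """Build combinations of countermeasures, starting from the priority-1 one.

    groups:     dict mapping pattern -> list of related resources
    combinable: dict mapping countermeasure -> set of combinable countermeasures
    applicable: callable (countermeasure, pattern) -> bool, standing in for S6030
    """
    # S6010: order groups by size, largest first, then skip the largest one.
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    combos = [(first_measure,)]
    for pattern, _resources in ordered[1:]:
        next_combos = []
        for combo in combos:
            # S6020: countermeasures combinable with everything chosen so far.
            candidates = set.intersection(
                *(set(combinable.get(m, ())) for m in combo)
            ) - set(combo)
            for measure in sorted(candidates):
                # S6030: keep only countermeasures effective for this pattern.
                if applicable(measure, pattern):
                    # S6040: extend the existing combination.
                    next_combos.append(combo + (measure,))
        # Assumption: if nothing fits this group, keep the earlier combinations.
        combos = next_combos or combos
    return combos
```

Because this embodiment classifies related resources into only two patterns, the loop body runs at most once; the loop form matters only if more patterns are introduced.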

If one tried to generate an appropriate countermeasure plan from the large number of combinations of configuration change means and their parameters so as to prevent performance degradation at every time point in the period, the plan could not be displayed within a realistic computation time. According to the embodiments described above, the candidate countermeasure means can be narrowed down, and the amount of computation for calculating their parameters can then be reduced. This makes it possible to efficiently generate and display a countermeasure plan with which the performance degradation would not recur if the load of the past period were to recur.

The past period corresponds to, for example, the specified period. The bottleneck load data corresponds to, for example, the performance data of the bottleneck resource. The related load data corresponds to, for example, the performance data of a related resource. The bottleneck load range corresponds to, for example, the range in which the deviation from the average value of the performance data of the bottleneck resource is equal to or less than the bottleneck load deviation threshold. The related load range corresponds to, for example, the range in which the deviation from the average value of the performance data of a related resource is equal to or less than the related load deviation threshold. The plurality of countermeasure policy candidates correspond to, for example, the values shown in the countermeasure policy 11083. The plurality of countermeasure means candidates correspond to, for example, the values shown in the countermeasure means 11062. The plurality of determination results correspond to, for example, the values shown in the related resource tendency 11084.

The embodiments of the present invention have been described above, but they are examples for explaining the present invention and are not intended to limit the scope of the present invention to the configurations described. The present invention can also be implemented in various other forms.

10...computer system, 100...management computer, 110...storage resource, 120...processor, 200...server, 210...processor, 220...storage resource, 300...storage device, 310...processor, 320...memory, 360...storage device group, 370...volume, 380...cache memory, 390...controller, 400...display device, 500...communication network, 600...communication network, 700...VM, 800...hypervisor

Claims

A management method for a computer system using a management computer, wherein
the management computer:
acquires, for a bottleneck resource that is, among a plurality of resources in the computer system, a cause of performance degradation of the computer system in a past period, bottleneck load data that is time-series data of the load of the bottleneck resource in the period;
acquires, for a related resource that affects the load of the bottleneck resource among the plurality of resources, related load data that is time-series data of the load of the related resource in the period;
calculates a bottleneck load range based on the bottleneck load data and, when all values in the bottleneck load data are within the bottleneck load range, determines that the tendency of the bottleneck load data is stable;
when the tendency of the bottleneck load data is determined to be stable, selects a countermeasure policy capable of resolving the performance degradation from among a plurality of countermeasure policy candidates each indicating a policy for determining a parameter necessary for an operation of the computer system;
selects, based on the countermeasure policy and the tendency of the related load data, a countermeasure means for resolving the performance degradation from among a plurality of countermeasure means candidates each indicating an operation of the computer system; and
determines a parameter of the countermeasure means based on the countermeasure policy.

When the management computer determines that the tendency of the bottleneck load data is stable, the management computer selects the countermeasure policy based on the result of the determination of the tendency of the bottleneck load data,
The management method according to claim 1.

When the management computer determines that the tendency of the bottleneck load data is stable, the management computer selects a countermeasure policy candidate that reduces the load of the bottleneck resource over the entire period as the countermeasure policy,
The management method according to claim 2.

When the management computer determines that the tendency of the bottleneck load data is not stable, the management computer selects a countermeasure policy candidate for reducing the peak value of the load of the bottleneck resource within the period as the countermeasure policy.
The management method according to claim 3.

The management computer
acquires a plurality of related load data, one for each of a plurality of related resources that affect the bottleneck resource among the plurality of resources,
calculates a related load range based on each related load data and, when all values in that related load data are within the corresponding related load range, determines that the tendency of that related load data is stable, and
selects the countermeasure means based on the results of the determination of the tendency of the plurality of related load data,
The management method according to claim 4.

A first countermeasure means candidate among the plurality of countermeasure means candidates indicates that the related resource in the bottleneck resource is moved to another resource of the same type as the bottleneck resource in the computer system,
When the management computer determines that the tendency of the bottleneck load data is stable and the number of related load data determined to be stable among the plurality of related load data is less than the number of related load data determined not to be stable, the management computer selects, as the countermeasure means, a countermeasure means candidate other than the first countermeasure means candidate from the plurality of countermeasure means candidates,
The management method according to claim 5.

A second countermeasure means candidate among the plurality of countermeasure means candidates indicates that the load on the related resource is reduced,
When the management computer determines that the tendency of the bottleneck load data is not stable and the number of related load data determined to be stable among the plurality of related load data is less than the number of related load data determined not to be stable, the management computer selects, as the countermeasure means, a countermeasure means candidate other than the second countermeasure means candidate from the plurality of countermeasure means candidates,
The management method according to claim 6.

When the bottleneck resource is a pool and the related resource is a VM (virtual machine), the first countermeasure means candidate is to move the data of the VM to another pool,
The management method according to claim 7.

When the bottleneck resource is the pool and the related resource is the VM, the second countermeasure means candidate is to limit the I/O amount of the VM,
The management method according to claim 8.

The management computer
determines a plurality of countermeasure means by selecting, for each tendency of the related load data, the countermeasure means that uses the corresponding related resource, based on the countermeasure policy and the tendency of the related load data, and
determines, for each tendency of the related load data, the parameters of the corresponding countermeasure means based on the countermeasure policy,
The management method according to claim 5.

The management computer causes a display device to display the determined countermeasure means and the determined parameters,
The management method according to claim 1.

A management computer comprising:
a memory; and
a processor connected to the memory and to a computer system,
wherein the processor:
acquires, for a bottleneck resource that is, among a plurality of resources in the computer system, a cause of performance degradation of the computer system in a past period, bottleneck load data that is time-series data of the load of the bottleneck resource in the period;
acquires, for a related resource that affects the load of the bottleneck resource among the plurality of resources, related load data that is time-series data of the load of the related resource in the period;
calculates a bottleneck load range based on the bottleneck load data and, when all values in the bottleneck load data are within the bottleneck load range, determines that the tendency of the bottleneck load data is stable;
when the tendency of the bottleneck load data is determined to be stable, selects a countermeasure policy capable of resolving the performance degradation from among a plurality of countermeasure policy candidates each indicating a policy for determining a parameter necessary for an operation of the computer system;
selects, based on the countermeasure policy and the tendency of the related load data, a countermeasure means for resolving the performance degradation from among a plurality of countermeasure means candidates each indicating an operation of the computer system; and
determines a parameter of the countermeasure means based on the countermeasure policy.