JP2018063518A5

Info

Publication number: JP2018063518A5
Application number: JP2016200758A
Authority: JP
Filing date: 2016-10-12
Publication date: 2019-05-30
Anticipated expiration: 2036-10-12

Description

Management server, management method and program thereof

The present invention relates to a management server for managing a computer system.

Some management systems analyze the impact on application programs running on a computer system when a problem occurs in that system (see, for example, Patent Document 1). The failure cause extraction device described in Patent Document 1 includes a storage unit and a correlation destruction propagation detection unit. The storage unit stores a correlation model, generated from a time series of performance information containing performance values of multiple types in the system, that includes one or more correlation functions, each converting a performance value of an input type into a performance value of an output type. The correlation destruction propagation detection unit calculates the degree of influence that the performance value of a base point, which is one of the types, has on the system. The calculation is based on the number of base point propagation functions, that is, the correlation functions used to convert to performance values of other types that can be derived directly or indirectly from the base point's performance value and that do not themselves serve as input to any correlation function.

International Publication No. WO 11/099341

An application program includes various processes, so when a problem occurs on the computer system, the application administrator wants to identify which processes are affected and why, and to take countermeasures quickly. In Patent Document 1, however, the presence or absence of an impact is determined from the strength of the correlation between per-application performance and the operation data of the computer system. It is therefore difficult to identify the affected processes and the reason for the impact.

A representative example of the invention disclosed in this application is as follows. It is a management server that manages a computer system executing an application program, and it comprises a processor that executes a program and a storage device that stores the program. The processor extracts characteristics of the processes included in the application program and, by analyzing the correlation between those characteristics and the components of the computer system, identifies the correlation between the processes included in the application program and the components of the computer system. Based on the identified correlation, the processor identifies the relationship between the operating status of the computer system and the characteristics of the processes included in the application program.

According to a representative embodiment of the present invention, the affected processes within an application program can be identified. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

A diagram showing the effect of the impact analysis process in the computer system according to the first embodiment.
A block diagram showing a configuration example of the system in the first embodiment.
A diagram showing an example of the processing characteristics in the first embodiment.
A diagram showing an example of the correlation data in the first embodiment.
A diagram showing an example of the configuration information of the computer system in the first embodiment.
A diagram showing an example of the configuration information of the application in the first embodiment.
A flowchart of the correlation learning process in the first embodiment.
A flowchart of the impact analysis process in the first embodiment.
A diagram showing an outline of the process of identifying related applications in the first embodiment.
A diagram showing an outline of the per-process impact analysis in the first embodiment.
A diagram showing an example of a screen output by the management server in the first embodiment.
A block diagram showing a configuration example of the system in the second embodiment.
A flowchart of the cause analysis process in the second embodiment.

In the following description, the information of this embodiment is described using expressions such as "aaa table", "aaa list", and "aaa DB (Database)", where aaa is an arbitrary character string. This information does not necessarily have to be stored in that form and may be expressed in something other than a data structure such as a table, list, DB, or queue. To indicate independence from the data structure, an "aaa table", "aaa list", "aaa DB", and the like may therefore be referred to as "aaa information".

When describing the contents of each piece of information, the expressions "identification information", "identifier", "name", and "ID (IDentification)" are used, and these are interchangeable.

In the following description, a "program" may be the grammatical subject of a sentence. Because a program performs its defined processing by being executed by a processor, using memory and a communication port (communication control device), the description may equally be read with the processor as the subject. Processing disclosed with a program as the subject may also be regarded as processing performed by a computer such as a management server or by an information processing apparatus. Part or all of a program may be implemented by dedicated hardware.

The various programs may be installed on each computer from a program distribution server or from a computer-readable storage medium. In this case, the program distribution server includes a processor and a storage resource, and the storage resource stores a distribution program and the programs to be distributed. When the processor executes the distribution program, the processor of the program distribution server distributes the target programs to other computers.

A computer also has input/output devices. Examples include a display, a keyboard, a pointer device, a tablet terminal, and a smartphone, but other devices may be used. As an alternative, a serial interface or an Ethernet interface may serve as the input/output device: a display computer having a display, keyboard, or pointer device is connected to the interface, display information is sent to the display computer, and input information is received from it. The display computer then performs the display and accepts the input in place of the input/output devices. Hereinafter, an interface may be written as I/F.

A computer also has a communication I/F. Examples of the communication I/F include a LAN (Local Area Network) connection terminal, a SAN (Storage Area Network) connection terminal, and a wireless communication connection device, but other devices may be used.

Hereinafter, a set of one or more computers that manages the information processing system and displays the display information of this embodiment may be called a management system. When the management computer (hereinafter, management server) displays the display information, the management server is the management system. The combination of the management server and a display computer is also a management system. To speed up management processing or increase its reliability, processing equivalent to that of the management server may be implemented by multiple computers. In that case, those computers (including the display computer, if it performs the display) constitute the management system.

<Example of problem-solving process>
FIG. 1 is a diagram showing the effect of the impact analysis process in the computer system according to the first embodiment of the present invention.

Computer system 1 is an example of a computer system to which this embodiment is not applied. Computer system 1 includes a managed server device 203, a managed storage device 204, an application program 250 running on the server device 203, a management server 201 that manages each device, and the middleware applications and applications running on the managed devices.

The management server 201 detects the occurrence of a problem in the managed storage device 204 and, by correlation analysis between application performance and the operating status of the managed IT resources, determines that the affected applications are application 2 and application 3. It cannot, however, identify which processes of application 2 and application 3 are affected.

For a new application, the learning data on the correlation between application performance and IT resource operating status is insufficient, so it cannot be determined whether the application is affected by a problem that occurred in the storage device 204, which is an IT resource. Here, a problem in computer system 1 means the occurrence of some abnormality or a sign of an abnormality. When a problem occurs in an IT resource, the application administrator wants to deal with it before the application's users are affected. An application, however, usually provides a service comprising multiple processes, and even for the same IT resource problem, whether and how much a process is affected differs from process to process. For example, when disk I/O is the bottleneck, a process that generates a large amount of disk access, such as a search over long-term trend data, is affected, whereas a process that generates no disk access, such as an in-memory simulation, is not.

Consequently, in computer system 1, even if the presence or absence of an impact is known per application, the affected processes are not known, the necessary response cannot be determined, and it takes time before effective countermeasures are taken. For a new application, the presence or extent of an impact cannot be analyzed by correlation analysis because the learning data on the correlation is insufficient.

Computer system 2 is an example of a computer system to which this embodiment is applied. The conventional computer system 1 analyzes impact from the correlation between application performance and the operating status of IT resources. Computer system 2 instead extracts processing characteristics from the application and analyzes impact from the correlation between those processing characteristics and the IT resources. As a result, when a problem occurs in an IT resource, the presence or absence of an impact can be analyzed for each process of the application. In addition, by associating the processes included in an application with their processing characteristics, impact can be analyzed even for a new application using the existing correlations.
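The idea of computer system 2 can be illustrated with a minimal sketch. This is not the implementation described in the specification: the characteristic IDs, resource type names, application names, and process names are hypothetical, and the correlation is reduced to a simple membership test.

```python
# Hypothetical sketch of per-process impact analysis via processing characteristics.
# Every name and value here is an illustrative assumption, not taken from the patent.

# Learned once from existing applications: which processing characteristics
# correlate with which type of IT resource.
CORRELATED = {
    "storage_io": {"C1"},   # e.g. bulk search over large on-disk data
    "server_cpu": {"C2"},   # e.g. in-memory simulation
}

# Each process of each application is tagged with the characteristic extracted from it.
APP_PROCESSES = {
    "app2": {"trend_search": "C1", "simulation": "C2"},
    "new_app": {"history_export": "C1"},  # no per-application learning data exists
}


def affected_processes(resource_type: str) -> dict[str, list[str]]:
    """Return, per application, the processes whose characteristic correlates
    with the resource type on which a problem occurred."""
    hit = CORRELATED.get(resource_type, set())
    result = {}
    for app, processes in APP_PROCESSES.items():
        names = sorted(name for name, cid in processes.items() if cid in hit)
        if names:
            result[app] = names
    return result


print(affected_processes("storage_io"))
```

Because `new_app` only needs its processes mapped to known characteristic IDs, it is covered by correlations learned from other applications, which is the point the paragraph above makes about new applications.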

As described above, the method of this embodiment uses the characteristics of the application's processes in the impact analysis process and can thereby determine the presence or absence of an impact for each process.

To simplify the explanation, the system configuration illustrated in FIG. 1 omits some parts of the detailed system configuration described in FIG. 2 and subsequent figures and exaggerates others.

<Example 1>
FIG. 2 is a block diagram showing a configuration example of the system in the first embodiment of the present invention.

The system of the first embodiment includes a management server 201 and a computer system managed by the management server 201. In the managed computer system, a server device 203, a storage device 204, a network device 205, and a cloud service 206 are interconnected via a network 207 or a SAN (Storage Area Network).

Only one of each component of the system is illustrated, for example the management server 201 and, within it, the processor 211 and the main storage device 212, but a plurality of each may be provided.

The management server 201 is a computer having a processor 211, a main storage device 212, an auxiliary storage device 213, and a communication interface.

The processor 211 executes programs stored in the main storage device 212. Specifically, the processor 211 executes a correlation learning processing program 220, an impact analysis processing program 221, and an event detection program 222. Part or all of the correlation learning processing program 220, the impact analysis processing program 221, and the event detection program 222 may be implemented in hardware, such as an integrated circuit (for example, a Field-Programmable Gate Array), instead of being executed by the processor 211.

The main storage device 212 includes a ROM, which is a non-volatile storage device, and a RAM, which is a volatile storage device. The ROM stores immutable programs (for example, a BIOS). The RAM is a high-speed, volatile storage device such as a DRAM (Dynamic Random Access Memory) and temporarily stores the programs executed by the processor 211 and the data used when the programs are executed. Specifically, the main storage device 212 stores the correlation learning processing program 220, the impact analysis processing program 221, the event detection program 222, and a countermeasure generation program 224.

The auxiliary storage device 213 is a large-capacity, non-volatile storage device such as a magnetic storage device (HDD) or flash memory (SSD), and stores data used when the programs are executed. The auxiliary storage device 213 may be an external storage device, for example the storage device 204, connected to the management server 201 via an I/F to external devices (not shown) or via the communication interface 216. The main storage device 212 and the auxiliary storage device 213 may be the same device.

Specifically, the auxiliary storage device 213 stores processing characteristics 231, correlation data 232, configuration information 233, operation data 234, and countermeasure data 235. The processing characteristics 231, the correlation data 232, the configuration information 233, and the operation data 234 may each be stored in a different auxiliary storage device 213, and part or all of them may be stored in the main storage device 212. They may also be stored in the main storage device 252 or the auxiliary storage device 254 of another server device 203 connected via the network 207, or in the auxiliary storage device 263 of another storage device 204 connected via the network 207.

The processing characteristics 231 are information on the feature quantities of a process, such as the type of process, the amount of data processed, and the kind of data handled. For example, the processing characteristics 231 may be the type of process, such as whether a process on a database is a search, a registration, an update, or a deletion. Details of the processing characteristics 231 are described later with reference to FIG. 3.

The correlation data 232 is information on the mutual relationships among the monitoring information of the computer system monitored by the management server. For example, the correlation data 232 may be information on the relationship between the response time of an application program and the processor utilization of the server device 203. Details of the correlation data 232 are described later with reference to FIG. 4.

The correlation data 232 may also include correlations among the operation data of the management server 201. For example, the correlation data 232 may be information on relationships such as the following: the response time of the cloud service 206 is slow when the response time of the application program 250 on the server device 203 is slow; the execution time of a job of the application program 250 is long when the IOPS of the logical volume 260 of the storage device 204 is high; or the response time of an application program is slow when the number of log entries per unit time of that application program on the server device 203 is large. It is sufficient that the correlation data 232 reveals the correlation among the data used in the computer system managed by the management server 201. The correlation may take the form of a correlation formula or a qualitative value such as a high or low degree of correlation.
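As one illustration of how a correlation between two operation data series could be reduced to a qualitative value, the sketch below computes a Pearson correlation coefficient and maps it to "high" or "low". The specification does not state which correlation measure is used, so the choice of Pearson correlation, the 0.7 threshold, and the sample values are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series.
    Returns 0.0 when either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def qualitative(r: float, threshold: float = 0.7) -> str:
    """Map a coefficient to a qualitative degree (threshold is an assumption)."""
    return "high" if abs(r) >= threshold else "low"


# Hypothetical samples: logical volume IOPS and job execution time in seconds.
iops = [100, 200, 300, 400, 500]
job_seconds = [10, 12, 15, 19, 24]

r = pearson(iops, job_seconds)
print(qualitative(r))
```

Storing the coefficient itself corresponds to the quantitative form mentioned above, while storing only the label corresponds to the qualitative form.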

The configuration information 233 is information on the connection relationships among the managed devices and on the relationships between the managed application programs and the managed devices. Details of the configuration information 233 are described later with reference to FIG. 5 and FIG. 6.
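One way such configuration information could be used is to find everything that depends on a device where a problem occurred. The following is a hedged sketch only: the dependency map, the component names, and the traversal are hypothetical and do not reproduce the tables of FIG. 5 or FIG. 6.

```python
# Hypothetical configuration information: each component and what it depends on.
# Names are illustrative assumptions, not taken from the patent figures.
DEPENDS_ON = {
    "app1": ["server1"],
    "app2": ["server2"],
    "server1": ["volume1"],
    "server2": ["volume2"],
    "volume1": ["storage1"],
    "volume2": ["storage2"],
}


def related_components(failed: str) -> set[str]:
    """Collect every component that directly or indirectly depends on `failed`."""
    related: set[str] = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for component, dependencies in DEPENDS_ON.items():
            if current in dependencies and component not in related:
                related.add(component)
                frontier.append(component)
    return related


print(sorted(related_components("storage2")))
```

Here a problem in `storage2` reaches `app2` through `volume2` and `server2`, while `app1` is unaffected because it has no path to that device.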

The operation data 234 is the operation data of the devices managed by the management server 201 and of the programs executed on those devices. Examples include the operation log output by the application program 250, a configuration change log of a virtual machine, and time-series IOPS records of the logical volume 260 of the storage device 204.

The countermeasure data 235 is data on proposed countermeasures that are displayed together with the analysis result of the impact range.

The auxiliary storage device 213 may also store the programs executed by the processor 211. That is, a program is read from the auxiliary storage device 213, loaded into the main storage device 212, and executed by the processor 211.

The communication interface 216 is a network interface device (NIC) that controls communication with other devices connected to the network 207 (such as the operation terminal 202 and the server device 203) according to a predetermined protocol.

The management server 201 may have an input interface and an output interface. The input interface is connected to a keyboard, a mouse, or the like and receives input from the administrator. The output interface is connected to a display device, a printer, or the like and outputs the state of the management server 201 and the execution results of programs in a form the administrator can view.

The programs executed by the processor 211 are provided to the management server 201 via removable media (CD-ROM, flash memory, and the like) or a network and are stored in the non-volatile auxiliary storage device 213, which is a non-transitory storage medium. The management server 201 should therefore have an interface for reading data from removable media.

The management server 201 is a computer system configured on one physical computer or on a plurality of logically or physically configured computers, and it may run on a virtual machine built on a plurality of physical computer resources. The programs executed on the management server 201 may run in separate threads on the same computer.

An operation terminal 202 may be connected to the management server 201. The operation terminal 202 is a computer for operating the management server 201 and has an input/output device 241. The input/output device 241 is a device (for example, a keyboard, mouse, display device, or printer) through which data is input and output by the administrator's operation. Data entered through the input/output device 241 is sent to the management server 201 via the network 207. The output device 242 is a display device, printer, or the like that outputs data from the management server 201.

The server device 203 is a computer monitored by the management server 201 and executes application programs. The server device 203 includes an application program 250, a processor 251, a main storage device 252, and an auxiliary storage device 253.

The application program 250 covers various application programs: applications that provide business services, such as a material ordering application or an electronic commerce application; applications used internally by such business applications (for example, a database); applications that provide virtual machines (for example, a hypervisor); and an OS that provides a container service. The server device 203 only needs to be able to execute application programs and need not take the form of a physical server device. It may be, for example, a virtual machine or a container. The server device 203 therefore does not necessarily include a physical processor 251, main storage device 252, or auxiliary storage device 253, and it may include components that are not shown.

The storage device 204 is a device managed by the management server 201 and provides storage areas used by the programs running on the server device 203 and the programs running on the management server 201. The storage device 204 has a logical volume 260, a communication I/F 261, an IO processing unit 262, and an auxiliary storage device 263.

The auxiliary storage device 263 may make a plurality of non-volatile storage devices redundant, for example by RAID (Redundant Arrays of Inexpensive Disks), and provide them as the logical volume 260. A plurality of RAID-configured auxiliary storage devices 263 may also be controlled virtually as a single auxiliary storage device 263. The logical volume 260 is composed of storage areas of the auxiliary storage device 263. Data is input to and output from the logical volume 260 via the communication I/F 261, and the IO processing unit 262 controls the input and output of the data. The storage device 204 may include components that are not shown, such as a main storage device.

The network device 205 is a device managed by the management server 201 and has a plurality of communication I/Fs 270. The network device 205 connects multiple devices within the computer system and transfers data. For example, the network device 205 connects multiple server devices 203 to one another, connects multiple storage devices 204 to one another, or connects a server device 203 to a storage device 204. Typical network devices include IP (Internet Protocol) switches and FC (Fibre Channel) switches, but the network device 205 may be another type of device.

The cloud service 206 is a virtual device managed by the management server 201 and is a computer that provides services via the Internet. Typical examples of the cloud service 206 are the functions of the server device 203 or of the storage device 204 provided via the Internet, but it may be a function that provides other services.

The event detection program 222 may reside on the computer system managed by the management server 201 rather than on the management server 201 itself.

FIG. 3 is a diagram showing an example of the processing characteristics 231 in the first embodiment.

The processing characteristics 231 hold information that classifies the feature quantities of application program processing, for example in table form, and are prepared in advance manually or by another program. Hereinafter, the terms "processing characteristic" and "feature quantity of processing" are both used, but they refer to the same information. The processing characteristics 231 include an ID field 301, a processing content field 302, a data amount field 303, and a data storage format field 304.

The ID field 301 stores identification information that uniquely identifies a class of processing characteristics. The processing content field 302, the data amount field 303, and the data storage format field 304 store feature quantities of the processing performed by the application. The feature quantities are extracted from the application. The processing characteristics 231 need not include all of the illustrated fields and may include other fields not shown. The fields of the processing characteristics 231 may also be split across a plurality of tables.

FIG. 4 is a diagram showing an example of the correlation data 232.

The correlation data 232 holds information on the correlation between the performance of applications, classified by processing characteristic, and the operation data of the computer system, for example in table form, and is prepared in advance manually or by another program. The correlation data 232 includes an ID field 401, an SV CPU field 402, an SV Mem field 403, and an ST LU field 404.

The ID field 401 holds identification information that uniquely identifies a class of the processing characteristics 231, and stores the same values as the ID field 301 of the processing characteristics 231. The SV CPU field 402, the SV Mem field 403, and the ST LU field 404 hold information related to the monitoring items of the operation data of the computer system monitored by the management server 201. The correlation data 232 need not include all of the illustrated fields and may include other fields not shown. These fields may also be split across a plurality of tables.

The SV CPU field 402 stores correlation information between the operation data of the processor 211 of the server device 203 and the application performance for each processing characteristic 231. Likewise, the SV Mem field 403 stores correlation information between the operation data of the main storage device 212 of the server device 203 and the application performance for each processing characteristic 231, and the ST LU field 404 stores correlation information between the operation data of the logical volume 260 of the storage device 204 and the application performance for each processing characteristic 231. Here, for simplicity, the correlation is stored as a character string indicating a level, such as "large" or "medium". However, the correlation may instead be held as a calculation formula such as a correlation equation, as a class obtained by classifying the degree of correlation through processing such as clustering, or in any other form.

The operation data of the logical volume 260 includes a plurality of monitoring items, such as response time and IOPS (the number of input/output requests per unit time), but these are omitted in this embodiment for simplicity. Correlation data with the processing characteristics may be held for each monitoring item of each component of the computer system.

Next, an example of the configuration information 233 described above is explained with reference to FIGS. 5 and 6. The configuration information 233 includes the computer system configuration information table 500 shown in FIG. 5 and the application configuration information table 600 shown in FIG. 6.

FIG. 5 is a diagram showing an example of the computer system configuration information table 500.

The computer system configuration information table 500 holds information on the logical or physical connection relationships among the components of the computer system managed by the management server 201, for example in table form, and is prepared in advance manually or by another program. The computer system configuration information table 500 includes an application field 501, a server field 502, a processor field 503, a storage field 504, and a logical volume field 505.

The application field 501 stores identification information that uniquely identifies an application. The server field 502 stores identification information that identifies the server device 203 on which the application runs. The processor field 503 stores identification information that identifies the processor 211 executing the application. The storage field 504 stores identification information that identifies the storage device 204 holding the data used by the application. The logical volume field 505 stores identification information that identifies the logical volume 260 holding the data used by the application.

The computer system configuration information table 500 need not include all of the illustrated fields and may include other fields not shown. These fields may also be split across a plurality of tables. As other fields not shown, the table may record, for example, service version information or API specification version information for the cloud service 206, the type of storage medium of the auxiliary storage device 263 of the storage device 204 (for example, whether it is an HDD (Hard Disk Drive) or an SSD (Solid State Disk)), or the performance (such as operating frequency) of the processor 251 of the server device 203. The computer system configuration information table 500 desirably records data for every component of the computer system managed by the management server 201 from which operation data is acquired, such as the application program 250 of the server device 203 and the logical volume 260 of the storage device 204.

FIG. 6 is a diagram showing an example of the application configuration information table 600.

The application configuration information table 600 holds information on the correspondence between the processes executed in the applications managed by the management server 201 and their processing characteristics, for example in table form, and is prepared in advance manually or by another application program. The application configuration information table 600 includes an ID field 601, an application field 602, a processing field 603, and a processing characteristic field 604. The ID field 601 stores identification information that uniquely identifies a process executed in an application. The application field 602 stores identification information that identifies the application to which the process belongs. The processing field 603 stores a heading from which a person can recognize the purpose of the process. The processing characteristic field 604 stores the characteristics of the process. A single process may correspond to a plurality of processing characteristics. The application configuration information table 600 need not include all of the illustrated fields and may include other fields not shown. These fields may also be split across a plurality of tables.
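
As a rough illustration of the two tables that make up the configuration information 233, the following Python sketch models one record type per table. The class names, attribute names, and all sample rows are hypothetical (the contents of FIGS. 5 and 6 are not reproduced here); only the field numbers in the comments come from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SystemConfigRecord:
    """One row of the computer system configuration information table 500."""
    app: str             # application field 501
    server: str          # server field 502
    processor: str       # processor field 503
    storage: str         # storage field 504
    logical_volume: str  # logical volume field 505


@dataclass(frozen=True)
class AppConfigRecord:
    """One row of the application configuration information table 600."""
    id: str                           # ID field 601
    app: str                          # application field 602
    process: str                      # processing field 603 (human-readable heading)
    characteristics: Tuple[str, ...]  # processing characteristic field 604 (one or more)


# Hypothetical sample rows, invented for illustration only.
SYSTEM_CONFIG: List[SystemConfigRecord] = [
    SystemConfigRecord("App A", "Server 1", "CPU 1", "Storage 1", "LU2"),
    SystemConfigRecord("App B", "Server 1", "CPU 2", "Storage 1", "LU2"),
]
APP_CONFIG: List[AppConfigRecord] = [
    AppConfigRecord("F1", "App A", "Search", ("C1",)),
    AppConfigRecord("F2", "App A", "Order registration", ("C2",)),
    AppConfigRecord("F3", "App B", "Monthly report", ("C1", "C3")),
]


def processes_of(app: str, table: List[AppConfigRecord]) -> List[AppConfigRecord]:
    """Return the rows of table 600 that belong to the given application."""
    return [record for record in table if record.app == app]
```

Storing field 604 as a tuple reflects the statement that one process may correspond to several processing characteristics.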

Next, the correlation learning process executed by the management server 201 is described. The correlation learning process is realized by the processor 211 of the management server 201 executing the correlation learning processing program 220.

FIG. 7 is a flowchart showing an example of the procedure of the correlation learning process executed by the management server 201.

First, the triggers for executing the correlation learning process are described. The correlation learning process may be executed in response to an instruction from the administrator. The administrator's instruction is input from the input/output device 241 of the operation terminal 202 or from an input/output device (not shown) of the management server 201. The management server 201 may also execute the correlation learning process upon receiving the output of another program. The correlation learning process may also be executed at a predetermined timing, which may be periodic (for example, every hour) or may be the arrival of a time specified in a scheduler. The management server 201 may also execute the correlation learning process when it detects a configuration change in the managed computer system (for example, an update of an application program or a migration of a virtual machine). The management server 201 may also execute the correlation learning process when it detects a change in the behavior of the managed computer system (for example, a change in the trend of user access to an application, a change in the amount of data accessed, operation data of the managed computer system exceeding a threshold, or operation data of the managed computer system matching a specific pattern). The correlation learning process may also be executed when completion of a failure countermeasure by the administrator is detected, or when a difference is detected between a predicted value calculated from the correlation data stored in the management server 201 and the measured value. The events that trigger the correlation learning process are preferably detected by the event detection program 222.

As shown in FIG. 7, the management server 201 executes a processing characteristic extraction process (step S701), a configuration information update process (step S702), and a correlation analysis process (step S703). The correlation learning process flow 700 may include other processing steps not shown. Each processing step need only be executed at least once before the impact analysis process 221 is executed, and when the correlation learning process flow 700 is executed for the second or a subsequent time, some of the illustrated processing steps may be skipped. The illustrated processing steps of the correlation learning process flow 700 may also be executed in a different order.

In the processing characteristic extraction process (step S701), the management server 201 extracts the feature quantities of the processes executed by an application and registers the extracted feature quantities in the processing characteristics 231. For example, the management server 201 obtains process execution times from the operation data of the application and extracts feature quantities by having another application cluster the obtained execution times. Feature quantities may also be extracted from a database access log by classifying each process as a registration, read, update, or delete process. Feature quantities may also be extracted by comparing the amounts of data accessed, obtained from the operation data of the application, or by obtaining the number of application users from the login history of the application. These feature quantity extraction processes may be performed manually. If the processing characteristic extraction process (step S701) cannot extract any new feature quantity, no feature quantity need be registered in the processing characteristics 231. Executing the processing characteristic extraction process (step S701) may add new data records. Data fields may increase when a new feature quantity of a process is detected, and may decrease when feature quantities are recalculated.
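
One of the extraction approaches above, classifying processes from a database access log, can be sketched as follows. This is a minimal illustration under stated assumptions: the log is assumed to be a list of SQL statements, and the verb-to-type mapping and function names are hypothetical, not something the description specifies.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical mapping from SQL verbs to the four process types named in the text.
SQL_VERB_TO_TYPE = {
    "INSERT": "register",
    "SELECT": "read",
    "UPDATE": "update",
    "DELETE": "delete",
}


def classify_statement(statement: str) -> str:
    """Classify one logged SQL statement by its leading verb."""
    words = statement.strip().split(None, 1)
    verb = words[0].upper() if words else ""
    return SQL_VERB_TO_TYPE.get(verb, "unknown")


def dominant_type(statements: Iterable[str]) -> Optional[str]:
    """Return the most frequent process type in a log, or None if nothing is recognized."""
    counts = Counter(classify_statement(s) for s in statements)
    counts.pop("unknown", None)  # ignore statements we cannot classify
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The dominant type of a process could then be stored as its processing content feature (field 302); how the real system summarizes a log is not stated in the description.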

In the configuration information update process (step S702), the management server 201 acquires the configuration information of the managed computer system. For example, the management server 201 detects that a virtual machine has migrated and that the server device 203 on which it runs has changed, and updates the computer system configuration information table 500 included in the configuration information 233. If the processing characteristics of an application change as the data amount grows, even though the application executes the same process, the processing characteristic 604 of the application configuration information table 600 included in the configuration information 233 may be recalculated. If a new function is added when an application is updated, a new data record may be added to the application configuration information table 600 included in the configuration information 233. A change in the API specification accompanying an update of the cloud service 206 may also be detected, and the computer system configuration information table 500 included in the configuration information 233 may be updated accordingly. Processing other than the examples given above may also be executed.

In the correlation analysis process (step S703), the management server 201 analyzes the correlation between the operation data of the managed computer system and the processing performance of applications having the feature quantities of the processing characteristics 231, and updates the correlation data 232. For example, if the response times of the group of application processes whose ID 301 in the processing characteristics 231 is C2 increase uniformly when the IOPS to a logical volume of the storage device 204 exceeds a predetermined value, the management server 201 analyzes the increase in IOPS of the logical volume together with the processing characteristic C2 and stores information indicating that the degree of correlation between the two is high. The correlation analysis process may be performed manually or by another program. If no new correlation is detected in the correlation analysis process (step S703), the correlation data 232 need not be updated. The learning data may also be insufficient to analyze the correlation for a data field of the correlation data 232. This is the case, for example, when an ST CPU field corresponding to the IO processing unit 262 of the storage device 204 has been added to the correlation data 232 as a data field not shown in FIG. 4, and the operation data 234 does not contain enough operation data of the IO processing unit and the application program to analyze the correlation. In that case, processing by another program may be executed on the computer system managed by the management server 201 to generate operation data sufficient for the correlation analysis, thereby increasing the effectiveness of the correlation analysis process (step S703).
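
The description does not fix a particular correlation measure. As one possible sketch, the following Python code uses the Pearson correlation coefficient between a monitored item (for example, logical volume IOPS) and the response time of a process group, and maps it to a level string of the kind stored in the correlation data 232. The choice of Pearson correlation, the thresholds 0.7 and 0.4, and the level name "small" are assumptions made for illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def correlation_level(r: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a coefficient to a level string (thresholds are assumed values)."""
    strength = abs(r)
    if strength >= high:
        return "large"
    if strength >= medium:
        return "medium"
    return "small"


# Hypothetical samples: IOPS of a logical volume and response time (ms)
# of the processes classified as C2, measured over the same intervals.
iops = [100, 200, 300, 400, 500]
response_ms = [10, 12, 15, 19, 24]
level_c2_st_lu = correlation_level(pearson(iops, response_ms))
```

With the sample series above, the response time rises almost linearly with IOPS, so the resulting level for the C2 row of the ST LU field is "large".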

Next, the impact analysis process executed by the management server 201 is described. The impact analysis process is realized by the processor 211 of the management server 201 executing the impact analysis processing program 221.

FIG. 8 is a flowchart showing an example of the procedure of the impact analysis process executed by the management server 201. The impact analysis process may be executed in response to an instruction from the administrator input from the input device 214 of the management server 201. The management server 201 may also execute the impact analysis process in response to an instruction from another program. The management server 201 may also execute the impact analysis process when it receives, via the communication interface 216, a notification of the occurrence of a problem sent by the computer system it manages.

As shown in FIG. 8, the management server 201 executes a computer system problem detection process (step S801), a related application identification process (step S802), a per-process impact analysis process (step S803), a countermeasure generation process (step S804), and a countermeasure execution process (step S805). The impact analysis process flow 800 may include other processing steps not shown, and some of the illustrated processing steps may be skipped.

In the computer system problem detection process (step S801), the management server 201 detects a problem occurring in the managed computer system. For example, it compares collected operation data of the computer system with a threshold for that operation data and detects that a problem has occurred when the operation data exceeds the threshold. It may also analyze the text of the operation data of the computer system and detect that a problem has occurred when the text contains a specific character string such as "Error" or "Warning". It may also detect a sign of a problem when the increasing trend of the operation data predicts that the operation data will exceed the threshold within a few days. The computer system problem detection process is preferably executed by the event detection program 222. When a problem in the computer system is detected, the process outputs, as the location of the problem, information that can identify a record of the computer system configuration information table 500 of the configuration information 233. An example is information showing that a problem has occurred in the logical volume named LU2 in the storage device 204 named Storage 1.
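
The three detection approaches described for step S801 can be sketched in Python as follows. This is an illustrative sketch only: the function names, the keyword list, and the use of a least-squares linear fit for the trend prediction are assumptions, since the description does not specify how the trend is extrapolated.

```python
from typing import Optional, Sequence


def exceeds_threshold(value: float, threshold: float) -> bool:
    """Threshold check on a single operation data sample."""
    return value > threshold


def contains_problem_keyword(text: str, keywords: Sequence[str] = ("Error", "Warning")) -> bool:
    """Text check: True if any of the given strings appears in the operation data text."""
    return any(keyword in text for keyword in keywords)


def days_until_threshold(daily_values: Sequence[float], threshold: float) -> Optional[float]:
    """Fit a straight line to daily samples and estimate how many days after the
    last sample the line reaches the threshold. Returns None when the trend is
    not increasing, and 0.0 when the threshold is already exceeded."""
    n = len(daily_values)
    if n < 2:
        return None
    if daily_values[-1] > threshold:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


def is_sign_of_problem(daily_values: Sequence[float], threshold: float, horizon_days: float = 3.0) -> bool:
    """True if the fitted trend reaches the threshold within the (assumed) horizon."""
    days = days_until_threshold(daily_values, threshold)
    return days is not None and days <= horizon_days
```

For example, daily samples of 50, 60, 70, and 80 against a threshold of 100 give a fitted slope of 10 per day, so the threshold is reached about two days after the last sample and is reported as a sign of a problem under the assumed three-day horizon.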

In the related application identification (step S802), the management server 201 identifies the applications related to the part of the computer system in which the problem is occurring. From the computer system configuration information table 500 of the configuration information 233, the management server 201 extracts and outputs the application field 501 of the records that have a connection relationship with the computer system component corresponding to the field in which the problem detected in the computer system problem detection (step S801) occurred. For example, in the computer system configuration information table 500 illustrated in FIG. 5, the records whose storage field 504 is Storage 1 and whose logical volume field 505 is LU2 are those of App A and App B. It follows that the applications related to the problem in LU2 on Storage 1 are App A and App B.

FIG. 9 shows an overview of the related application identification process (step S802). As shown in FIG. 9, App A, App B, App C, App D, App E, and App F run on the computer system managed by the management server. App A, App B, App C, and App D run on Server 1, and App E and App F run on Server 2. All six applications store their data in Storage 1. Within Storage 1, App C uses communication I/F 1, App A and App D use communication I/F 2, App B uses communication I/F 3, App E uses communication I/F 4, and App F uses communication I/F 5. App C and App D use IO processing unit 1, and App A, App B, App E, and App F use IO processing unit 2. App C and App D store data on logical volume LU1, App A and App B store data on logical volume LU2, and App E and App F store data on logical volume LU3.

Logical volume LU1 is configured by making auxiliary storage devices 1, 2, and 3 redundant, logical volume LU2 by making auxiliary storage devices 4, 5, and 6 redundant, and logical volume LU3 by making auxiliary storage devices 7, 8, and 9 redundant. As illustrated, each application uses a different combination of computer system components. The related application identification process (step S802) identifies, for example when a problem occurs in logical volume LU2, that App C, App D, App E, and App F are unrelated to logical volume LU2 even though they share Storage 1, whereas App A and App B are related to it.
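
The FIG. 9 topology described above can be written down as a small table and queried to find the applications related to a failing component. The following Python sketch does this; the dictionary keys and the `related_apps` function are hypothetical names, and the columns for the communication I/F and IO processing unit are treated as "other fields not shown" of table 500. The row values themselves are taken from the description of FIG. 9.

```python
from typing import Dict, List

# One row per application, following the description of FIG. 9.
TOPOLOGY: List[Dict[str, str]] = [
    {"app": "App A", "server": "Server 1", "storage": "Storage 1", "interface": "I/F 2", "io_unit": "IO unit 2", "logical_volume": "LU2"},
    {"app": "App B", "server": "Server 1", "storage": "Storage 1", "interface": "I/F 3", "io_unit": "IO unit 2", "logical_volume": "LU2"},
    {"app": "App C", "server": "Server 1", "storage": "Storage 1", "interface": "I/F 1", "io_unit": "IO unit 1", "logical_volume": "LU1"},
    {"app": "App D", "server": "Server 1", "storage": "Storage 1", "interface": "I/F 2", "io_unit": "IO unit 1", "logical_volume": "LU1"},
    {"app": "App E", "server": "Server 2", "storage": "Storage 1", "interface": "I/F 4", "io_unit": "IO unit 2", "logical_volume": "LU3"},
    {"app": "App F", "server": "Server 2", "storage": "Storage 1", "interface": "I/F 5", "io_unit": "IO unit 2", "logical_volume": "LU3"},
]


def related_apps(table: List[Dict[str, str]], **component: str) -> List[str]:
    """Return the applications whose row matches every given field value,
    e.g. related_apps(TOPOLOGY, storage="Storage 1", logical_volume="LU2")."""
    return [
        row["app"]
        for row in table
        if all(row.get(field) == value for field, value in component.items())
    ]
```

Querying by storage alone returns all six applications, which is why the finer-grained match on the logical volume is needed to narrow the result to App A and App B.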

The per-process impact analysis (step S803) receives two inputs: the identification information output by the computer system problem detection (step S801), which identifies the computer system component in which the problem is occurring, and the identification information output by the related application identification (step S802), which identifies the applications affected by the problem. It outputs identification information that identifies the affected processes within the affected applications.

In the per-process impact analysis process (step S803), the management server 201 refers to the correlation data 232 in the field that matches the computer system component in which the problem is occurring, and determines the affected processing characteristics by identifying those with a high degree of correlation. How a high degree of correlation is identified depends on how it is recorded. If the degree of correlation is recorded as characters, the determination may check whether the recorded string matches a string indicating a high degree of correlation. If it is recorded as a numerical value, the determination may compare the value with a threshold. If it is defined by a formula, the determination may check whether the value calculated by the formula satisfies a predetermined condition.

Next, from the application configuration information table 600 of the configuration information 233, the management server 201 obtains the records whose application field 602 matches an affected application, and outputs, as the affected processes, those obtained records whose processing characteristic field 604 matches an affected processing characteristic.

For example, as illustrated in FIG. 10, when the correlation data 232 is used and the computer system component in which the problem is occurring is a logical volume of the storage, the field corresponding to the logical volume of the storage is the ST LU field 404, and the processing characteristics C2 and C3 can be determined to be the affected processing characteristics. Referring to the application configuration information table 600, among the records for App A and App B, the records whose processing characteristic field 604 is C2 or C3 are found to be the process with ID F2 and the process with ID F3, so the IDs F2 and F3 of these processes are output.
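
The two-stage lookup of step S803 can be sketched as below. The table contents are hypothetical values chosen only so that the sketch reproduces the outcome stated in the text (characteristics C2 and C3, processes F2 and F3); the actual contents of FIGS. 4, 6, and 10 are not reproduced here. The key names, the set of "high" level strings, and the numeric threshold are likewise assumptions, and the formula-based form of correlation is omitted for brevity.

```python
from typing import Dict, List, Tuple, Union

Correlation = Union[str, float]

# Hypothetical correlation data 232: characteristic ID -> field -> correlation.
CORRELATION_DATA: Dict[str, Dict[str, Correlation]] = {
    "C1": {"SV_CPU": "large", "SV_MEM": "medium", "ST_LU": "medium"},
    "C2": {"SV_CPU": "medium", "SV_MEM": "medium", "ST_LU": "large"},
    "C3": {"SV_CPU": "medium", "SV_MEM": "large", "ST_LU": "large"},
}

# Hypothetical application configuration table 600: (ID, application, characteristics).
APP_PROCESSES: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("F1", "App A", ("C1",)),
    ("F2", "App A", ("C2",)),
    ("F3", "App B", ("C1", "C3")),
    ("F4", "App C", ("C2",)),
]

HIGH_LEVELS = {"large"}  # assumed strings that indicate a high degree of correlation


def is_high(entry: Correlation, threshold: float = 0.7) -> bool:
    """String entries are matched against HIGH_LEVELS; numeric entries are
    compared with an (assumed) threshold."""
    if isinstance(entry, str):
        return entry in HIGH_LEVELS
    return abs(entry) >= threshold


def affected_characteristics(data: Dict[str, Dict[str, Correlation]], field: str) -> List[str]:
    """Characteristics whose correlation with the failing component's field is high."""
    return [cid for cid, row in data.items() if field in row and is_high(row[field])]


def affected_processes(processes, apps: List[str], characteristics: List[str]) -> List[str]:
    """IDs of processes that belong to an affected application and have
    at least one affected processing characteristic."""
    return [
        pid
        for pid, app, chars in processes
        if app in apps and any(c in characteristics for c in chars)
    ]
```

With the failing component mapped to the `ST_LU` field and App A and App B as the related applications, the first lookup yields C2 and C3 and the second yields F2 and F3. F4 is excluded even though it has characteristic C2, because App C does not use the failing logical volume.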

In the countermeasure generation process (step S804), the management server 201 takes as input the component of the computer system in which the problem is occurring, the affected applications, and the affected processing characteristics, and outputs a countermeasure. The countermeasure generation process may be executed by the countermeasure generation program 224. For example, when the input indicates that the IOPS of logical volume LU2 of storage 1 is the problem and that the processes with processing characteristics C2 and C3 of application A and application B are affected, the management server 201 outputs a countermeasure of changing the logical volume used by application B from LU2 to LU3. The countermeasure generation process (step S804) may be performed manually or by another program. If the per-process impact analysis process (step S803) determines that no process is significantly affected, the countermeasure generation process (step S804) need not be executed. The management server 201 may also output the results of the impact analysis processing flow 800 up to this point to the administrator via the input/output device 241 of the operation terminal 202. For example, the screen illustrated in FIG. 11 may be output.

The screen illustrated in FIG. 11 displays three broad kinds of information: a problem 1101, an impact 1102, and a countermeasure 1103. The problem 1101 visualizes the output of the computer system problem detection process (step S801). The impact 1102 visualizes, in combination, the output of the related application identification process (step S802), the output of the per-process impact analysis process (step S803), and the processing characteristics 231. The countermeasure 1103 visualizes the output of the countermeasure generation process (step S804). By referring to these pieces of information together, the administrator can easily understand the problem that has occurred, its impact, and the proposed countermeasures for resolving the impact, and can decide which countermeasure to carry out.

However, the problem 1101, the impact 1102, and the countermeasure 1103 may each be displayed on a separate screen. If some of this information is provided by another management server, part of the information need not be displayed. If the generated countermeasure is executed automatically, all or part of the information need not be displayed.

In the countermeasure execution process (step S805), the management server 201 takes as input the countermeasure output by the countermeasure generation process (step S804) and outputs a command procedure for executing the countermeasure. For example, when the input countermeasure is to change the logical volume used by application B from LU2 to LU3, the management server 201 outputs to the storage device 204 a command to change the logical volume that stores the data used by application B from LU2 to LU3. This step may be performed manually or by another program.

The countermeasure execution process (step S805) need not be executed. For example, it is not executed when the countermeasure generation process (step S804) was not executed, or when the administrator of the computer system, referring to the output of the countermeasure generation process (step S804), judges that executing the countermeasure is unnecessary.
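The conditions under which steps S804 and S805 are skipped can be summarized in a short control-flow sketch. The function name `run_countermeasure_steps`, the callback parameters, and the returned status strings are assumptions made for illustration; the text only states when each step may be omitted.

```python
def run_countermeasure_steps(affected, generate, approve, execute):
    """Sketch of the tail of flow 800: S804 (generate) and S805 (execute).

    affected: affected process IDs from S803 (may be empty)
    generate: callable producing a countermeasure, or None (S804)
    approve:  callable modeling the administrator's decision
    execute:  callable that issues the commands (S805)
    """
    if not affected:
        # S803 found no significantly affected process: S804 is skipped,
        # and therefore S805 as well.
        return "no_significant_impact"
    countermeasure = generate(affected)
    if countermeasure is None:
        return "no_countermeasure"
    if not approve(countermeasure):
        # The administrator judged execution unnecessary: S805 is skipped.
        return "declined"
    execute(countermeasure)
    return "executed"


issued = []
status = run_countermeasure_steps(
    ["F2", "F3"],
    generate=lambda ids: "move app B from LU2 to LU3",
    approve=lambda c: True,
    execute=issued.append,
)
```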

As described above, according to the first embodiment of the present invention, the management server 201 generates the correlation data 232 using the operation data 234 of the managed computer system and the processing characteristics 231 of the managed application. When a problem occurs in the managed computer system, the management server 201 can identify the affected processes of the application using the correlation data 232 and the configuration information 233. Further, because the administrator of the computer system can grasp the feature quantities of the affected processes, the causal relationship between the problem that occurred in the computer system and its impact can be estimated easily. The administrator of the computer system can therefore quickly consider and execute the necessary countermeasures and maintain the performance of the application.

<Example 2>
FIG. 12 is a block diagram showing a configuration example of a system according to the second embodiment of the present invention.

The main storage device 212 of the management server 201 of the second embodiment does not store the impact analysis processing program 221 and instead stores the cause analysis processing program 223. The other configurations and functions are the same as in the first embodiment, so they are given the same reference numerals and their description is omitted.

Next, the cause analysis processing program of the management server 201 will be described. The cause analysis process is realized by the processor 211 of the management server 201 executing the cause analysis processing program 223.

FIG. 13 is a flowchart showing an example of the procedure of the cause analysis process executed by the management server 201. The cause analysis process may be executed in response to an instruction from the administrator entered through the input device 214 of the management server 201, or in response to an instruction from another program. The management server 201 may also execute the cause analysis process when it receives, via the communication interface 216, a problem notification transmitted by the computer system that it manages.

As shown in FIG. 13, the management server 201 executes an application problem detection process (step S1301), a process of detecting correspondence with processing characteristics (step S1302), a cause range narrowing process (step S1303), and a cause location calculation process (step S1304). The cause analysis processing flow 1300 may include other processing steps that are not shown.

In the application problem detection process (step S1301), the management server 201 detects a problem that has occurred in a specific process of the managed application program 250 and outputs information that identifies the application program and the process in which the problem is occurring. For example, it detects that the processing time for displaying an output screen to an end user of the application has exceeded a threshold, and outputs the application field 602 and the process field 603 of the application configuration information table 600 included in the configuration information 233. Problems in the application program may be detected by the event detection program 222 or manually by the administrator.

In the process of detecting correspondence with processing characteristics (step S1302), the management server 201 identifies the processing characteristics in which the problem is occurring. Specifically, it takes the output of the application problem detection process (step S1301) as input, acquires all the processing characteristic fields 604 of the corresponding application from the application configuration information table 600 included in the configuration information 233, and outputs them.

In the cause range narrowing process (step S1303), the management server 201 outputs candidates for the cause range of the problem occurring in the application program 250. For example, the management server 201 takes the output of the process of detecting correspondence with processing characteristics (step S1302) as input and, from the correlation data stored in each field of the correlation data 232, acquires as cause range candidates the data fields recorded as having a correlation between the operation data of a component of the computer system and the application performance (processing characteristics). If no combination of the input processing characteristics and the data columns shows a correlation, the data fields may still be output as cause range candidates.
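The narrowing step can be sketched as a scan over the correlation data for the input processing characteristics. The table contents, the `"High"` label, and the field name `SVCPU` are hypothetical. The fallback branch reflects one reading of the last sentence above (returning every data field when nothing correlates); the text does not spell out which fields are output in that case.

```python
def narrow_cause_range(correlation, characteristics, high="High"):
    """Return the data fields correlated with any input characteristic (S1303)."""
    fields = []
    for c in characteristics:
        for field, degree in correlation.get(c, {}).items():
            if degree == high and field not in fields:
                fields.append(field)
    if not fields:
        # Assumed fallback: no combination correlates, so every known
        # data field is kept as a candidate.
        fields = sorted({f for row in correlation.values() for f in row})
    return fields


# Hypothetical correlation data 232.
correlation_232 = {
    "C1": {"SVCPU": "High", "STLU": "Low"},
    "C2": {"SVCPU": "Low", "STLU": "High"},
    "C4": {"SVCPU": "Low", "STLU": "Low"},
}
```

With this data, an application whose processes have characteristic C2 narrows the cause range to the storage logical volume field, while C4 alone triggers the fallback.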

In the cause location calculation process (step S1304), the management server 201 acquires the data records of the computer system configuration information table 500, included in the configuration information 233, whose application field 501 matches the application field 602 output by the application problem detection process (step S1301). In the acquired data records, it outputs as cause location candidates the components of the computer system recorded under the data fields acquired in the cause range narrowing process (step S1303). When multiple components are output as cause location candidates, they may be output with priorities. For example, the management server 201 may refer to the operation data 234 of each candidate component and, if it detects a tendency that differs from normal behavior (for example, CPU usage 30% higher than usual, or more database access log entries than a threshold), output that component preferentially as a cause location candidate.
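The prioritization of cause location candidates can be sketched as a stable sort that moves anomalous components to the front. The component names and metric values are hypothetical, and "30% higher than usual" is interpreted here as a relative increase over a baseline (`ratio=1.3`); the text does not say whether a relative increase or percentage points are meant.

```python
def prioritize(candidates, current, baseline, ratio=1.3):
    """Order candidates so that components deviating from normal come first.

    current / baseline: metric value per component (for example CPU usage).
    A component counts as anomalous when current >= baseline * ratio.
    """
    def anomalous(component):
        return current[component] >= baseline[component] * ratio

    # sorted() is stable, so the original order is kept within each group;
    # False sorts before True, hence the negation.
    return sorted(candidates, key=lambda c: not anomalous(c))


candidates = ["server1", "storage1", "server2"]
current = {"server1": 50, "storage1": 80, "server2": 95}
baseline = {"server1": 50, "storage1": 60, "server2": 70}
ordered = prioritize(candidates, current, baseline)
```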

As described above, according to the second embodiment of the present invention, the management server 201 generates the correlation data 232 using the operation data 234 of the managed computer system and the processing characteristics 231 of the managed application. When a problem occurs in the managed application program, the management server 201 uses the correlation data 232 and the configuration information 233 to identify the component of the computer system that is the cause location. The administrator of the computer system can therefore quickly consider and execute the necessary countermeasures and maintain the performance of the application.

As described above, according to the embodiments of the present invention, the management server 201 extracts the characteristics of the processes included in the application program and, by analyzing the correlation between those characteristics and the components of the computer system, identifies the correlation between the processes included in the application program and the components of the computer system and generates the correlation data 232. Based on the correlation data 232, it identifies the relationship between the operation status of the computer system and the characteristics of the processes included in the application program. As a result, the range affected by an abnormality in the computer system can be determined at the level of individual processes included in the application program.

When the management server 201 detects a configuration change of the computer system (for example, a hardware change, a virtual machine migration, or a storage disk change), it executes at least one of the extraction of the processing characteristics and the analysis of the correlation between the processing characteristics and the components of the computer system, so the accuracy of the correlation data 232 can be improved through learning. In particular, updates of the correlation data can keep up with the configuration changes that occur frequently in cloud configurations.

When the management server 201 detects a change in the usage tendency of the application program (for example, an increase in the number of users or the addition of a function to the application program), it executes at least one of the extraction of the processing characteristics and the analysis of the correlation between the processing characteristics and the components of the computer system, so the accuracy of the correlation data 232 can be improved through learning. In particular, updates of the correlation data can keep up with the configuration changes that occur frequently in cloud configurations. For example, when a function is added to the application program, a new process that was not in the processing characteristics 231 may be added, a correlation with another processing characteristic may arise, or a new processing characteristic that was not in the processing characteristics 231 may be added and its correlation with the components of the computer system recorded.
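The event-triggered re-analysis can be sketched as follows. The event names, the function `on_event`, and the use of a Pearson correlation coefficient between a process's response times and each component's operation data are all assumptions made for illustration; this passage does not specify how the correlation is computed.

```python
from math import sqrt

# Hypothetical event names for the two triggers described above.
RETRAIN_EVENTS = {"config_change", "usage_change"}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def on_event(event, correlation, response_times, metrics):
    """Recompute the correlation row only for events that warrant re-analysis."""
    if event not in RETRAIN_EVENTS:
        return correlation
    return {field: pearson(response_times, series) for field, series in metrics.items()}


old_row = {"STLU": 0.1, "SVCPU": 0.1}
times = [1.0, 2.0, 3.0, 4.0]
metrics = {"STLU": [10, 20, 30, 40], "SVCPU": [5, 5, 5, 5]}
new_row = on_event("config_change", old_row, times, metrics)
```

Events outside the trigger set leave the existing correlation data untouched, which keeps re-analysis tied to the two conditions named in the text.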

When the management server 201 detects an abnormality of the computer system or a sign of an abnormality, it identifies, based on the correlation data 232, the processes of the application program affected by the abnormality. The range affected when an abnormality occurs can therefore be identified at the level of individual processes, which makes countermeasures on the application side possible. Countermeasures on the application side also become possible at the stage where only a sign of the abnormality is present (before the abnormality occurs).

The management server 201 also outputs screen data for displaying the abnormality of the computer system or the sign of an abnormality together with the processes of the application program affected by the abnormality, so the impact of the abnormality can be seen for each process. Because the tendency of the processes affected by an abnormality of the computer system becomes clear, countermeasures on the computer system side (such as adding a disk) and countermeasures on the application side (such as access restrictions) can be carried out in coordination. Information for modifying the application program can also be obtained.

When the management server 201 detects a problem in the application program, it identifies, based on the correlation data 232, the component of the computer system that is the cause of the problem, so the cause location on the computer system side can be identified from the behavior of the application. This makes maintenance of the computer system easier.

The present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above are described in detail to explain the present invention in an easily understandable manner, and the present invention is not necessarily limited to configurations that include all the elements described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit, or in software, by a processor interpreting and executing programs that realize the respective functions.

Information such as the programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation and do not necessarily represent all the control lines and information lines required in an implementation. In practice, almost all the configurations may be considered to be interconnected.

Claims

A management server connected to a computer system that executes an application program, the management server comprising:
a processor that executes a program; and a storage device that stores the program,
wherein the processor extracts characteristics of processes included in the application program and, by analyzing the correlation between the characteristics of the processes and components of the computer system, identifies the correlation between the processes included in the application program and the components of the computer system, and
wherein the processor identifies, based on the identified correlation, a relationship between an operation status of the computer system and the characteristics of the processes included in the application program.

The management server according to claim 1, wherein
when the processor detects a configuration change of the computer system, the processor executes at least one of the extraction of the characteristics of the processes and the analysis of the correlation between the characteristics of the processes and the components of the computer system.

The management server according to claim 1, wherein
when the processor detects a change in the usage tendency of the application program, the processor executes at least one of the extraction of the characteristics of the processes and the analysis of the correlation between the characteristics of the processes and the components of the computer system.

The management server according to claim 1, wherein
A management server, characterized in that, when the processor detects an abnormality of the computer system or a sign of an abnormality, processing of an application program affected by the abnormality is specified based on the specified correlation.

The management server according to claim 4, wherein
A management server characterized in that the processor outputs screen data for displaying an abnormality or a sign of an abnormality of the computer system and processing of an application program affected by the abnormality.

The management server according to claim 1, wherein
A management server, characterized in that, when the processor detects a problem in the application program, the processor identifies a component of the computer system that is the cause of the problem based on the identified correlation.

A management method in which a management server manages a computer system that executes an application program,
The management server includes a processor that executes a program, and a storage device that stores the program.
The management method is
The processor extracts the characteristic of the process included in the application program, and the process included in the application program and the component of the computer system by analyzing the correlation between the characteristic of the process and the component of the computer system Identifying the correlation of
A management method comprising: the processor identifying a relationship between an operation status of the computer system and a characteristic of a process included in the application program based on the identified correlation.

The management method according to claim 7, wherein
when the processor detects a configuration change of the computer system, the processor executes at least one of the extraction of the characteristics of the processes and the analysis of the correlation between the characteristics of the processes and the components of the computer system.

The management method according to claim 7, wherein
when the processor detects a change in the usage tendency of the application program, the processor executes at least one of the extraction of the characteristics of the processes and the analysis of the correlation between the characteristics of the processes and the components of the computer system.

The management method according to claim 7, wherein
when the processor detects an abnormality of the computer system or a sign of an abnormality, the processor identifies the processes of the application program affected by the abnormality based on the identified correlation.

The management method according to claim 10, wherein
A management method comprising: outputting, by the processor, screen data for displaying an abnormality or a sign of an abnormality of the computer system and processing of an application program affected by the abnormality.

The management method according to claim 7, wherein
A management method comprising: when the processor detects a problem in the application program, identifying a component of the computer system that is the cause of the problem based on the identified correlation.

A program for a management server to manage a computer system that executes an application program,
The management server has a processor that executes the program, and a storage device that stores the program.
the program causing the processor to execute:
a procedure for extracting characteristics of processes included in the application program and identifying, by analyzing the correlation between the characteristics of the processes and components of the computer system, the correlation between the processes included in the application program and the components of the computer system; and
a procedure for identifying, based on the identified correlation, a relationship between an operation status of the computer system and the characteristics of the processes included in the application program.

The program according to claim 13, wherein
the program causes the processor to execute a procedure for identifying, when an abnormality of the computer system or a sign of an abnormality is detected, the processes of the application program affected by the abnormality based on the identified correlation.

The program according to claim 13, wherein
the program causes the processor to execute a procedure for identifying, when a problem in the application program is detected, the component of the computer system that is the cause of the problem based on the identified correlation.