JP2017045079A

JP2017045079A - Cloud management method and cloud management system

Info

Publication number: JP2017045079A
Application number: JP2015164372A
Authority: JP
Inventors: 太郎北村; Taro Kitamura; 真法堂宮; Masanori Tamiya; 卓也島川; Takuya Shimakawa
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2015-08-24
Filing date: 2015-08-24
Publication date: 2017-03-02
Anticipated expiration: 2035-08-24
Also published as: JP6482984B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that, based on incident importance that changes over time, can notify an administrator of cloud systems that an incident has occurred in a business system, including which cloud system the incident occurred in.

SOLUTION: A cloud management system 10 manages incidents on business servers of a plurality of tenants 13 running on a plurality of cloud systems. An information aggregation server 11 includes: an incident reception section that receives and merges incident information from the business servers; a priority determination section that calculates the priority of the incidents within the plurality of cloud systems based on the type and time of occurrence of the merged incident information; and an output section that outputs incident information according to the calculated priority.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a cloud management system.

Patent Document 1 below discloses a system that manages incidents, including failures of a target system, as incident information in a first database, and that cooperates with a configuration management system that manages the configuration of the target system as configuration information in a second database, with a service portal system that provides information screens to the terminal of a person in charge, and with a failure monitoring system that monitors incidents including failures of the target system.

That cloud management system has a first function that uses the configuration information and the incident information to create a screen visualizing the incident status, including the configuration of the target system, the failure impact range, and the affected services, and provides the screen to the terminal of the person in charge. It also has a second function that sets, in the configuration information as a configuration management model, a configuration that includes components designed with fault tolerance in mind.

In the configuration management model, each component, including components designed with fault tolerance in mind, is set as a first configuration item; the fault tolerance of each first configuration item is set as a second configuration item; and the dependencies between configuration items, including the first and second configuration items, are set as links. Patent Document 1 further discloses that the screen provided by the first function displays the incident occurrence status, including the configuration management model of the target system, the failure impact range, and the affected services, as a structure in which configuration items are connected by links.

Patent Document 1: JP 2012-38028 A

Patent Document 1 discloses a technique for visualizing conditions such as the failure impact range in a target system whose configuration takes the cloud environment and fault tolerance into account. However, it does not consider informing the system administrator of which cloud system an incident occurred in when a business system is built on a hybrid cloud environment composed of multiple cloud systems.

Nor does it consider notifying the administrator of which incidents must be handled first, based on incident importance that changes over time. It also does not consider the case in which a business server running on a virtual machine that makes up the business system moves to another cloud system: that is, reporting to the administrator, through the incident reporting mechanism provided by the destination cloud system, which cloud system an incident report came from and how large its impact is.

The above problem is solved by a cloud management system that manages incidents of business servers of multiple tenants running on virtual machines in multiple cloud systems, the system comprising: an incident reception unit that receives and merges incident information from the business servers; a priority determination unit that determines the priority of each incident within the multiple cloud systems based on the type and occurrence time of the merged incident information; and an output unit that outputs the incident information according to the determined priority.

Individual problems not solved by the above system will be solved by adding the components described in the embodiments.

According to the present invention, the load on an administrator who handles incidents in a business system implemented using the cloud can be reduced.

An example of a block diagram showing the configuration of the cloud system in this embodiment.
An example of a block diagram of the information aggregation server of the cloud management system in this embodiment.
An example of a block diagram of the navigation server of the cloud management system in this embodiment.
An example of a block diagram of a tenant in this embodiment.
An example of a block diagram of a data center in this embodiment.
An example of a block diagram of the monitoring server running on a physical computer in this embodiment.
An example of the incident information table in this embodiment.
An example of the operation data information table in this embodiment.
An example of the system information table in this embodiment.
An example of the incident response table in this embodiment.
An example of the prioritized incident information table in this embodiment.
An example of the operation data information table in this embodiment.
An example of the important operation data table in this embodiment.
An example of the incident information display table in this embodiment.
An example of the system infrastructure information table in this embodiment.
An example of the incident response history table in this embodiment.
An example of the escalation information table in this embodiment.
An example of the incident list screen in this embodiment.
An example of the incident detail display screen in this embodiment.
An example of the operation data monitoring flowchart of the monitoring server in this embodiment.
An example of the business log monitoring flowchart of the monitoring server in this embodiment.
An example of priority definition flowchart 1 of the information aggregation server in this embodiment.
An example of priority definition flowchart 2 of the information aggregation server in this embodiment.
An example of the incident acquisition and registration flowchart of the navigation server in this embodiment.
An example of the incident list display flowchart of the navigation server in this embodiment.
An example of the flowchart for migrating a business server to another cloud system in this embodiment.

FIG. 1 is an overall configuration diagram of the cloud management system 10 of this embodiment and the tenant systems it manages. In this embodiment, the cloud management system 10 is connected through network equipment and over the Internet to data center 1 through data center N. Some business systems, which are the customers' tenants 13, are implemented using multiple data centers, and each data center may be provided by a different operator.

FIG. 2 is a block diagram of the information aggregation server 11. The information aggregation server 11 has a receiving unit 101 and a network interface unit 102; incidents and operation data sent from the tenants' business servers arrive through the network interface unit 102 and are received by the receiving unit 101. The priority determination unit 111, which is stored in the main storage area 18 and executed by the CPU 15, determines the priority of each received incident based on the information in the system information table 121, the incident response table 122, the operation data information table 123, and the important operation data table 124, and adds the determined priority to the incident. The incident registration unit 112 registers the incident with its priority in the prioritized incident information table 125.

FIG. 3 is a block diagram of the navigation server 12. The navigation server 12 has a display unit 201 and a network interface unit 202. In this example the display unit 201 is part of the navigation server 12, but the display may instead be provided through a browser connected to the Internet. The navigation server 12 has an incident acquisition unit 211 that acquires, via the network interface unit 202, the information in the prioritized incident information table 125 held by the information aggregation server 11. The prioritized incident registration unit 212 extracts the system infrastructure information related to an acquired incident from the system infrastructure information table 221, adds the system infrastructure information and an incident ID to the incident, and registers it in the incident information display table 222. The incident response history registration unit 213 registers response information in the incident response history table 223 when a new incident occurs and is registered in the incident information display table 222, and when an operator enters response information for an incident. The incident list display unit 214 reads incident information from the incident information display table 222, extracts the related response information from the incident response history table 223, adds it to the incident, and displays the incident information on the screen via the display unit 201 in order of priority. When a business server moves to another cloud system, the system infrastructure information registration unit 215 receives the destination system infrastructure information sent from the business server and rewrites the corresponding record in the system infrastructure information table 221. The mail transmission unit 216 reads notification destinations from the escalation information table 224 when an incident occurs or when a business server moves to another cloud system, and sends mail via the network interface unit 202. The processing units, such as the incident acquisition unit 211, the prioritized incident registration unit 212, and the incident response history registration unit 213, are stored in the main storage area 19 and executed by the CPU 16.

FIG. 4 is a block diagram of the tenant 13. The tenant 13 has multiple data centers 21, and multiple business systems 22 exist within a data center. Each data center 21 has a line connected to the Internet, and the business systems 22 connect to the Internet via that line.

FIG. 5 is a block diagram of the data center 21. Virtualization software 33 is installed on the physical computers 32 of the business system 22, and multiple VMs (virtual machines) 31 run on the virtualization software 33. Some physical computers 32 are not virtualized and run an OS 34.

FIG. 6 is a block diagram of the monitoring server 350 running on a physical computer. The monitoring server 350 may run in the same VM as the business server or in a VM separate from it. The monitoring server 350 is installed in the VM 31 and has a log monitoring unit 301 that monitors the OS event log 321 and the business log 322, and an operation data monitoring unit 302 that monitors the operation data 323 of the business server.

The monitoring server 350 may be installed per physical computer, per business system, or per tenant. Choosing the installation unit according to the volume of incidents and the scale of the business system enables efficient monitoring.

The log monitoring unit 301 reads the information on the logs to be monitored from the log monitoring table 311 and, when a specific character string is written to a log, calls the incident generation unit 313 to generate an incident. The generated incident is sent to the information aggregation server 11 by the transmission unit 303. The operation data monitoring unit 302 reads the information on the operation data to be monitored from the operation data monitoring table 312 and acquires that operation data. The acquired operation data information is sent to the information aggregation server 11 by the transmission unit 303. If the operation data exceeds its threshold at the time of acquisition, the operation data monitoring unit 302 calls the incident generation unit 313 to generate an incident, and the transmission unit 303 sends the incident to the information aggregation server 11. The monitoring server 350, including processing units such as the log monitoring unit 301, the operation data monitoring unit 302, and the transmission unit 303, is stored in the main storage area 360 and executed by the CPU 17.
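The log monitoring behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the names `LogWatch` and `scan_log_lines`, the layout of the generated incident dictionary, and the use of a plain substring match for the "specific character string" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LogWatch:
    """One hypothetical row of the log monitoring table 311."""
    log_name: str   # which log to watch (OS event log or business log)
    pattern: str    # character string that signals an incident
    severity: str   # "Error" or "Warning" (assumed to be configured per row)


def scan_log_lines(watch: LogWatch, lines: Iterable[str]) -> List[dict]:
    """Return one incident dict for every log line containing the watched string."""
    incidents = []
    for line in lines:
        if watch.pattern in line:  # assumption: simple substring match
            incidents.append({
                "log": watch.log_name,
                "severity": watch.severity,
                "message": line.strip(),
            })
    return incidents


# Hypothetical sample data for illustration only.
watch = LogWatch(log_name="business.log",
                 pattern="DB connection failed",
                 severity="Error")
sample = ["INFO started\n", "ERROR DB connection failed (pool=1)\n"]
found = scan_log_lines(watch, sample)
```

In a real deployment each returned dictionary would be handed to whatever plays the role of the transmission unit 303; that step is left out here because the section does not describe the transport.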

FIG. 7 shows the table of incident information 400 and the table of operation data information 410 that the business server 31 sends to the information aggregation server 11. In the drawing each table is split into upper and lower parts for lack of space, but this embodiment is described with each implemented as a single table. The same convention applies to the table drawings that follow. The incident information 400 consists of a tenant ID 401, incident group name 402, instance ID 403, severity 404, incident occurrence date and time 405, incident type 406, and message 407. The tenant ID 401 is the ID of the tenant to which the business server where the incident occurred belongs, and identifies each tenant. The incident group name 402 is the name of the business system 22 to which that business server belongs, and identifies a business system 22 within the same tenant. The instance ID 403 is the name of the business server where the incident occurred, and identifies a business server within the same business system 22. The severity 404 indicates the severity of the incident and takes one of two values, "Error" or "Warning". The incident type 406 indicates the kind of incident and can be defined freely by the operator. The message 407 describes the incident. The operation data information 410 consists of a tenant ID 411, incident group name 412, instance ID 413, acquisition date and time 414, and operation data 415. The acquisition date and time 414 is when the operation data 415 was acquired, and one operation data 415 entry is appended for each operation data item that the system operator has designated for collection.
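The two record layouts described for FIG. 7 map naturally onto simple typed records. The following Python sketch is illustrative only: the class and attribute names are hypothetical, the metric names in the example are invented, and representing operation data 415 as a name-to-value mapping is an assumption (the text only says one entry is added per collected item).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class IncidentInfo:
    """Incident information 400; comments give the reference numerals."""
    tenant_id: str            # 401: identifies the tenant
    incident_group_name: str  # 402: business system name within the tenant
    instance_id: str          # 403: business server name within the business system
    severity: str             # 404: "Error" or "Warning"
    occurred_at: datetime     # 405
    incident_type: str        # 406: freely defined by the operator
    message: str              # 407

    def __post_init__(self) -> None:
        # The description allows exactly two severity values.
        if self.severity not in ("Error", "Warning"):
            raise ValueError(f"unexpected severity: {self.severity!r}")


@dataclass
class OperationDataInfo:
    """Operation data information 410."""
    tenant_id: str            # 411
    incident_group_name: str  # 412
    instance_id: str          # 413
    acquired_at: datetime     # 414
    # 415: one entry per item the operator designated for collection
    operation_data: Dict[str, float] = field(default_factory=dict)


# Hypothetical example values.
incident = IncidentInfo("T001", "order-system", "web01", "Error",
                        datetime(2015, 8, 24, 9, 0), "resource",
                        "CPU usage exceeded threshold")
metrics = OperationDataInfo("T001", "order-system", "web01",
                            datetime(2015, 8, 24, 9, 0),
                            {"cpu_percent": 95.0, "memory_percent": 40.0})
```

Validating the severity in `__post_init__` is a design choice of this sketch; it simply makes the two-value rule from the description fail fast when a record is built.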

FIG. 8 shows the data structure of the system information table 121 and the incident response table 122 held by the information aggregation server 11. The priority determination unit 111 reads both tables in order to define the priority of an incident. The system information table 121 consists of a tenant ID 501, incident group name 502, business function 503, service operation rate 504, and service core time 505. The business function 503 is the name of a business function that runs in the business system 22, and the service operation rate 504 is the service operation rate of that business function. The service core time 505 is a period during which an incident would strongly affect the business function, such as the time window in which the business function 503 is used most heavily. The incident response table 122 consists of a tenant ID 511, incident group name 512, instance ID 513, message 514, recovery limit time 515, recovery work time 516, and business function 517. The recovery limit time 515 is the time limit by which the incident must be recovered, and the recovery work time 516 is the time the recovery work for that incident takes. When an incident occurs, the priority determination unit 111 compares the tenant ID 401, incident group name 402, instance ID 403, and message 407 of the incident information 400 against the incident response table 122 to check whether the incident is registered there.
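The registration check described above amounts to a lookup keyed on four fields. The sketch below shows one way to express it in Python. It is an assumption-laden illustration: the names `ResponseEntry` and `find_response_entry` are hypothetical, the match is assumed to be exact string equality (the text does not say whether partial message matches count), and the time fields are kept as opaque strings because this section does not state their format.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ResponseEntry(NamedTuple):
    """Value part of one hypothetical row of the incident response table 122."""
    recovery_limit_time: str  # 515: format not specified in the text
    recovery_work_time: str   # 516: format not specified in the text
    business_function: str    # 517


# Key: (tenant ID 511, incident group name 512, instance ID 513, message 514)
ResponseKey = Tuple[str, str, str, str]
ResponseTable = Dict[ResponseKey, ResponseEntry]


def find_response_entry(table: ResponseTable, tenant_id: str, group: str,
                        instance_id: str, message: str) -> Optional[ResponseEntry]:
    """Return the registered entry for this incident, or None if unregistered."""
    return table.get((tenant_id, group, instance_id, message))


# Hypothetical table contents.
table: ResponseTable = {
    ("T001", "order-system", "web01", "CPU usage exceeded threshold"):
        ResponseEntry("18:00", "01:00", "order entry"),
}
hit = find_response_entry(table, "T001", "order-system", "web01",
                          "CPU usage exceeded threshold")
miss = find_response_entry(table, "T001", "order-system", "web02",
                           "CPU usage exceeded threshold")
```

Returning `None` for an unregistered incident lets the caller decide how to prioritize incidents with no recovery limit, which this section leaves to the later flowcharts.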

FIG. 9 shows the data structure of the prioritized incident information table 125 held by the information aggregation server 11. The prioritized incident information table 125 consists of a tenant ID 521, incident group name 522, business function 523, priority 524, instance ID 525, severity 526, incident occurrence date and time 527, incident type 528, message 529, recovery limit time 530, and recovery work time 516. When an incident occurs, the priority determination unit 111 defines its priority, and the incident registration unit 112 registers the incident information in the prioritized incident information table 125. The navigation server 12 periodically acquires information from this table, and records that have been acquired are deleted.

FIG. 10 shows the data structure of the operation data information table 123 and the important operation data table 124 held by the information aggregation server 11. The operation data information table 123 consists of a tenant ID 541, incident group name 542, instance ID 543, acquisition date and time 544, and operation data 545. The information aggregation server 11 registers the operation data information 410 received from the business server 31 in the operation data information table 123 as is. The important operation data table 124 consists of a tenant ID 551, incident group name 552, instance ID 553, and important operation data 554. As important operation data 554, the operator can register the most important items among the multiple operation data 323 collected on the business server 31, at least two and at most four. Both tables are read by the priority determination unit 111 when it defines the priority of an incident that has occurred.

FIG. 11 shows an example of the data structure of the incident information display table 222 held by the navigation server 12. The incident information display table 222 consists of a tenant ID 601, incident group name 602, business function 603, priority 604, instance ID 605, incident ID 606, severity 607, incident occurrence date and time 608, incident type 609, message 610, recovery limit time 611, recovery work time 612, infrastructure information 613, and center information 614. The incident list display unit 214 reads this table when a user requests the incident list. The incident ID 606 identifies an incident and is generated when the prioritized incident registration unit 212 registers the incident in the incident information display table 222. The infrastructure information 613 gives the name of the cloud system on which the business server 31 runs, or indicates that it runs in an on-premises environment, and the center information 614 gives the location of the center where the business server 31 runs.

FIG. 12 shows an example of the data structure of the system infrastructure information table 221 and the incident response history table 223 held by the navigation server 12. The system infrastructure information table 221 consists of a tenant ID 621, incident group name 622, instance ID 623, infrastructure information 624, and center information 625, and is populated in advance by the person in charge of operations. The table is read when the prioritized incident registration unit 212 registers an incident in the incident information display table 222, and the matching infrastructure information 624 and center information 625 are added to the incident. The incident response history table 223 consists of a tenant ID 631, incident group name 632, instance ID 633, incident ID 634, user name 635, status 636, registration date and time 637, and response history 638. When an incident occurs, a new record is created in the incident response history table 223 with the status 636 set to "open" and the user name 635 and response history 638 left empty. When a user updates the record, the user name 635 is set to the name of the user who entered the information, the status 636 is set to either "going" or "close" according to the response, and the response history 638 is set to a description of the response.

FIG. 13 shows the data structure of the escalation information table 224 held by the navigation server 12. The escalation information table 224 consists of a tenant ID 641, incident group name 642, instance ID 643, infrastructure information 644, center information 645, and contact 646. Multiple contacts 646 can be specified, and the operator can register them freely. When an incident occurs, the escalation information table 224 is read to obtain the notification destinations for that incident. When the business server 31 migrates to another cloud system, the infrastructure information 644, center information 645, and contact 646 are updated to those of the destination cloud system.

FIG. 14 shows the incident list 700 screen displayed by the navigation server 12. The screen places the "Important incidents 711" list, which shows high-priority incidents, at the top and the "Incidents 712" list, which shows low-priority incidents, at the bottom, so that the lists are separated by priority. Each list shows a "remaining time" calculated from the recovery limit time 611 and recovery work time 612 read from the incident information display table 222 and from the current time, and incidents are listed in ascending order of remaining time. For incidents with no registered recovery limit time 611, "-" is shown as the remaining time. The remaining time of an incident in the "Incidents 712" list shrinks as time passes; when it falls to 3 hours or less, the priority is changed to "high" and the incident is shown in the "Important incidents 711" list. The screen also has a search function that extracts only specific incidents matching an arbitrary input string.
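The list behavior described for FIG. 14 (compute a remaining time, sort ascending, and promote an incident once 3 hours or less remain) can be sketched in Python as below. The exact formula is not given in this section, so the sketch assumes that the recovery limit time 611 has already been resolved to an absolute deadline and that remaining time = deadline - recovery work time - current time. The names `ListedIncident`, `remaining_time`, and `build_lists`, and the choice to sort incidents without a deadline last, are likewise assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

PROMOTION_THRESHOLD = timedelta(hours=3)  # "3 hours or less" per the description


@dataclass
class ListedIncident:
    incident_id: str
    priority: str                 # "high" or "low"
    deadline: Optional[datetime]  # assumed absolute form of recovery limit time 611
    recovery_work: timedelta      # recovery work time 612


def remaining_time(inc: ListedIncident, now: datetime) -> Optional[timedelta]:
    """Assumed formula: time left before recovery work must start."""
    if inc.deadline is None:
        return None  # shown as "-" on the screen
    return inc.deadline - inc.recovery_work - now


def build_lists(incidents: List[ListedIncident], now: datetime
                ) -> Tuple[List[ListedIncident], List[ListedIncident]]:
    """Split into (important, normal) lists, each sorted by remaining time."""
    important: List[ListedIncident] = []
    normal: List[ListedIncident] = []
    for inc in incidents:
        rem = remaining_time(inc, now)
        # Promote low-priority incidents whose remaining time is 3 hours or less.
        if inc.priority != "high" and rem is not None and rem <= PROMOTION_THRESHOLD:
            inc.priority = "high"
        (important if inc.priority == "high" else normal).append(inc)

    def sort_key(inc: ListedIncident):
        rem = remaining_time(inc, now)
        # Assumption: incidents without a deadline are listed last.
        return (rem is None, rem if rem is not None else timedelta(0))

    important.sort(key=sort_key)
    normal.sort(key=sort_key)
    return important, normal


# Hypothetical example data.
now = datetime(2015, 8, 24, 12, 0)
a = ListedIncident("A", "low", now + timedelta(hours=6), timedelta(hours=1))
b = ListedIncident("B", "low", now + timedelta(hours=4), timedelta(hours=2))
c = ListedIncident("C", "high", now + timedelta(hours=10), timedelta(hours=1))
d = ListedIncident("D", "low", None, timedelta(0))
important, normal = build_lists([a, b, c, d], now)
```

In this example incident B has 2 hours remaining and moves to the important list, while A (5 hours) and D (no deadline) stay in the normal list. Recomputing the lists on every display request, rather than storing the remaining time, keeps the promotion in step with the passage of time.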

FIG. 15 shows the incident detail display 720 screen displayed by the navigation server 12. The incident detail display 720 is the screen reached by selecting one incident in the incident list 700 and clicking the incident detail display 701 button. The screen reads from the incident information display table 222 and shows information that the incident list 700 does not display, such as the infrastructure information, the center information, and the business function affected by the incident.

FIG. 16 is a flowchart showing the operation of the monitoring server.

Step 801: The operation data monitoring unit 302 reads the monitored operation data items and their thresholds from the operation data monitoring table 312 and acquires the values of the corresponding operation data 323.

Step 802: The operation data monitoring unit 302 compares each acquired value with its threshold and, if the threshold is exceeded, proceeds to Step 805.

Step 803: The operation data monitoring unit 302 calls the transmission unit 303, generates operation data information 410 from the acquired operation data 323, and transmits it to the information aggregation server 11.

Step 804: The operation data monitoring unit 302 waits for the defined monitoring interval and returns to Step 801.

Step 805: The operation data monitoring unit 302 calls the incident generation unit 313 and generates incident information 400.

Step 806: The operation data monitoring unit 302 calls the transmission unit 303, transmits the generated incident information 400 to the information aggregation server 11, and proceeds to Step 803.
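One pass of the FIG. 16 loop might look like the sketch below. The names `MonitorRule`, `monitor_once`, `read_metric`, and `send`, as well as the shape of the payloads, are hypothetical; the waiting of Step 804 is left to the caller.

```python
from dataclasses import dataclass

@dataclass
class MonitorRule:
    metric: str       # name of the monitored operation data item (assumed field)
    threshold: float  # value above which an incident is generated

def monitor_once(rules, read_metric, send):
    """Run Steps 801 to 803, 805 and 806 once for every monitored item."""
    for rule in rules:
        value = read_metric(rule.metric)                  # Step 801
        if value > rule.threshold:                        # Step 802
            incident = {"metric": rule.metric, "value": value}
            send("incident", incident)                    # Steps 805 and 806
        # Step 803: operation data is sent whether or not an incident was raised.
        send("operation_data", {"metric": rule.metric, "value": value})
```

Note that, as in the flowchart, the incident branch rejoins the normal path, so the operation data is always forwarded.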

FIG. 17 is a flowchart showing the log monitoring operation of the monitoring server 350.

Step 811: The log monitoring unit 301 reads the monitored logs and the monitoring character strings from the log monitoring table 313 and acquires the contents of the corresponding OS event log 321 and business log 322.

Step 812: The log monitoring unit 301 checks whether the acquired log information has been updated and, if it has not, proceeds to Step 816.

Step 813: The log monitoring unit 301 compares the acquired log information with the monitoring character strings and, if none of them matches, proceeds to Step 816.

Step 814: The log monitoring unit 301 calls the incident generation unit 313 and generates incident information 400.

Step 815: The log monitoring unit 301 calls the transmission unit 303 and transmits the generated incident information 400 to the information aggregation server 11.

Step 816: The log monitoring unit 301 waits for the defined monitoring interval and returns to Step 811.
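The log monitoring pass of FIG. 17 can be sketched as below. The specification does not say how an update is detected or how a match is tested; this sketch assumes a line-count cursor (`last_seen`) and plain substring matching, and the function name `scan_log` is hypothetical.

```python
def scan_log(lines, last_seen, patterns):
    """Return (incidents, new_last_seen) for one pass of Steps 811 to 815."""
    if len(lines) <= last_seen:                 # Step 812: log not updated
        return [], last_seen
    incidents = []
    for line in lines[last_seen:]:              # only lines added since the last pass
        for pattern in patterns:                # Step 813: compare with monitoring strings
            if pattern in line:
                # Step 814: build the incident; Step 815 (sending) is left to the caller.
                incidents.append({"pattern": pattern, "line": line})
                break
    return incidents, len(lines)
```

The caller would send each returned incident to the information aggregation server and then wait for the monitoring interval (Step 816) before calling `scan_log` again with the returned cursor.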

Next, the processing of the information aggregation server 11 is described. The information aggregation server 11 receives, via the receiving unit 101, the operation data and incident information transmitted from the monitoring server 350, and the operation data registration unit 113 registers the received data in the operation data information table 123. The information aggregation server 11 receives information from all the monitoring servers 350 for which it is responsible, and merges and stores the received operation data and incident information.

FIG. 18 is a flowchart showing the operation in which the information aggregation server 11 assigns a priority to an incident.

Step 821: The information aggregation server 11 receives, through the receiving unit 101, the incident information 400 transmitted from the monitoring server 350.

Step 822: The priority determination unit 111 reads the incident response table 122 and checks whether the received incident information 400 corresponds to an incident registered in the incident response table 122.

Step 823: If the incident information 400 is not registered in the incident response table 122, the process proceeds to Step 830.

Step 824: The priority determination unit 111 extracts the corresponding business function 517 from the incident response table 122 and adds it to the incident information 400.

Step 825: The priority determination unit 111 reads the corresponding recovery limit time 515 from the incident response table 122 and, if no recovery limit time 515 is registered, proceeds to Step 831.

Step 826: The priority determination unit 111 extracts the corresponding recovery limit time 515 and recovery work time 516 from the incident response table 122 and adds them to the incident information 400.

Step 827: If the extracted recovery limit time 515 exceeds 3 hours, the process proceeds to Step 831.

Step 828: The priority determination unit 111 adds priority information “high” to the incident information 400.

Step 829: The priority determination unit 111 calls the incident registration unit 112 and registers the incident information 400 in the prioritized incident information table 125.

Step 830: The priority determination unit 111 adds priority information “low” to the incident information 400 and proceeds to Step 829.
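The branching of FIG. 18 can be sketched as follows. Keying the incident response table by an incident `type` field, and all function and key names, are assumptions for illustration. Step 831 is not described in this section, so the sketch returns a priority of `None` for the branches that lead there rather than guessing what that step does.

```python
from datetime import timedelta

HIGH_LIMIT = timedelta(hours=3)  # the 3-hour boundary used in Step 827

def assign_initial_priority(incident, response_table):
    """Return a copy of the incident enriched per Steps 822 to 830."""
    entry = response_table.get(incident["type"])            # Step 822
    if entry is None:                                       # Step 823
        return {**incident, "priority": "low"}              # Step 830
    enriched = {**incident, "business_function": entry["business_function"]}  # Step 824
    limit = entry.get("recovery_limit")
    if limit is None:                                       # Step 825
        return {**enriched, "priority": None}               # branch to Step 831 (not modeled)
    enriched["recovery_limit"] = limit                      # Step 826
    enriched["recovery_work"] = entry.get("recovery_work")
    if limit > HIGH_LIMIT:                                  # Step 827
        return {**enriched, "priority": None}               # branch to Step 831 (not modeled)
    return {**enriched, "priority": "high"}                 # Step 828
```

Registration in the prioritized incident information table (Step 829) would be performed by the caller on the returned record.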

FIG. 19 is a flowchart showing the operation in which the information aggregation server 11 assigns a priority to an incident.

Step 841: The priority determination unit 111 reads from the system information table 121 the record that matches the tenant ID 401 and incident group name 402 of the incident information 400 and the business function 517 added to the incident information 400 in Step 824, and acquires the service operation rate 504 of that record.

Step 842: If the service operation rate 504 is 99.7% or more, the process proceeds to Step 848.

Step 843: The priority determination unit 111 acquires the service core time 505 of the record read in Step 841.

Step 844: The priority determination unit 111 reads from the important operation data table 124 the record that matches the tenant ID 401, incident group name 402, and instance ID 403 of the incident information 400, and acquires the important operation data 554 of that record. The priority determination unit 111 further reads from the operation data table 123 the record that matches the tenant ID 401, incident group name 402, and instance ID 403 of the incident information 400, and acquires the operation data 545 that corresponds to the important operation data 554.

Step 845: If the incident occurrence date and time 405 of the incident information 400 falls within the service core time 505 acquired in Step 842, and two or more items of the operation data 545 acquired in Step 844 exceed their thresholds, the process proceeds to Step 848.

Step 846: The priority determination unit 111 adds priority information “low” to the incident information 400, which lowers the priority of the incident.

Step 847: The priority determination unit 111 calls the incident registration unit 112 and registers the incident information 400 in the prioritized incident information table 125. Because the registered incident information merges the incident information collected from the plurality of cloud systems, the incidents occurring in the systems managed by this cloud management system can be output starting from the one with the highest priority.

Step 848: The priority determination unit 111 adds priority information “high” to the incident information 400 and proceeds to Step 847.
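The decision made in Steps 842 to 848 can be condensed into a small function. This is a sketch under stated assumptions: the service core time is modeled as a same-day window between `core_start` and `core_end` (a window crossing midnight is not handled), and the number of important operation data items over their thresholds is passed in as `exceeded_count`. All names are hypothetical.

```python
from datetime import datetime, time

def priority_from_service_level(occurred_at, operation_rate_pct,
                                core_start, core_end, exceeded_count):
    """Return "high" or "low" following Steps 842, 845, 846 and 848."""
    if operation_rate_pct >= 99.7:                                   # Step 842
        return "high"                                                # Step 848
    in_core_time = core_start <= occurred_at.time() <= core_end      # Step 845, first condition
    if in_core_time and exceeded_count >= 2:                         # Step 845, second condition
        return "high"                                                # Step 848
    return "low"                                                     # Step 846
```

For example, an incident at 10:00 on a service with a 99.0% operation rate and a 9:00 to 18:00 core time is rated “high” only if at least two important operation data items exceed their thresholds.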

FIG. 20 is a flowchart of the operation in which the navigation server 12 acquires and registers incidents.

Step 861: The navigation server 12 connects to the information aggregation server 11 through the incident acquisition unit 211.

Step 862: The incident acquisition unit 211 acquires incident information that has not yet been acquired from the prioritized incident information table 125 of the information aggregation server 11.

Step 863: The prioritized incident registration unit 212 compares the acquired incident information with the tenant ID 621, incident group name 622, and instance ID 623 of the system infrastructure information table 221, reads the matching record, and acquires the infrastructure information 624 and center information 625 of that record.

Step 864: The prioritized incident registration unit 212 adds the infrastructure information 624 and center information 625 to the acquired incident information.

Step 865: The prioritized incident registration unit 212 generates an incident ID that identifies the incident and adds it to the incident information.

Step 866: The incident response history registration unit 213 adds a new record to the incident response history table 223 and sets the tenant ID 631, incident group name 632, instance ID, and incident ID from the incident information. The status 636 is set to “open”, and the user name 635 and response details 637 are left empty.

Step 867: The prioritized incident registration unit 212 registers the incident information in the incident information display table 222.

Step 868: If the incident acquisition unit 211 has not yet acquired all the incident information in the prioritized incident information table 125 of the information aggregation server 11, the process returns to Step 862.

Step 869: The incident acquisition unit 211 waits for the defined monitoring interval and returns to Step 861.
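The enrichment and registration of a single incident (Steps 863 to 867) might be sketched as below. The tables are modeled as an in-memory dictionary and lists, and the incident ID is generated with `uuid.uuid4()`; both choices, and all names, are assumptions, since the specification does not state how the ID is generated or how the tables are stored.

```python
import uuid

def register_incident(incident, infra_table, display_table, history_table):
    """Enrich one acquired incident and register it (Steps 863 to 867)."""
    key = (incident["tenant_id"], incident["incident_group"], incident["instance_id"])
    infra = infra_table.get(key, {})                          # Step 863
    record = {
        **incident,
        "infrastructure": infra.get("infrastructure"),        # Step 864
        "center": infra.get("center"),
        "incident_id": uuid.uuid4().hex,                      # Step 865 (assumed ID scheme)
    }
    history_table.append({                                    # Step 866
        "tenant_id": record["tenant_id"],
        "incident_group": record["incident_group"],
        "instance_id": record["instance_id"],
        "incident_id": record["incident_id"],
        "status": "open",
        "user": None,        # user name left empty
        "response": None,    # response details left empty
    })
    display_table.append(record)                              # Step 867
    return record
```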

FIG. 21 is a flowchart of the operation in which the navigation server 12 displays the incident list screen 700.

Step 881: The navigation server 12 receives a request for the incident list screen 700 from a user.

Step 882: The incident display unit 214 of the navigation server 12 acquires incident information from the incident information display table 222.

Step 883: If no recovery limit time 611 is registered for the acquired incident information, the incident display unit 214 proceeds to Step 889.

Step 884: The incident display unit 214 calculates the “remaining time” as (incident occurrence date and time 608 + recovery limit time 611) − (current time + recovery work time 612).

Step 885: If the priority 604 of the incident information is “high”, the process proceeds to Step 887.

Step 886: If the “remaining time” calculated in Step 884 is 3 hours or less, the process proceeds to Step 890.

Step 887: The incident display unit 214 adds the “remaining time” to the incident information.

Step 888: The incident display unit 214 reads from the incident response history table 223 the status 636 of the record whose incident ID 634 matches, and adds it to the incident information.

Step 889: The incident display unit 214 displays the incident information on the incident list screen 700 according to its priority. If another business server is running on the same physical computer as the business server on which the incident occurred, information indicating that this other business server is also affected by the incident may be added to the incident list screen 700.

Step 890: The incident display unit 214 changes the priority of the incident information to “high” and proceeds to Step 887.
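The remaining-time formula of Step 884 and the promotion rule of Steps 885, 886 and 890 can be illustrated with the following sketch. The function names and dictionary keys are hypothetical; the arithmetic follows the formula in Step 884.

```python
from datetime import datetime, timedelta

PROMOTE_AT = timedelta(hours=3)  # the 3-hour boundary used in Step 886

def remaining_time(occurred_at, recovery_limit, recovery_work, now):
    """Step 884: (occurrence + recovery limit) - (current time + recovery work)."""
    return (occurred_at + recovery_limit) - (now + recovery_work)

def refresh_priority(incident, now):
    """Return a copy of the incident with remaining time and updated priority."""
    if incident.get("recovery_limit") is None:                # Step 883: shown as "-"
        return {**incident, "remaining": None}
    remaining = remaining_time(incident["occurred_at"], incident["recovery_limit"],
                               incident["recovery_work"], now)
    priority = incident["priority"]
    if priority != "high" and remaining <= PROMOTE_AT:        # Steps 885 and 886
        priority = "high"                                     # Step 890
    return {**incident, "remaining": remaining, "priority": priority}
```

As a worked example, an incident that occurred at 09:00 with a recovery limit of 8 hours and a recovery work time of 2 hours has 4 hours remaining at 11:00, and 2 hours 30 minutes remaining at 12:30, at which point it is promoted to “high”.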

FIG. 22 is a flowchart of the operation performed when the business server 31 migrates to another cloud system.

Step 901: The business server 31 migrates to another cloud system.

Step 902: The business server 31 transmits the migration destination cloud system information to the navigation server 12 through the transmission unit 303.

Step 903: The navigation server 12 calls the system infrastructure information registration unit 215, reads the record in the system infrastructure information table 221 that corresponds to the migrated business server 31, and updates the infrastructure information 624 and center information 625 with the migration destination cloud system information.

Step 904: The navigation server 12 calls the mail transmission unit 216 and, using the migration destination cloud system information transmitted from the migrated business server 31, acquires the contact 646 of the record in the escalation information table 224 whose tenant ID 641, infrastructure information 644, and center information 645 match.

Step 905: The mail transmission unit 216 notifies the contact 646 that the business server 31 has migrated.
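Steps 903 and 904 amount to an update followed by a lookup, which can be sketched as below. The table layouts, key names, and the function name `handle_migration` are assumptions for illustration; sending the notification mail (Step 905) is left to the caller.

```python
def handle_migration(server_key, destination, infra_table, escalation_table):
    """Update the infrastructure record and return the contacts to notify."""
    # Step 903: overwrite infrastructure and center information with the destination.
    infra_table[server_key] = {
        "infrastructure": destination["infrastructure"],
        "center": destination["center"],
    }
    tenant_id = server_key[0]  # assumed key layout: (tenant ID, incident group, instance ID)
    # Step 904: collect contacts whose tenant, infrastructure and center all match.
    return [
        row["contact"]
        for row in escalation_table
        if row["tenant_id"] == tenant_id
        and row["infrastructure"] == destination["infrastructure"]
        and row["center"] == destination["center"]
    ]
```

Because later incidents are enriched from the updated infrastructure record, they would be attributed to the destination cloud system rather than the source.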

DESCRIPTION OF SYMBOLS: 10 ... cloud management system, 11 ... information aggregation server, 12 ... navigation server, 13 ... tenant, 31 ... business server, 101 ... receiving unit, 102 ... network interface unit, 111 ... priority determination unit, 112 ... incident registration unit, 113 ... operation data registration unit, 121 ... system information table, 122 ... incident response table, 123 ... operation data information table, 124 ... important operation data table, 125 ... prioritized incident information table, 201 ... display unit, 214 ... incident list display unit, 222 ... incident information display table, 301 ... log monitoring unit, 302 ... operation data monitoring unit, 321 ... OS event log, 322 ... business log, 323 ... operation data, 700 ... incident list screen, 720 ... incident detail display screen.

Claims

A cloud management system for managing incidents of business servers of a plurality of tenants, the business servers running on virtual machines that run on a plurality of cloud systems, the cloud management system comprising:
an incident receiving unit that receives and merges incident information from the business servers;
a priority determination unit that determines the priority of incidents in the plurality of cloud systems based on the type and occurrence time of the merged incident information; and
an output unit that outputs the incident information according to the determined priority.

The cloud management system according to claim 1, further comprising, for each business server, priority information that describes the incident priority according to the incident type and time of occurrence,
wherein the priority determination unit determines the priority of an incident that has occurred based on information in the priority table.

The cloud management system according to claim 2, further comprising:
a server load measurement unit that measures the load of each business server; and a priority adjustment unit that raises the incident priority of a business server during a time period in which the load of that business server measured by the server load measurement unit exceeds a predetermined value.

The cloud management system according to claim 2, further comprising:
a server load measurement unit that measures the load of each business server; and a priority adjustment unit that lowers the incident priority of a business server during a time period in which the load of that business server measured by the server load measurement unit falls below a predetermined value.

The priority information further stores information on the recovery time corresponding to the incident,
The cloud management system according to claim 2, wherein the output unit outputs information related to the remaining time necessary for handling the incident in association with the incident.

The cloud management system according to claim 4, wherein, when a virtual machine moves to another cloud system, the information related to incidents of the business server running on the moved virtual machine is changed from the source cloud system to the destination cloud system, and such incidents are output as incidents of the destination cloud system.

A cloud management method for managing incidents of business servers of a plurality of tenants, the business servers running on virtual machines that run on a plurality of cloud systems, wherein:
an incident receiving unit receives and merges incident information from the business servers;
a priority determination unit determines the priority of incidents in the plurality of cloud systems based on the type and occurrence time of the merged incident information; and
an output unit outputs the incident information according to the determined priority.

The cloud management method according to claim 7, wherein priority information that describes the incident priority according to the incident type and time of occurrence is held for each business server,
and the priority determination unit determines the priority of an incident that has occurred based on information in the priority table.

The cloud management method according to claim 8, wherein a server load measurement unit measures the load of each business server,
and a priority adjustment unit raises the incident priority of a business server during a time period in which the load of that business server measured by the server load measurement unit exceeds a predetermined value.

The cloud management method according to claim 8, wherein a server load measurement unit measures the load of each business server,
and a priority adjustment unit lowers the incident priority of a business server during a time period in which the load of that business server measured by the server load measurement unit falls below a predetermined value.

The priority information further stores information on the recovery time corresponding to the incident,
The cloud management method according to claim 8-11, wherein the output unit outputs information related to a remaining time necessary for handling an incident in association with the incident.

The cloud management method according to claim 10, wherein, when a virtual machine moves to another cloud system, the information related to incidents of the business server running on the moved virtual machine is changed from the source cloud system to the destination cloud system, and the output unit outputs such incidents as incidents of the destination cloud system.