JP2005099973A

JP2005099973A - Operation management system

Info

Publication number: JP2005099973A
Application number: JP2003330941A
Authority: JP
Inventors: Tadashi Iwata; 正岩田; Kazunari Hirayama; 和成平山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-09-24
Filing date: 2003-09-24
Publication date: 2005-04-14

Abstract

PROBLEM TO BE SOLVED: To present an optimum system configuration that satisfies the required performance under the present operation pattern, by monitoring the operation pattern, which fluctuates from moment to moment, and predicting performance on the basis of that pattern.

SOLUTION: The operation pattern is extracted by monitoring the access log of a system. A performance prediction simulation part calculates the throughput and response time under that operation pattern. When the predicted performance does not satisfy the required performance of the system, the target of a configuration change is identified by a determination policy, and the performance prediction simulation is repeated. The optimum configuration satisfying the required performance is presented on an output device as the proposed configuration.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a system operation management method for maintaining a system with both performance and cost in mind, and more particularly to a technique for preventing service-level degradation and server down.

In recent systems, rapid changes in the business environment have driven integration and reorganization with many existing IT systems, so that multiple types of jobs run on the same system. The ratio of these jobs (hereinafter called the business pattern) is defined as a performance requirement at the design and development stage, and a system configuration that satisfies that requirement is determined (see, for example, Patent Document 1).

However, it is difficult to predict the operational-stage business pattern accurately at the design and development stage. Moreover, the business pattern changes from moment to moment because of time-of-day variation, seasonal variation, and function changes. Time-of-day variation means that particular jobs concentrate at particular times, such as login processing in the morning and batch processing at night. Seasonal variation means that particular jobs concentrate only in particular periods, such as the end of the month or just before a long holiday. Function change means that the load demanded of the system changes dynamically when system functions are added or modified.

When the business pattern shifts so that the share of jobs placing a heavy load on the system rises, the maximum number of requests the system can process (the limit throughput) falls, and the system no longer satisfies the required performance. If a peak arrives in this state, service-level degradation (slower response times) or server down (servers stopping) occurs before the load reaches the level assumed by the required performance set at the design and development stage.

Conversely, when the business pattern shifts so that the share of heavy-load jobs falls, the limit throughput rises and the system exceeds the required performance by an excessive margin. In this state, the investment in the system is excessive.

Patent Document 2 describes a performance prediction apparatus that predicts the performance of a computer system when its workload fluctuates or its hardware configuration is changed. In outline, it simulates the job operation status from the computer system's operation information, hardware configuration information, and job execution information, and predicts how the job operation status changes in response to user-specified changes in workload or hardware configuration.

Patent Document 3 describes a performance monitoring apparatus for WWW sites that implements preventive measures for maintaining performance. In outline, it accumulates time-series data on the access volume of a WWW site, predicts future access fluctuations by a statistical method, calculates a performance value for each access, and judges from the calculated values whether the specified performance can be maintained.

Patent Document 4 describes a system proposal method that proposes a system with performance and price in mind. In outline, a system proposal apparatus calculates the response time and total price of each configuration obtained by combining the defined system configuration options, and outputs the configurations that satisfy the performance and price requirements as the proposed configurations.

Patent Document 1: JP-A-8-137725

Patent Document 2: JP-A-5-324358
Patent Document 3: JP 2002-268922 A
Patent Document 4: JP 2002-183416 A

The conventional techniques above give no consideration to fluctuation of the business pattern. When the business pattern changes, the pattern of resource consumption on each server changes, and so does the maximum number of requests the system can process (the limit throughput). Consequently, when the limit throughput falls short of the required performance, the service level degrades or servers go down at peak times. When the limit throughput exceeds the required performance by an excessive margin, the investment in the system is excessive.

An object of the present invention is to present an optimum system configuration that satisfies the required performance under the current business pattern, by monitoring the business pattern, which changes from moment to moment, and predicting performance on the basis of that pattern.

To achieve this object, the invention extracts the business pattern by monitoring the access log of the operation management target system, calculates the response time, the throughput, and the resource consumption of each component under that business pattern, and presents the optimum configuration satisfying the performance requirements as the proposed configuration.

In the present invention, the business pattern is first extracted by constantly monitoring the access log.

Next, a performance prediction simulation is executed on the basis of the extracted business pattern and the current hardware configuration information to calculate the response time and throughput of the system and the resource consumption of each component.

Whether the calculated response time and throughput satisfy the required performance is then judged. If the performance is insufficient or excessive, a warning message recommending expansion or reduction of the hardware configuration is issued.

Conditions for identifying, from the resource consumption information, which part of the hardware configuration should be expanded or reduced are stored as a determination policy. The target of system expansion or reduction is identified from the calculated resource consumption and the determination policy.

The expansion or reduction of the hardware configuration is fed back into the hardware configuration information, and the effect of the expansion or reduction is calculated by the performance prediction simulation. This calculation is repeated until the required performance is satisfied.

According to the present invention, the optimum configuration satisfying the required performance under the current business pattern can be presented, so the system configuration can be optimized to follow changes in the business pattern. Because a shortfall in system performance can be detected in advance, service-level degradation and server down at peak times can be prevented before they occur.

In addition, because excess system resources can be detected, excessive system investment is suppressed and costs are reduced.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the present invention. In the figure, the operation management target system 101 is the system subject to operation management, and the operation management apparatus 102 presents the optimum configuration of that system.

The operation management target system 101 consists of a plurality of server groups, and each server group executes different jobs. In the example of FIG. 1, job A uses server group 1; job B uses server groups 1, 2, and 3; and job C uses server groups 1 and 2. Each server group in turn consists of a plurality of servers; the servers in one group can execute the same jobs and share the load. The information collection agent 103 collects operation information on the jobs running on each server and stores it in the access log 104. The information collection agent 103 also stores server-specific information, such as the CPU, memory, and disk of each server, in the server performance information 105. The access log 104 and the server performance information 105 are collected periodically by the operation management apparatus 102 over a network. This network may be a LAN or a public network. The operation management apparatus 102 may therefore be located at the same site as the operation management target system 101, or at a different site reached via a public network.

The business pattern extraction unit 106 extracts, from the access logs 104 collected from the servers, the ratio in which the jobs are being executed (hereinafter called the business pattern 107). The configuration information extraction unit 108 extracts the hardware configuration information 109 from the server performance information 105 collected from the servers. The performance database 110 stores the amount of server resources each job consumes. The performance prediction simulation unit 111 calculates the response time and throughput of the operation management target system. The required performance 112 of the system stores the number of requests to be processed, the response time, and the peak multiplicity required of the operation management target system. The determination policy 113 stores the conditions for identifying the part of the hardware configuration to be expanded or reduced. The required performance determination unit 114 judges whether the predicted performance satisfies the required performance. The output device 115 displays the result of the judgment together with the changed parts of the hardware configuration and their effect. The output device 115 and the operation management apparatus 102 may be implemented in the same device.

The operation of the present invention, from the input of a performance optimization request to the end of performance optimization, is described below with reference to the flowchart of FIG. 2. An optimization request may be issued at fixed intervals, such as every hour, when a failure occurs, or manually by the system administrator. Shortening the interval, however, increases the load of collecting the access log 104 and the server performance information 105 and the load of changing the hardware configuration, so requests should be issued at a moderate interval.

Step 201 collects the access logs 104 stored on each server. Step 202 extracts the business pattern 107 from the collected access logs 104.

Step 203 collects the server performance information 105 stored on each server.

Step 204 extracts the hardware configuration information 109 from the collected server performance information 105.

Step 205 executes the performance prediction simulation. This step calculates the predicted performance of the current configuration or of the configuration after a change. Using queuing theory or a performance simulator, the performance prediction simulation unit 111 takes the extracted business pattern 107, the hardware configuration information 109, and the performance database 110, and calculates the response time, throughput, and resource consumption as the number of users executing jobs simultaneously (hereinafter called the multiplicity) is varied. With queuing theory or a performance simulator, the response time, the throughput, and the resource consumption of each component can be calculated from the resource usage time and occurrence frequency of each job. The resource usage time of each job is taken from the performance database 110. The occurrence frequency of each job is derived from the extracted business pattern 107 and the multiplicity, which is increased step by step from 1.

Step 206 judges whether the predicted performance calculated above satisfies the required performance 112 of the system. The required performance determination unit 114 compares the predicted performance calculated by the performance prediction simulation unit 111 with the required performance 112. If the predicted performance falls within the allowable range of the required performance 112 (for example, 100% to 120%), step 207 presents the current hardware configuration on the output device 115 as the optimum configuration, and performance optimization ends.

Step 208 warns the system administrator that the hardware configuration should be expanded or reduced. If the judgment in step 206 finds the predicted performance below the allowable range, the system does not satisfy the required performance, so a warning message requesting expansion of the hardware configuration is issued to the system administrator. If the judgment finds the predicted performance above the allowable range, the system exceeds the required performance by an excessive margin, so a warning message requesting reduction of the hardware configuration is issued.
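The three-way judgment of steps 206 to 208 can be sketched as follows. This is a minimal illustration, assuming that the compared quantity is a throughput-like figure where larger is better and that the allowable range is the 100% to 120% band given as an example in the text; the names `Verdict` and `judge` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    OPTIMAL = "present current configuration as optimal"  # step 207
    EXPAND = "warn: expand the hardware configuration"    # step 208
    REDUCE = "warn: reduce the hardware configuration"    # step 208


def judge(predicted: float, required: float,
          lower: float = 1.00, upper: float = 1.20) -> Verdict:
    """Compare predicted performance with required performance.

    lower and upper express the allowable range as a ratio of the
    requirement (assumed here to be 100% to 120%).
    """
    if required <= 0:
        raise ValueError("required performance must be positive")
    ratio = predicted / required
    if ratio < lower:
        return Verdict.EXPAND   # requirement not met
    if ratio > upper:
        return Verdict.REDUCE   # requirement exceeded by too wide a margin
    return Verdict.OPTIMAL


print(judge(predicted=90.0, required=100.0).value)
```

A response-time requirement would need the comparison inverted, since a smaller response time is better.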

Step 209 displays the predicted performance to the system administrator. It shows the predicted performance calculated in step 205 and the resource consumption of each server at the system limit. In addition, the part of the hardware configuration to be expanded or reduced is identified from the resource consumption of each server at the system limit and the determination policy 113, and is presented to the system administrator.

Step 210 feeds the expansion or reduction of the hardware configuration back into the hardware configuration information 109. The system administrator may specify the target of expansion or reduction interactively.

If the operation management target system 101 has a function for changing its configuration automatically (that is, one that requires no manual reconfiguration by the system administrator), linking that function to step 210, which changes the hardware configuration, allows the optimum configuration to be maintained automatically.

Using the hardware configuration information 109 updated by this feedback, the performance prediction simulation of step 205 recalculates the predicted performance after the expansion or reduction. This recalculation is repeated until the predicted performance satisfies the required performance.
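The loop formed by steps 205 to 210 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the configuration is reduced to a hypothetical mapping from server group name to server count, and `simulate` and `pick_change` are toy stand-ins for the performance prediction simulation unit 111 and the determination policy 113.

```python
from typing import Callable, Dict

Config = Dict[str, int]  # hypothetical: server group name -> number of servers


def optimize(config: Config,
             simulate: Callable[[Config], float],
             pick_change: Callable[[Config, float], Config],
             required: float,
             upper: float = 1.20,
             max_rounds: int = 50) -> Config:
    """Repeat simulation and reconfiguration until the prediction fits."""
    for _ in range(max_rounds):
        predicted = simulate(config)          # step 205
        ratio = predicted / required          # step 206
        if 1.0 <= ratio <= upper:
            return config                     # step 207: optimum found
        config = pick_change(config, ratio)   # steps 209 and 210
    raise RuntimeError("no configuration met the requirement")


def toy_simulate(config: Config) -> float:
    # Toy model: every server contributes 10 requests per second.
    return 10.0 * config["group1"]


def toy_pick_change(config: Config, ratio: float) -> Config:
    # Toy policy: add one server when short, remove one when oversized.
    step = 1 if ratio < 1.0 else -1
    return {**config, "group1": config["group1"] + step}


print(optimize({"group1": 2}, toy_simulate, toy_pick_change, required=35.0))
```

The `max_rounds` guard is an addition of this sketch; it stops the loop if no configuration ever lands inside the allowable range.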

FIG. 3 shows an example of the tables created by the business pattern extraction unit 106. The keyword table 301 of FIG. 3(a) stores the keywords 304 that identify the execution of a job. In the illustrated example, /homepage.html identifies job A (home page display), /jsp/kensaku.jsp identifies job B (file search), and /cgi/download.cgi identifies job C (file download). The access log table 302 of FIG. 3(b) stores the access logs 104 collected from each server. Each occurrence of a keyword 304 in the access log table 302 indicates that the corresponding job was executed once. The execution count 305 of each job is therefore obtained by searching the access log table 302 for the keywords 304 and totaling the matches. The business pattern table 303 of FIG. 3(c) stores the execution count 305 of each job and the business pattern 107, which expresses the share of executions belonging to each job. In the illustrated example, 20 jobs were executed, of which job A (home page display) accounted for 50% (10 jobs), job B (file search) for 20% (4 jobs), and job C (file download) for 15% (3 jobs).
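The keyword counting described for FIG. 3 can be sketched as follows. The keywords are those named in the text; the log line format, the function name `extract_pattern`, and the choice to compute shares over matched lines only are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Keyword table (FIG. 3(a)): keyword -> job.
KEYWORDS = {
    "/homepage.html": "A",     # home page display
    "/jsp/kensaku.jsp": "B",   # file search
    "/cgi/download.cgi": "C",  # file download
}


def extract_pattern(log_lines: Iterable[str]) -> Tuple[Counter, Dict[str, float]]:
    """Return (execution count per job, share of each job)."""
    counts: Counter = Counter()
    for line in log_lines:
        for keyword, job in KEYWORDS.items():
            if keyword in line:
                counts[job] += 1   # one match means one execution
                break
    total = sum(counts.values())
    # Assumption: shares are taken over matched lines only.
    pattern = {job: n / total for job, n in counts.items()} if total else {}
    return counts, pattern


# Hypothetical log lines; real access log formats differ.
sample = [
    "GET /homepage.html",
    "GET /homepage.html",
    "GET /jsp/kensaku.jsp?q=report",
    "GET /cgi/download.cgi?id=7",
    "GET /favicon.ico",
]
counts, pattern = extract_pattern(sample)
print(dict(counts), pattern)
```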

FIG. 4 shows an example of the tables created by the configuration information extraction unit 108. The server performance information 105 of FIG. 4(a) is stored on each server. The hardware configuration information 109 of FIG. 4(b) is created by collecting the server performance information 105 of every server; for each server group it stores the number of servers 401, the CPU type 402, the number of CPUs 403, the memory size 404, and the disk performance 405, which indicates the disk transfer capability. In the illustrated example, server group 1 consists of four servers, its CPU type is CPU name 1, it has two CPUs and 2 GB of memory, and its disk performance is 1 MB/s.

FIG. 5 shows an example of the performance database 110, which stores the resources each server group consumes to execute each job once. These resource consumption figures are measured in advance at the design and development stage and stored in the performance database. The illustrated example stores the server configuration 501; the CPU usage time 502, memory usage 503, and disk usage time 504 consumed in server group 1; and the CPU usage time 505, memory usage 506, and disk usage time 507 consumed in server group 2. For job A with CPU name 1 as the server configuration, server group 1 consumes 0.33 seconds of CPU time, 10 KB of memory, and 0.25 seconds of disk time.
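One way to hold such a performance database and combine it with a business pattern is sketched below. Only the job A figures for server group 1 come from the text; the job B row, the key layout, and the names `Usage`, `PERF_DB`, and `mean_usage` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Usage:
    cpu_s: float      # CPU time per execution, seconds
    memory_kb: float  # memory per execution, KB
    disk_s: float     # disk time per execution, seconds


# Key: (job, CPU type, server group).
PERF_DB: Dict[Tuple[str, str, str], Usage] = {
    ("A", "CPU name 1", "server group 1"): Usage(0.33, 10.0, 0.25),  # from the text
    ("B", "CPU name 1", "server group 1"): Usage(0.50, 20.0, 0.10),  # hypothetical
}


def mean_usage(pattern: Dict[str, float], cpu: str, group: str) -> Usage:
    """Average per-execution usage on one server group, weighted by job share."""
    total = sum(pattern.values())
    cpu_s = mem = disk = 0.0
    for job, share in pattern.items():
        usage = PERF_DB[(job, cpu, group)]
        weight = share / total
        cpu_s += weight * usage.cpu_s
        mem += weight * usage.memory_kb
        disk += weight * usage.disk_s
    return Usage(cpu_s, mem, disk)


print(mean_usage({"A": 0.5, "B": 0.5}, "CPU name 1", "server group 1"))
```

The weighted mean gives a single per-request demand for each server group, which is the form a queuing model typically takes as input.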

FIG. 6 shows the calculation model of the performance prediction simulation using queuing theory. A queue forms at each server group, and the number of service windows of each queue equals the number of servers in that group. The occurrence frequency of each job is derived from the business pattern 107 and the multiplicity, which is increased step by step from 1. At a multiplicity of 20 users, for example, 10 users (50%) are executing job A, 4 users (20%) job B, and 3 users (15%) job C at the same time. The resource consumption of each job is taken from the performance database 110. From this model, the response time, throughput, and resource consumption of each component are calculated. The sum of the response times of the components gives the response time of the system.
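The text does not specify which queuing formulas are used. As one possible illustration, the sketch below applies Mean Value Analysis to a closed network with one station per server group, zero think time, and a single aggregated job class. Multi-server groups are handled with the Seidmann approximation, which replaces a group of c servers with demand D by a single-server queue of demand D/c plus a pure delay of D(c-1)/c. The demand values and the function name `mva` are assumptions.

```python
from typing import List, Tuple


def mva(demands: List[float], servers: List[int],
        max_n: int) -> List[Tuple[int, float, float]]:
    """Mean Value Analysis for multiplicity 1..max_n.

    demands[k]: seconds of service one request needs at server group k
    servers[k]: number of servers in group k
    Returns (multiplicity, response time in s, throughput in requests/s).
    """
    queue = [0.0] * len(demands)  # mean queue length per group
    results = []
    for n in range(1, max_n + 1):
        # Time spent in the queuing part of each group.
        queued = [d / c * (1.0 + q) for d, c, q in zip(demands, servers, queue)]
        # Seidmann delay part: service time that cannot be parallelized away.
        delay = [d * (c - 1) / c for d, c in zip(demands, servers)]
        response = sum(queued) + sum(delay)   # sum over groups = system response
        throughput = n / response             # Little's law, zero think time
        queue = [throughput * r for r in queued]
        results.append((n, response, throughput))
    return results


# Hypothetical demands: 0.33 s at a 4-server group, 0.20 s at a 2-server group.
for n, r, x in mva([0.33, 0.20], [4, 2], 20)[::5]:
    print(f"multiplicity={n:2d} response={r:.3f}s throughput={x:.2f}/s")
```

In a fuller model the demand of each group would be the pattern-weighted mean of the per-job figures in the performance database 110.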

FIG. 7 shows an example of the calculated response time and throughput. The multiplicity 701 is the number of users executing jobs simultaneously on the operation management target system 101; the response time 702 is the response time (in seconds) as the multiplicity 701 is varied; and the throughput 703 is the throughput (in requests per second) as the multiplicity 701 is varied. The inflection value 705 is the multiplicity at which resource contention causes the response time to change sharply. As the multiplicity 701 increases, the response time 702 rises gently up to the inflection value 705 and rises steeply beyond it. The throughput 703, by contrast, increases in proportion to the multiplicity 701 up to the inflection value 705 and approaches a constant value (the limit throughput 704) beyond it. The graph thus shows that when jobs are executed on the operation management target system 101 in the ratio given by the business pattern 107, the system reaches its limit at the inflection value 705.
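The text reads the inflection value off the simulated curves. A quick estimate is also available from the asymptotic bounds of operational analysis, offered here as an assumption rather than as the patent's method: with zero think time, throughput is bounded both by the multiplicity divided by the no-contention response time and by the capacity of the most heavily loaded server group, and the two bounds cross at the knee. The function name `knee` and the sample values are hypothetical.

```python
from typing import List, Tuple


def knee(demands: List[float], servers: List[int]) -> Tuple[float, float]:
    """Return (limit throughput, multiplicity at which the bounds cross).

    demands[k]: seconds of service one request needs at server group k
    servers[k]: number of servers in group k
    """
    # The bottleneck group caps the throughput at servers / demand.
    limit_throughput = min(c / d for d, c in zip(demands, servers))
    # With no contention, a request just sums its service demands.
    min_response = sum(demands)
    # Bounds n / min_response and limit_throughput meet at this multiplicity.
    return limit_throughput, limit_throughput * min_response


print(knee([0.5, 0.25], [2, 1]))  # limit throughput 4.0 per second, knee at 3.0
```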

FIG. 8 shows an example of the resource consumption calculated in step 205. In the illustrated example, the CPU usage rate 801, memory usage rate 802, and disk usage rate 803 of server group 1 and the CPU usage rate 804, memory usage rate 805, and disk usage rate 806 of server group 2 are calculated for each multiplicity.
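For time-based resources such as CPU and disk, a usage rate can be derived from throughput with the utilization law (usage rate = throughput × service demand ÷ number of servers). The sketch below assumes this law is applied; the text does not say how the usage rates are computed, and memory usage would need a separate occupancy model. The function name and sample values are hypothetical.

```python
from typing import List


def usage_rates(throughput: float, demands: List[float],
                servers: List[int]) -> List[float]:
    """Utilization law per server group, capped at 100%.

    throughput: requests per second at the multiplicity of interest
    demands[k]: seconds of CPU (or disk) time per request in group k
    servers[k]: number of servers in group k
    """
    return [min(1.0, throughput * d / c) for d, c in zip(demands, servers)]


# Hypothetical: 4 requests/s, 0.5 s per request on 4 servers, 0.125 s on 1 server.
print(usage_rates(4.0, [0.5, 0.125], [4, 1]))  # [0.5, 0.5]
```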

FIG. 9 shows an example of the determination policy 113, which stores the conditions for changing the hardware configuration. This example stores the CPU usage rate condition 901, the memory usage rate condition 902, the disk usage rate condition 903, and the determination result 904. In the example of FIG. 8, the resource usage rate 807 of server group 1 is high and matches the condition of determination policy item 1 in FIG. 9, so the determination is to increase the number of servers in server group 1. The CPU usage rate 808 of server group 2 is low and matches the condition of determination policy item 6, so the determination is to reduce the number of CPUs in server group 2.
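A determination policy of this kind can be represented as an ordered rule table. The sketch below includes only the two items the text describes (items 1 and 6); the thresholds, the reading of item 1 as "all three usage rates are high", and the names `Rule`, `POLICY`, and `decide` are assumptions, since the contents of FIG. 9 are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

HIGH, LOW = 0.80, 0.20  # hypothetical thresholds; the text gives none


@dataclass(frozen=True)
class Rule:
    item: int                                       # policy item number
    matches: Callable[[float, float, float], bool]  # (cpu, memory, disk) rates
    result: str                                     # determination result 904


POLICY: List[Rule] = [
    # Item 1 (assumed reading): every resource of the group is busy.
    Rule(1, lambda cpu, mem, disk: min(cpu, mem, disk) >= HIGH,
         "increase the number of servers"),
    # Item 6: the CPUs of the group are mostly idle.
    Rule(6, lambda cpu, mem, disk: cpu <= LOW,
         "reduce the number of CPUs"),
]


def decide(cpu: float, mem: float, disk: float) -> Optional[Rule]:
    """Return the first policy item whose condition matches, if any."""
    for rule in POLICY:
        if rule.matches(cpu, mem, disk):
            return rule
    return None


hit = decide(0.95, 0.85, 0.90)
print(hit.item, hit.result)
```

Evaluating the rules in table order keeps the outcome deterministic when more than one condition holds.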

FIG. 10 shows an example of the judgment result, the changed parts of the hardware configuration, and their effect as displayed on the output device 115. This example displays the current business pattern 1001, the current hardware configuration 1002, the current predicted performance 1003, the resource consumption 1004 of each server at the current system limit, the hardware configuration 1005 after performance optimization, the predicted performance 1006 after performance optimization, the resource consumption 1007 of each server at the system limit after performance optimization, and the warning message 1008 to the system administrator. In the illustrated example, a message to increase the number of servers in server group 1 and a message to reduce the number of CPUs in server group 2 are issued, and their effect on the predicted performance after performance optimization is displayed.

As described above, by monitoring the business pattern, which changes from moment to moment, and predicting performance on the basis of that pattern, an optimum system configuration satisfying the required performance under the current business pattern can be presented. Service-level degradation and server down can thereby be prevented in advance. In addition, excessive system investment can be suppressed and costs reduced.

FIG. 1 is an overview of the present invention.
FIG. 2 is a flowchart of one embodiment of the present invention.
FIG. 3 is an example of business pattern extraction.
FIG. 4 is an example of hardware configuration information extraction.
FIG. 5 is an example of the performance database.
FIG. 6 is an example of the performance prediction simulation model.
FIG. 7 is an example of the calculated response time and throughput.
FIG. 8 is an example of the calculated resource consumption.
FIG. 9 is an example of the determination policy.
FIG. 10 is an example of the hardware configuration display.

Explanation of symbols

101: operation management target system, 102: operation management apparatus, 103: information collection agent,
104: access log, 105: server performance information, 106: business pattern extraction unit,
107: business pattern, 108: configuration information extraction unit, 109: hardware configuration information,
110: performance database, 111: performance prediction simulation unit,
112: required performance of the system, 113: determination policy, 114: required performance determination unit,
115: output device

Claims

An operation management system for system operation management that takes changes in the business pattern into account, comprising: means for collecting the access log of a system and extracting a business pattern; means for extracting hardware configuration information from the system configuration; means for calculating, by performance prediction simulation, the response time, the throughput, and the resource consumption of each component from the extracted business pattern and hardware configuration information; means for judging, from the calculated response time and throughput, whether the required performance is satisfied; and means for warning a system administrator that the hardware configuration should be expanded or reduced.

The operation management system according to claim 1, further comprising means for identifying, from the resource consumption of each component calculated in the performance prediction simulation, the part of the hardware configuration to be expanded or reduced, and presenting an optimum configuration.