JP2014178865A

JP2014178865A - Bottleneck analysis device, method, and program

Info

Publication number: JP2014178865A
Application number: JP2013052200A
Authority: JP
Inventors: Toshiyuki Sakai; 俊之坂井; Masayoshi Umeda; 昌義梅田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-03-14
Filing date: 2013-03-14
Publication date: 2014-09-25

Abstract

PROBLEM TO BE SOLVED: To provide a bottleneck analysis device, method, and program capable of easily identifying bottleneck factors in a distributed processing system. SOLUTION: For each server constituting the distributed processing system, a bottleneck analysis device uses server resource information to calculate a bias factor score indicating the degree of processing bias of that server within the distributed processing system. The device also uses failure information to calculate a failure factor score indicating the degree of failure occurring in each server. The device then calculates, for each server, a server influence degree, {(number of requests to the server ÷ number of requests to the entire distributed processing system) + (number of clients whose requests the server accepted ÷ total number of clients)}, and a bottleneck score, {server influence degree of the server × (bias factor score of the server + failure factor score of the server)}.

Description

The present invention relates to a bottleneck analysis device, a bottleneck analysis method, and a program for a distributed processing system.

Conventionally, there are distributed processing systems in which requests from clients are distributed across multiple servers (workers) for processing. In such a system, finding the factors that become processing bottlenecks and understanding how strongly each factor affects the system is very important for smooth operation.

Patent Document 1: JP 2012-168702 A
Patent Document 2: JP 2001-195285 A

In a distributed processing system, the servers process requests in cooperation with one another. A bottleneck can therefore arise not only from reduced processing capacity or failure of an individual server, but also from processing bias among the servers. However, the prior art described above (see Patent Documents 1 and 2) can identify only bottlenecks that are visible when each server is examined individually, such as reduced processing capacity or failure of a server, and is therefore insufficient for identifying bottleneck factors. Accordingly, an object of the present invention is to identify bottleneck factors adequately by taking into account processing bias among the servers of a distributed processing system.

To solve the above problem, the present invention is a bottleneck analysis device comprising: a server information storage unit that stores server information associating, for each server, the server resource information of the servers constituting a distributed processing system, the number of requests to the server, and the number of clients whose requests the server has accepted; a factor score calculation unit that uses the server resource information, or the server resource information and failure information of the server, to calculate, for each server, the factor scores of a processing bottleneck of that server in the distributed processing system; an influence degree calculation unit that uses at least one of the number of requests to the server and the number of clients whose requests the server has accepted, as indicated in the server information, to calculate, for each server, a server influence degree indicating the degree to which requests are concentrated on that server in the distributed processing system; and a score integration unit that calculates, for each server, a bottleneck score, which is the sum of the factor scores weighted by the server influence degree. The factor score calculation unit comprises a bias factor score calculation unit that uses the server resource information to calculate, for each server, a bias factor score indicating the degree of processing bias of that server in the distributed processing system, and at least one of: a failure factor score calculation unit that uses the failure information to calculate, for each server, a failure factor score indicating the degree of failure occurring in the server; and a threshold factor score calculation unit that uses the server resource information to perform, for each server, a condition determination with a condition expression using a predetermined threshold and calculates a threshold factor score from the determination result.

Such a bottleneck analysis device can perform bottleneck analysis that takes into account processing bias among the servers of a distributed processing system. In addition, when calculating the bottleneck score of a server, the device also uses the server influence degree, a value indicating how strongly requests are concentrated on that server within the distributed processing system. The bottleneck score therefore reflects the impact that would result if that server became a bottleneck. By checking the bottleneck score, a user of the bottleneck analysis device can readily see which server of the distributed processing system is a bottleneck and how large the impact of that bottleneck is. The user can thus adequately identify the bottleneck factors in the distributed processing system.

According to the present invention, bottleneck factors in a distributed processing system can be adequately identified.

FIG. 1 is a diagram showing a distributed processing system.
FIG. 2 is a diagram showing the configuration of the bottleneck analysis device.
FIG. 3 is a diagram showing the processing procedure of the bottleneck analysis device.
FIG. 4 is a diagram illustrating the server information of server A.
FIG. 5 is a diagram illustrating storage rack information and label information.
FIG. 6 is a diagram illustrating score calculation information.
FIG. 7 is a diagram illustrating score calculation information.
FIG. 8 is a diagram illustrating values of the bias factor score, the threshold factor score, and the failure factor score.
FIG. 9 is a diagram illustrating the server influence degree.
FIG. 10 is a diagram illustrating values of the bottleneck score, the server influence degree, the bias factor score, the threshold factor score, and the failure factor score.
FIG. 11 is an example display screen for information on a server (worker).
FIG. 12 is an example display screen for information on a server (master).
FIG. 13 is an example display screen listing, for the server information and for each type of factor score, the values of the items from which that factor score was calculated.
FIG. 14 is an example display screen showing the response time of each process (process 1 to process 5) on a server that is likely to be a bottleneck.
FIG. 15 is an example display screen showing, for the server information of each server stored in the server information storage unit, the values of predetermined items (the values underlying each factor score) together with the values on other servers and the average for the entire distributed processing system.
FIG. 16 is an example display screen showing the request status for a given server.
FIG. 17 is a diagram showing a computer that executes a program implementing the functions of the bottleneck analysis device.

Embodiments of the present invention are described below with reference to the drawings. First, the distributed processing system that the bottleneck analysis device 10 of this embodiment analyzes is described.

As shown in FIG. 1, the distributed processing system consists of multiple servers 20, for example a master server 20 and worker servers 20. The master server 20 tells the client 30 that sent a request, via the network, which server 20 (which worker server 20) it should access. On receiving this instruction, the client 30 accesses the indicated server 20 (worker). The worker server 20 then executes predetermined processing based on the request from the client 30 and returns the result to the requesting client 30. The bottleneck analysis device 10 acquires, via the network, the information needed to analyze bottlenecks in the distributed processing system, for example failure information of each server 20, server resource information, and various information provided by the system (collectively, "server information"), and stores it in the server information storage unit 131. Referring to this server information, the bottleneck analysis device 10 analyzes bottlenecks while taking into account the processing bias of each server 20 in the distributed processing system.

A configuration example of the bottleneck analysis device 10 is described next. As shown in FIG. 2, the bottleneck analysis device 10 includes a storage unit 13, a control unit 12, and an input/output unit (input unit and output unit) 11.

The storage unit 13 includes a server information storage unit 131 and a score storage unit 132. As described above, the server information storage unit 131 stores server information. The server information includes failure information of each server 20, server resource (resource) information, various information provided by the system, and so on, and is referenced when the control unit 12 calculates bottleneck scores and server influence degrees. The score storage unit 132 stores the bottleneck score of each server 20 calculated by the control unit 12, the degree of influence when that server 20 becomes a bottleneck, and the like (details are given later). The input/output unit 11 accepts, from an external device, input of the various information used to calculate bottleneck scores and server influence degrees, and outputs the processing results of the control unit 12 to an external device.

The control unit 12 includes a data collection unit 120, a data integration unit 121, a data selection unit 122, a factor score calculation unit 123, an influence degree calculation unit 127, a score integration unit 128, and a display processing unit 129. With these units it calculates the bottleneck score of each server 20, the degree of influence when that server 20 becomes a bottleneck, and so on, and displays the calculation results on a display screen.

Next, the processing procedure of the bottleneck analysis device 10 of FIG. 2 is described with reference to FIG. 3. First, the data collection unit 120 of the bottleneck analysis device 10 acquires the various logs, statistical information, and the like of each server 20 of the system (distributed processing system). It also acquires failure information and resource information of each server 20. The data collection unit 120 then stores the acquired information in the server information storage unit 131 (S1: acquire and store information).

Next, when the data selection unit 122 receives, via the input/output unit 11, an instruction specifying the time period to be analyzed for bottlenecks, it acquires the information for that time period from the server information stored in the server information storage unit 131 (S2: acquire information for the time period under analysis).
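Step S2 can be pictured as a simple filter over timestamped records. The following is a minimal sketch only: the record layout (dictionaries with a `created` field), the function name `select_time_window`, and all sample values are assumptions made for illustration and do not come from the specification.

```python
from datetime import datetime


def select_time_window(records, start, end):
    """Return the records whose creation time lies in [start, end)."""
    return [r for r in records if start <= r["created"] < end]


# Hypothetical sample rows; the field names and values are illustrative.
records = [
    {"server": "A", "item": "load average", "value": 11,
     "created": datetime(2013, 3, 14, 10, 0)},
    {"server": "A", "item": "load average", "value": 13,
     "created": datetime(2013, 3, 14, 10, 5)},
    {"server": "A", "item": "load average", "value": 6,
     "created": datetime(2013, 3, 14, 11, 0)},
]

selected = select_time_window(
    records, datetime(2013, 3, 14, 10, 0), datetime(2013, 3, 14, 11, 0)
)
```

A half-open interval is used here so that consecutive analysis windows do not count the same record twice; the specification does not say how the boundaries are treated.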

The factor score calculation unit 123 then calculates, for each piece of information acquired in S2, a score indicating the likelihood that each server 20 is a bottleneck factor, taking into account predetermined thresholds and the bias among the servers 20 (S3: calculate likelihood of being a bottleneck factor). The factor score calculation unit 123 stores the results in the score storage unit 132.

The influence degree calculation unit 127 calculates a score representing the degree of influence of a bottleneck factor at each server 20 (the server influence degree), based on the request information contained in the various logs acquired in S2 (such as the number of requests from each client 30 to the server 20) (S4: calculate influence of bottleneck factors). In other words, it calculates how much the requesting clients 30 would be affected if that server 20 became a bottleneck. The influence degree calculation unit 127 stores the calculated server influence degree in the score storage unit 132.
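As a rough illustration of step S4, the sketch below applies the server influence degree formula given in the abstract: the server's share of all requests plus its share of all clients. The function name, the zero-total guard, and the sample numbers are assumptions for illustration.

```python
def server_influence(requests, total_requests, clients, total_clients):
    """Server influence degree per the abstract's formula:
    (requests to the server / requests to the whole system)
    + (clients served by the server / total clients).
    """
    if total_requests <= 0 or total_clients <= 0:
        # Assumed behaviour: with no traffic at all, no server has influence.
        return 0.0
    return requests / total_requests + clients / total_clients


# Hypothetical numbers: 300 of 1000 requests and 20 of 50 clients.
influence_a = server_influence(300, 1000, 20, 50)  # about 0.3 + 0.4
```

Because both ratios are added, the value ranges from 0 to 2 rather than 0 to 1; it acts as a weight, not a probability.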

The score integration unit 128 then integrates the information calculated in S3 and S4 and calculates, for each server 20, a score representing the likelihood that the server 20 is a bottleneck location (the bottleneck score) (S5: integrate scores). The score integration unit 128 stores the calculated bottleneck score in the score storage unit 132.
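Step S5 can be sketched as weighting the sum of a server's factor scores by its server influence degree, which is how the bottleneck score is defined in the summary of the invention. The function names, the dictionary layout, and every number below are hypothetical.

```python
def bottleneck_score(influence, factor_scores):
    """Sum of the factor scores, weighted by the server influence degree."""
    return influence * sum(factor_scores.values())


def rank_servers(influences, factors):
    """Return (server, bottleneck score) pairs, highest score first."""
    scores = {
        server: bottleneck_score(influences[server], factors[server])
        for server in influences
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical inputs for two servers.
influences = {"A": 0.5, "B": 1.2}
factors = {
    "A": {"bias": 0.2, "threshold": 1, "failure": 0},
    "B": {"bias": 0.1, "threshold": 0, "failure": 0},
}
ranking = rank_servers(influences, factors)
```

Sorting by score is an addition of this sketch; it simply reflects the stated purpose of letting a user see which server is most likely the bottleneck.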

Based on an instruction from the data selection unit 122 or the like, the display processing unit 129 creates display information showing the score values calculated in S3 to S5 (S6: create display information). For example, the display processing unit 129 creates the display information (a display screen) showing the values calculated in S3 to S5 in HTML (HyperText Markup Language) or a similar format.
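One simple way to picture step S6 is rendering the scores as an HTML table. This is only a sketch: the specification does not describe the markup, so the table layout, the function name `render_score_table`, and the sample rows are assumptions.

```python
from html import escape


def render_score_table(rows):
    """Render (server name, bottleneck score) pairs as an HTML table."""
    header = "<tr><th>Server</th><th>Bottleneck score</th></tr>"
    body = "".join(
        # Escape the name so that characters such as < or & stay literal.
        f"<tr><td>{escape(str(name))}</td><td>{score:.2f}</td></tr>"
        for name, score in rows
    )
    return f"<table>{header}{body}</table>"


# Hypothetical rows; the second name checks that escaping is applied.
html_page = render_score_table([("Server A", 0.82), ("Server <B>", 0.1)])
```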

The display processing unit 129 then displays the display information created in S6 on request (S7: output).

Returning to FIG. 2, each component of the bottleneck analysis device 10 is described in detail.

The server information storage unit 131 of the storage unit 13 stores server information that associates, for each server 20, the failure information of the servers 20 constituting the distributed processing system, the server resource information (resource information) of the servers 20, and various information about the servers 20 provided by the system. The failure information is, for example, log files that record failures, such as /var/log/messages and mcelog, or the output of failure-diagnosis commands such as smartctl. The resource information is, for example, the output of commands that report resource information, such as vmstat and netstat, or statistics that the OS provides as standard, such as /proc/net/dev and /proc/meminfo. The various information provided by the system includes performance information such as the number of requests at each server 20, the number of clients whose requests were accepted, and various responses, as well as the amount of data held, the cache hit rate, and so on. This information is acquired when the bottleneck analysis device 10 sends commands for retrieving system logs and statistics to each server 20.
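To make the resource-information source concrete, the sketch below parses text in the /proc/meminfo format into a dictionary. It is an illustrative assumption about how such statistics could be turned into values, not the collection method of the specification; the function name and the sample text are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (for example HugePages_Total)
    are plain counts. The unit suffix is dropped in this sketch.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


# Hypothetical sample in the same layout as /proc/meminfo.
sample = (
    "MemTotal:       16384000 kB\n"
    "MemFree:         1024000 kB\n"
    "HugePages_Total:       0\n"
)
mem = parse_meminfo(sample)
```

On a Linux host the same function could be fed the real file contents, for example `parse_meminfo(open("/proc/meminfo").read())`.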

The server information may also include information on switches (not shown) connected to the servers 20. In that case, the resource information of a switch is, for example, the network traffic passing through the switch and the number of error packets from the switch, and the failure information is information such as port status (up/down). This information is acquired by SNMP (Simple Network Management Protocol) or the like.

FIG. 4 shows an example of the server information, here the server information of server A arranged as a table. For each piece of information, the table shows its type, its item, its value, its creation time, and so on. The "type" is an attribute that represents the source from which the information was collected; its values include server resource, switch, failure information, system log, and system statistics. The "item" is an attribute that uniquely distinguishes the kind of data collected from the source indicated by the "type". For example, when the "type" is server resource, the corresponding "item" values include load average, iowait, and disk write throughput; when the "type" is system log, the "item" values include number of clients and number of writes.
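The table of FIG. 4 maps naturally onto a small record type. The sketch below is one possible representation: the class name, the field names (`source_type` stands in for the "type" column), and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ServerInfoRecord:
    server: str        # server the row was collected from
    source_type: str   # "type": where the data came from
    item: str          # "item": which measurement within that source
    value: float
    created: datetime  # creation time of the row


def group_values(records):
    """Group values by (type, item), the pair that identifies a measurement."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.source_type, r.item)].append(r.value)
    return dict(groups)


# Hypothetical rows for server A.
rows = [
    ServerInfoRecord("A", "server resource", "load average", 11,
                     datetime(2013, 3, 14, 10, 0)),
    ServerInfoRecord("A", "server resource", "load average", 13,
                     datetime(2013, 3, 14, 10, 5)),
    ServerInfoRecord("A", "system log", "number of clients", 20,
                     datetime(2013, 3, 14, 10, 0)),
]
groups = group_values(rows)
```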

The server information storage unit 131 also stores label information. The label information indicates how each piece of information in the server information is to be analyzed; for example, as shown in FIG. 5, it gives, as a label value, the analysis method for each type and item of information in the server information. When the analysis method is a condition determination using a threshold, the label information also includes the threshold used. Although not shown, it may also include the formulas used by each analysis method. The server information storage unit 131 may further hold information on the rack in which each server 20 is housed (storage rack information; see the rack information in FIG. 5). The label information and storage rack information are entered, for example, by a user of the bottleneck analysis device 10 via the input/output unit 11.

The score storage unit 132 of FIG. 2 stores, for each server 20, the bias factor score, threshold factor score, failure factor score, server influence degree, bottleneck score, and so on calculated by the control unit 12. Details of the information stored in the score storage unit 132 are described later with reference to the drawings.

The control unit 12 includes a data collection unit 120, a data integration unit 121, a data selection unit 122, a factor score calculation unit 123, an influence degree calculation unit 127, a score integration unit 128, and a display processing unit 129.

The data collection unit 120 acquires server information (failure information, server resource information, and so on) from each server 20 and stores it in the server information storage unit 131.

The data integration unit 121 acquires from the server information storage unit 131 the server information for the time period specified by the data selection unit 122. Referring to the label information (see FIG. 5), it also obtains the analysis method (the value of the label field in FIG. 5) for each piece of the acquired server information. The data integration unit 121 then combines this information to create score calculation information. That is, for each item of the server information acquired from the server information storage unit 131, the data integration unit 121 refers to the label information (see FIG. 5) and attaches one of the labels bias, threshold, or failure, indicating the data category ("type 1" in FIG. 6). When the label is "threshold", it also attaches the threshold value ("threshold" in FIG. 6). The data integration unit 121 further attaches identification information (for example, an IP address) of the server 20 from which the server information was collected ("server" in FIG. 6). If the label information includes the formulas used by the analysis methods ("bias" or "threshold") for the server information, the data integration unit 121 may also attach those formulas to the score calculation information. If the rack information described above (see FIG. 5) is available, the data integration unit 121 may attach the storage rack of each server 20 to the score calculation information ("assigned rack" in FIG. 6). The score calculation information is stored in a predetermined area of the server information storage unit 131 and is referenced when the factor score calculation unit 123, the influence degree calculation unit 127, and the score integration unit 128 calculate the scores and server influence degrees.
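The integration step amounts to joining each server information row with its label, optional threshold, and optional rack. The sketch below shows one way this could look; the dictionary keys, the label table contents, the threshold of 30.0, the addresses, and the choice to skip unlabeled rows are all assumptions made for illustration.

```python
# Hypothetical label information: (type, item) -> analysis method.
LABELS = {
    ("server resource", "load average"): {"label": "bias", "threshold": None},
    ("server resource", "iowait"): {"label": "threshold", "threshold": 30.0},
}
# Hypothetical storage rack information: server identifier -> rack.
RACKS = {"192.0.2.1": "rack-1"}


def integrate(records, labels, racks):
    """Build score calculation rows from server information rows."""
    rows = []
    for rec in records:
        meta = labels.get((rec["type"], rec["item"]))
        if meta is None:
            continue  # assumed: rows without a label are not analyzed
        row = dict(rec)
        row["type1"] = meta["label"]  # bias / threshold / failure
        if meta["label"] == "threshold":
            row["threshold"] = meta["threshold"]
        row["rack"] = racks.get(rec["server"])  # None if the rack is unknown
        rows.append(row)
    return rows


records = [
    {"server": "192.0.2.1", "type": "server resource",
     "item": "load average", "value": 11},
    {"server": "192.0.2.1", "type": "server resource",
     "item": "iowait", "value": 42.0},
    {"server": "192.0.2.2", "type": "system log",
     "item": "number of writes", "value": 7},
]
score_rows = integrate(records, LABELS, RACKS)
```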

The data selection unit 122 of FIG. 2 accepts instructions specifying the range of server information to be subjected to bottleneck analysis and instructions specifying the display information to be shown by the display processing unit 129. For example, when the data selection unit 122 receives, via the input/output unit 11, an instruction specifying which time period of the server information stored in the server information storage unit 131 should be acquired, it passes the instruction to the data integration unit 121, and the data integration unit 121 acquires the server information in the specified range from the server information storage unit 131.

(Factor score calculation)
The factor score calculation unit 123 reads the score calculation information (see FIG. 6) stored in the server information storage unit 131 and calculates, for each server 20, a score for each factor that can cause a bottleneck. The factor score calculation unit 123 includes a bias factor score calculation unit 124 that calculates bias factor scores, a threshold factor score calculation unit 125 that calculates threshold factor scores, and a failure factor score calculation unit 126 that calculates failure factor scores. Bottleneck factors in a distributed processing system fall broadly into three categories: processing bias among the servers 20, failure of a server 20, and the processing volume or resource usage of a server 20 being at or above (or below) a predetermined threshold. The factor score calculation unit 123 therefore calculates a score for each of these factors.

(Bias factor score)
From the score calculation information (see FIG. 6), the bias factor score calculation unit 124 takes the information for each item whose analysis method is bias (that is, information whose "type 1" value is "bias") and calculates the bias factor score of each server 20 with a predetermined formula. For example, the bias factor score calculation unit 124 calculates the bias factor score of server M (a server 20) of the distributed processing system with Equation (1) below.

Bias factor score = |average of item N over the whole period on server M − average of item N over the whole period on all servers| ÷ maximum of the per-server averages of item N … Equation (1)

An example of calculating the bias factor score with equation (1) follows, using the score calculation information illustrated in FIG. 7. FIG. 7 is an extract of the score calculation information of FIG. 6 containing only the entries whose "type 1" is "bias" and whose item is "load average". Suppose that server M in equation (1) is "server B" and item N is "load average". The average load average of server A (range (1) in FIG. 7) is (11 + 13 + 6 + 10 + 11) ÷ 5 = 10.2. Likewise, the average for server B (range (2) in FIG. 7) is 5.2, and the average for server C (range (3) in FIG. 7) is 5.4. The average over all servers (range (4) in FIG. 7) is (10.2 + 5.2 + 5.4) ÷ 3 ≈ 6.9. The bias factor score for the load average of server B is therefore about 0.17, as shown below.

Bias factor score of server B = |average of the values in range (2) - average of the values in range (4)| ÷ maximum of the averages of ranges (1) to (3)
= |5.2 - 6.9| ÷ max(10.2, 5.2, 5.4)
= 1.7 ÷ 10.2 ≈ 0.17
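Equation (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the all-server average is taken as the mean of the per-server means (as in the worked example), and because only the averages of servers B and C are quoted in the text, single-value placeholder lists stand in for their samples.

```python
from statistics import mean


def bias_factor_score(samples_by_server, server):
    """Equation (1): |server mean - all-server mean| / largest per-server mean."""
    per_server_mean = {name: mean(values) for name, values in samples_by_server.items()}
    # As in the worked example, the all-server average is the mean of the per-server means.
    overall_mean = mean(list(per_server_mean.values()))
    largest_mean = max(per_server_mean.values())
    return abs(per_server_mean[server] - overall_mean) / largest_mean


# Load-average samples. Server A's values are quoted in the text; B and C are
# placeholders holding only their quoted averages (an assumption of this sketch).
load_average = {
    "A": [11, 13, 6, 10, 11],
    "B": [5.2],
    "C": [5.4],
}

score_b = bias_factor_score(load_average, "B")
print(round(score_b, 2))
```

The sketch does not guard against a largest per-server mean of zero, which would need separate handling.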

(Threshold factor score)
The threshold factor score calculation unit 125 determines (calculates) a threshold factor score for each item in the score calculation information (see FIG. 6) that is analyzed using a threshold (that is, each item whose "type 1" value in the score calculation information is "threshold"). In other words, for values in the score calculation information that are subject to a condition determination using a threshold, the threshold factor score calculation unit 125 sets the threshold factor score (0 or 1) according to the result of that determination. For example, the threshold factor score calculation unit 125 calculates the threshold factor score of server M (a server 20) of the distributed processing system by the following equation (2).

if (average value of item N on server M ≥ threshold)
then threshold factor score = 1
else threshold factor score = 0 ... Equation (2)

That is, when the average value of item N on server M is greater than or equal to the threshold given in the score calculation information (see FIG. 6), the threshold factor score for item N is set to 1; when it is below the threshold, the score is set to 0. The condition determination and the score values assigned from its result are not limited to those described here.
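A minimal sketch of equation (2) in Python, assuming the samples of one item on one server are available as a list; the function name and the sample values are hypothetical.

```python
from statistics import mean


def threshold_factor_score(samples, threshold):
    """Equation (2): 1 if the item's average is at or above the threshold, else 0."""
    return 1 if mean(samples) >= threshold else 0


# Hypothetical CPU-usage samples (percent) checked against a threshold of 90.
print(threshold_factor_score([80, 90, 100], 90))
print(threshold_factor_score([10, 20], 90))
```

Because the text allows other condition determinations, the comparison operator and the returned values would be the parts to change in a variant.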

(Failure factor score)
The failure factor score calculation unit 126 determines (calculates) a failure factor score for each item in the score calculation information (see FIG. 6) whose analysis method is failure-based (that is, each item whose "type 1" value in the score calculation information is "failure"). The failure factor score indicates the degree of failure occurring in a server 20. For example, the failure factor score calculation unit 126 calculates the failure factor score by the following equation (3).

if (a failure of item N on server M occurred at any time)
then failure factor score = 1
else failure factor score = 0 ... Equation (3)

That is, if a failure relating to item N on server M occurred at any time, the failure factor score for item N is set to 1; if no failure occurred, it is set to 0.
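Equation (3) reduces to an "any" test over the observation period. The sketch below assumes, hypothetically, that failures of one item on one server are recorded as one boolean per time slot.

```python
def failure_factor_score(failure_flags):
    """Equation (3): 1 if a failure was observed in any time slot, else 0."""
    return 1 if any(failure_flags) else 0


# Hypothetical observation of a disk-failure item over five time slots.
print(failure_factor_score([False, False, True, False, False]))
print(failure_factor_score([False, False, False]))
```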

However, for items whose values are determined per rack, such as the network throughput between switches, the factor score calculation unit 123 performs the calculation with the server 20 in equations (1) to (3) replaced by a rack (see the storage rack information in FIG. 5).

The bias factor score, threshold factor score, and failure factor score calculated here for each item of each server 20 (see FIG. 8) are stored in a predetermined area of the score storage unit 132.

The influence degree calculation unit 127 refers to at least one of the number of requests to each server 20 and the number of clients from which each server 20 received requests, both given in the score calculation information (see FIG. 6), and calculates for each server 20 a server influence degree indicating how strongly requests are concentrated on that server 20 within the distributed processing system. For example, the influence degree calculation unit 127 calculates the server influence degree of server M (a server 20) of the distributed processing system based on the following equation (4). The number of requests used here is the number of operations for all requests to the server 20, for example the write count (see FIG. 6) and the read count in the system log.

Server influence degree of server M = (number of requests to server M ÷ number of requests to the entire distributed processing system) + (number of clients from which server M received requests ÷ total number of clients) ... Equation (4)

When server M is the master of the distributed processing system, request processing is always concentrated on the master, so the influence degree calculation unit 127 sets the server influence degree of that server 20 to 1.
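Equation (4) and the special case for the master can be sketched together. The function name, parameter names, and the `is_master` flag are hypothetical, and the counts in the usage lines are made-up illustration values.

```python
def server_influence(requests, clients, total_requests, total_clients, is_master=False):
    """Equation (4): share of requests plus share of clients; fixed to 1 for the master."""
    if is_master:
        return 1.0
    return requests / total_requests + clients / total_clients


# A worker that handles 50 of 100 requests and serves 3 of 5 clients.
print(server_influence(50, 3, 100, 5))
# The master is always assigned an influence degree of 1.
print(server_influence(0, 0, 100, 5, is_master=True))
```

Since each term is a share between 0 and 1, the influence degree of a non-master server lies between 0 and 2 under this formula.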

The server influence degree of each server 20 calculated by the influence degree calculation unit 127 (see FIG. 9) is stored in a predetermined area of the score storage unit 132.

The score integration unit 128 calculates a bottleneck score for each server 20 (a score indicating the extent to which that server 20 is a bottleneck in the distributed processing system) from the server influence degree of the server 20 and the totals of its factor scores. Specifically, for each server 20, the score integration unit 128 weights the total of that server's factor scores by its server influence degree. For example, as shown in equation (5) below, the score integration unit 128 obtains the bottleneck score of server M (a server 20) of the distributed processing system by multiplying the total of server M's factor scores by server M's server influence degree.

Bottleneck score of server M = server influence degree of server M × (α × total of the bias factor scores of server M + β × total of the failure factor scores of server M + γ × threshold factor score of server M) ... Equation (5)

In equation (5), α, β, and γ are coefficients that bring the totals of the individual factor scores onto a common scale. For example, α is (1 ÷ the maximum, over the servers 20, of the total bias factor score), β is (1 ÷ the maximum, over the servers 20, of the total failure factor score), and γ is (1 ÷ the maximum, over the servers 20, of the total threshold factor score), so that each coefficient normalizes the factor it multiplies in equation (5).
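A sketch of equation (5) follows, under the assumption that each coefficient is the reciprocal of the largest per-server total of the factor it multiplies. All names are hypothetical, and treating a coefficient as 0 when no server has a nonzero total for a factor is an assumption of this sketch, not something the text specifies.

```python
def bottleneck_scores(influence, factor_totals):
    """Equation (5) for every server.

    influence: {server: server influence degree}
    factor_totals: {server: {"bias": x, "failure": y, "threshold": z}}
    """
    kinds = ("bias", "failure", "threshold")
    scale = {}
    for kind in kinds:
        largest = max(totals[kind] for totals in factor_totals.values())
        # Assumption: a factor that is zero everywhere contributes nothing.
        scale[kind] = 1.0 / largest if largest > 0 else 0.0
    return {
        server: influence[server] * sum(scale[kind] * totals[kind] for kind in kinds)
        for server, totals in factor_totals.items()
    }


# Made-up illustration values for two workers.
influence = {"A": 0.5, "B": 1.0}
totals = {
    "A": {"bias": 1.0, "failure": 0, "threshold": 1},
    "B": {"bias": 2.0, "failure": 0, "threshold": 2},
}
print(bottleneck_scores(influence, totals))
```

With this scaling, each scaled factor total is at most 1, so no single factor type dominates merely because it has more items.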

When calculating the totals of the factor scores in equation (5), for values that may affect multiple servers 20 (for example, the network throughput between switches), the score integration unit 128 adds the factor score values for all of the servers 20 that may be affected.

The score integration unit 128 calculates the bottleneck score of the master server 20 (server L) by the following equation (6).

Bottleneck score of server L = server influence degree of server L × (sum of the threshold factor scores of server L + sum of the failure factor scores of server L) ... Equation (6)
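Equation (6) for the master omits the bias term and the scaling coefficients. A minimal sketch, with hypothetical names and made-up per-item scores:

```python
def master_bottleneck_score(influence, threshold_scores, failure_scores):
    """Equation (6): influence degree times the summed threshold and failure factor scores."""
    return influence * (sum(threshold_scores) + sum(failure_scores))


# The master's influence degree is 1; per-item scores are illustration values.
print(master_bottleneck_score(1.0, [1, 0, 1], [0, 1]))
```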

Because the score integration unit 128 uses the bias factor score and the server influence degree of a server 20 when calculating the bottleneck score, the bottleneck score reflects the cases in which that server 20 carries a heavier processing load than the other servers 20 (a large processing bias) or receives a large number of requests or requests from a large number of clients.

The score integration unit 128 combines the calculated bottleneck score of each server 20 with the factor scores of each server 20 calculated by the factor score calculation unit 123 to create, for example, the information shown in FIG. 10, and stores it in the score storage unit 132.

The display processing unit 129 in FIG. 2 causes an external device (a display device or the like; not shown) to display the scores stored in the score storage unit 132 (bottleneck score, server influence degree, and factor scores) based on an instruction input from the data selection unit 122. For example, by viewing the bottleneck score of each server 20 illustrated in FIG. 10, the user of the bottleneck analysis device 10 can judge which server 20 of the distributed processing system is most likely to be a bottleneck. Furthermore, based on an instruction input from the data selection unit 122, the display processing unit 129 may also display, for a given server 20 (for example, a server 20 with a high bottleneck score), the factor scores and server influence degree used to calculate that server's bottleneck score. This makes it easier for the user of the bottleneck analysis device 10 to identify the bottleneck.

For example, when the display processing unit 129 creates display information showing the scores and server influence degrees illustrated in FIG. 10, the user of the bottleneck analysis device 10 can infer that server B has the highest bottleneck score and is therefore the most likely bottleneck in the distributed processing system (bottleneck score of server B: "1.33"), and that server B becoming a bottleneck is likely to have a large impact (server influence degree: "1.10"). Moreover, among the bottleneck factors of server B, the bias factor score is "0.54", the threshold factor score is "0.67", and the failure factor score is "0.00", which indicates that the bottleneck at server B is most likely caused by processing bias or by reduced processing capacity at server B.
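The figures quoted for server B can be checked against the weighting described earlier. The snippet below assumes that the factor scores shown in FIG. 10 already include any scaling coefficients, so that the bottleneck score is simply the influence degree times their sum; the variable names are hypothetical.

```python
influence = 1.10
bias, threshold, failure = 0.54, 0.67, 0.00

# Influence degree times the summed factor scores.
bottleneck = influence * (bias + threshold + failure)
print(round(bottleneck, 2))  # prints 1.33
```

Under that assumption the product reproduces the quoted bottleneck score of 1.33.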

The bottleneck analysis device 10 classifies bottleneck factors in the distributed processing system into three types, namely (1) processing bias among the servers 20, (2) failure of a server 20, and (3) the processing volume or resource usage of a server 20 being at or above (or below) a predetermined threshold, and calculates a score for each factor; other classifications and score calculations may of course be used. In addition, although the score integration unit 128 of the bottleneck analysis device 10 uses the bias factor score, the threshold factor score, and the failure factor score to calculate the bottleneck score, it may instead use the bias factor score together with only one of the threshold factor score and the failure factor score. That is, the factor score calculation unit 123 need only include the bias factor score calculation unit 124 and at least one of the threshold factor score calculation unit 125 and the failure factor score calculation unit 126.

The display screens created by the display processing unit 129 may be those shown in FIGS. 11 to 16. The example screens described below are created by the display processing unit 129, on instruction from the data selection unit 122, from the bottleneck score, server influence degree, and factor scores of each server 20 stored in the score storage unit 132 and from the server information stored in the server information storage unit 131.

For example, the screen in FIG. 11 shows which servers 20 are bottlenecks, listing the bottleneck score, server influence degree, and factor scores (threshold / failure / bias) for each server 20. Listing the servers 20 in descending order of bottleneck score makes it easier for the user to see which server 20 is most likely to have a bottleneck. Displaying each server's influence degree and factor scores alongside makes it easier to see what the bottleneck factor is. Displaying the maximum, average, and minimum bottleneck score on the screen makes it easier for the user to see whether the bottleneck factor lies in an individual server 20 or in the distributed processing system as a whole.
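The ordering and summary values of a screen like FIG. 11 could be prepared as sketched below, with the highest bottleneck score listed first. The row layout, key names, and values for servers A and C are hypothetical; only server B's score is taken from the text.

```python
from statistics import mean


def summarize(rows):
    """Sort rows by bottleneck score (highest first) and compute max / mean / min."""
    ordered = sorted(rows, key=lambda row: row["bottleneck"], reverse=True)
    scores = [row["bottleneck"] for row in rows]
    summary = {"max": max(scores), "mean": mean(scores), "min": min(scores)}
    return ordered, summary


rows = [
    {"server": "A", "bottleneck": 0.40},
    {"server": "B", "bottleneck": 1.33},
    {"server": "C", "bottleneck": 0.25},
]
ordered, summary = summarize(rows)
print([row["server"] for row in ordered])
print(summary)
```

A maximum far above the mean points to an individual server, while uniformly high scores suggest a system-wide factor.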

When the server 20 is the master, the displayed content is largely the same as in FIG. 11, except that the display processing unit 129 does not display the server influence degree or the bias factor score (see FIG. 12).

As illustrated in FIGS. 13 to 16, the display processing unit 129 may also display detailed information for each factor score. With such screens, the user can, for example, first narrow down the server 20 that appears to be the bottleneck on the screens of FIG. 11 or FIG. 12 and then analyze the details more easily.

FIG. 13 is an example of a display screen that lists, for the server information of a server 20 and for each type of factor score, the values of the items from which that factor score was calculated. With such a screen, the user can check, for each type of factor score, the values on which the score is based. The "number of occurrences" shown for the threshold factor score and the failure factor score in FIG. 13 is, for example, the number of servers 20 in the entire distributed processing system whose score is 1. For the bias factor score, the average value over the entire system is displayed as well. Such a screen makes it easier for the user to compare each factor score with the situation of the other servers 20.

FIG. 14 is an example of a display screen showing the response time of each process (process 1 to process 5) on a server 20 that is likely to be a bottleneck. As illustrated in FIG. 14, the display processing unit 129 displays, in addition to the configured and measured values of the request interval from a higher-level application (AP) on a client 30 or the like (the AP request interval), the average response time of each process (process 1 to process 5) over the entire distributed processing system. This makes it easier for the user to see which process is the bottleneck on the server 20 that is likely to be a bottleneck.

FIG. 15 is an example of a display screen that shows, from the server information of each server 20 stored in the server information storage unit 131, the value of a given item (a value on which a factor score is based) together with the values on the other servers 20 and the average over the entire distributed processing system. With such a screen, the user can easily see whether the value underlying a given factor score is high compared with the values on the other servers 20 and compared with the average of the entire distributed processing system.

FIG. 16 is an example of a display screen showing the status of requests to a given server 20. In FIG. 16, AP denotes a higher-level application such as a client, Pa and Pb denote processes of the distributed processing system, and the thickness of each connecting line represents the volume of requests to each process. The lines enclosing Pa and Pb indicate on which server 20 each process runs. For example, Pa_3 and Pb_3 are processes that run on server A. With such a screen, the user can easily see, for example, how many requests a server 20 with a high server influence degree, or a server 20 that is likely to be a bottleneck, is receiving from which higher-level application.

The processing executed by the bottleneck analysis device 10 described in the above embodiment may also be implemented as a program written in a computer-executable language. In this case, the same effects as in the embodiment are obtained when a computer executes the program. Furthermore, the program may be recorded on a computer-readable recording medium, and the same processing as in the embodiment may be realized by having a computer read and execute the program recorded on that medium. An example of a computer that executes a program implementing the same functions as the bottleneck analysis device 10 shown in FIG. 1 is described below.

As illustrated in FIG. 17, the computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, which are connected to one another by a bus 1080.

As illustrated in FIG. 17, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1051 and a keyboard 1052. The video adapter 1060 is connected to, for example, a display 1061.

As illustrated in FIG. 17, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program is stored, for example in the hard disk drive 1031, as a program module describing the instructions to be executed by the computer 1000.

The various data described in the above embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as needed, and executes the access monitoring procedure, access control procedure, process monitoring procedure, and process control procedure.

The program module 1093 and the program data 1094 of the program need not be stored in the hard disk drive 1031; they may, for example, be stored on a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 of the monitoring program may be stored on another computer connected via a network (a LAN, a WAN (Wide Area Network), or the like) and read by the CPU 1020 via the network interface 1070.

DESCRIPTION OF SYMBOLS
10 Bottleneck analysis device
11 Input/output unit
12 Control unit
13 Storage unit
20 Server
30 Client
120 Data collection unit
121 Data integration unit
122 Data selection unit
123 Factor score calculation unit
124 Bias factor score calculation unit
125 Threshold factor score calculation unit
126 Failure factor score calculation unit
127 Influence degree calculation unit
128 Score integration unit
129 Display processing unit
131 Server information storage unit
132 Score storage unit

Claims

A bottleneck analysis device comprising:
a server information storage unit that stores server information indicating, in association with each server constituting a distributed processing system, server resource information of the server, the number of requests to the server, and the number of clients from which the server received requests;
a factor score calculation unit that, using the server resource information, or the server resource information and failure information of the server, calculates for each server factor scores for bottleneck factors of processing of the server in the distributed processing system;
an influence degree calculation unit that, using at least one of the number of requests to the server and the number of clients from which the server received requests, both indicated in the server information, calculates for each server a server influence degree indicating the degree of concentration of requests on the server in the distributed processing system; and
a score integration unit that calculates, for each server, a bottleneck score that is a value obtained by weighting the total value of the factor scores with the server influence degree,
wherein the factor score calculation unit includes:
a bias factor score calculation unit that, using the server resource information, calculates for each server a bias factor score indicating the degree of processing bias of the server in the distributed processing system; and
at least one of a failure factor score calculation unit that, using the failure information, calculates for each server a failure factor score indicating the degree of failure occurring in the server, and a threshold factor score calculation unit that, using the server resource information, performs for each server a condition determination by a condition determination formula using a predetermined threshold and calculates a threshold factor score based on the determination result.

The bottleneck analysis device according to claim 1, wherein the server influence degree is, for each server, the value given by
{(number of requests to the server ÷ number of requests to the entire distributed processing system) + (number of clients from which the server received requests ÷ total number of clients)}.

The bottleneck analysis device according to claim 1, wherein the bottleneck score is, for each server, the value given by
{server influence degree of the server × (total value of the factor scores)}.

The bottleneck analysis device according to any one of claims 1 to 3, wherein
the server information storage unit further stores label information that indicates, for each type of the server resource information, identification information showing whether the processing bias, the condition determination using a threshold, or both are used as the analysis method for server resource information of that type,
the bias factor score calculation unit calculates the bias factor score for the server resource information that the label information specifies as using at least the processing bias as the analysis method, and
the threshold factor score calculation unit calculates the threshold factor score for the server resource information that the label information specifies as using at least the condition determination using the threshold as the analysis method.

The bottleneck analysis device according to claim 4, further comprising:
a score storage unit that stores the bias factor score, the threshold factor score, the failure factor score, the server influence degree, and the bottleneck score of each server; and
a display processing unit that, based on an instruction input from an input unit, displays at least one of the bias factor score, the threshold factor score, the failure factor score, the server influence degree, and the bottleneck score of each server.

The bottleneck analysis device according to claim 5, wherein the display processing unit further displays, based on an instruction input from the input unit, the server information from which the bias factor score, the threshold factor score, the failure factor score, the server influence degree, and the bottleneck score were calculated.

A bottleneck analysis method in which a bottleneck analysis device executes:
a factor score calculation step of calculating, for each server constituting a distributed processing system, factor scores for bottleneck factors of processing of the server in the distributed processing system, using server resource information of the server, or the server resource information and failure information of the server;
an influence degree calculation step of calculating, for each server, a server influence degree indicating the degree of concentration of requests on the server in the distributed processing system, using at least one of the number of requests to the server and the number of clients from which the server received requests; and
a score integration step of calculating, for each server, a bottleneck score that is a value obtained by weighting the total value of the factor scores with the server influence degree,
wherein the factor score calculation step includes:
a bias factor score calculation step of calculating, for each server, a bias factor score indicating the degree of processing bias of the server in the distributed processing system, using the server resource information; and
at least one of a failure factor score calculation step of calculating, for each server, a failure factor score indicating the degree of failure occurring in the server, using the failure information, and a threshold factor score calculation step of performing, for each server, a condition determination by a condition determination formula using a predetermined threshold, using the server resource information, and calculating a threshold factor score based on the determination result.

A program that causes a bottleneck analysis device to execute:
a factor score calculation step of calculating, for each server constituting a distributed processing system, factor scores for bottleneck factors of processing of the server in the distributed processing system, using server resource information of the server, or the server resource information and failure information of the server;
an influence degree calculation step of calculating, for each server, a server influence degree indicating the degree of concentration of requests on the server in the distributed processing system, using at least one of the number of requests to the server and the number of clients from which the server received requests; and
a score integration step of calculating, for each server, a bottleneck score that is a value obtained by weighting the total value of the factor scores with the server influence degree,
wherein the factor score calculation step includes:
a bias factor score calculation step of calculating, for each server, a bias factor score indicating the degree of processing bias of the server in the distributed processing system, using the server resource information; and
at least one of a failure factor score calculation step of calculating, for each server, a failure factor score indicating the degree of failure occurring in the server, using the failure information, and a threshold factor score calculation step of performing, for each server, a condition determination by a condition determination formula using a predetermined threshold, using the server resource information, and calculating a threshold factor score based on the determination result.