WO2015145676A1

WO2015145676A1 - Supervisor computer and supervising method

Info

Publication number: WO2015145676A1
Application number: PCT/JP2014/058918
Authority: WO
Inventors: 峰義増田; 裕教江丸
Original assignee: 株式会社日立製作所
Priority date: 2014-03-27
Filing date: 2014-03-27
Publication date: 2015-10-01

Abstract

A management computer manages an information processing system that executes information processing by paths, each composed of a row of multiple stages of resource elements. The management computer has: a probe management means that, so that the number of supervised paths passing through each supervisory resource element (a resource element to be supervised) is greater than or equal to a prescribed minimum number of paths, selects, from among the paths that can be supervised by probes, the path passing through the greatest number of uncovered supervisory resource elements (supervisory resource elements whose number of passing paths has not reached the minimum number of paths), makes the selected path a supervisory path (a path to be supervised), and sets the probe supervising the supervisory path as a supervisory probe; a collection means for collecting the results of supervision by the supervisory probe; and a statistical processing means for determining, by isolation according to a co-occurrence pattern based on the supervision results of the supervisory probe, the supervisory resource element that caused a performance degradation.

Description

Monitoring computer and monitoring method

The present invention relates to a technique for measuring the performance of an IT system.

A method based on baseline analysis is often used to detect signs of a performance failure in an IT (Information Technology) system. In this method, a program (probe) for measuring the performance of the IT system is installed in the IT system, and the result measured by the probe is compared with the measurement result at normal time (baseline). Whether the measurement result deviates from the baseline is determined by whether the difference between the measurement result and the baseline exceeds a predetermined threshold value. When the measurement result of the probe deviates greatly from the baseline, or when the frequency with which the measurement result temporarily shows an outlier that departs from the baseline (hereinafter referred to as a spike) increases, it is determined that there is a sign of a performance failure.
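The threshold comparison described above can be sketched as follows. This is a minimal illustration only: the function names, the use of a simple mean as the baseline, and the sample response times are assumptions, not details taken from this document.

```python
from statistics import mean


def is_outlier(value, baseline, threshold):
    """True when the measurement differs from the baseline by more than the threshold."""
    return abs(value - baseline) > threshold


def spike_frequency(samples, baseline, threshold):
    """Fraction of samples that are outliers with respect to the baseline."""
    if not samples:
        return 0.0
    return sum(is_outlier(v, baseline, threshold) for v in samples) / len(samples)


# Hypothetical response times in milliseconds.
normal_time = [5.0, 5.2, 4.9, 5.1]
baseline = mean(normal_time)  # assumed baseline: mean of normal-time measurements
current = [5.0, 5.1, 19.0, 5.2, 21.5, 4.8]
frequency = spike_frequency(current, baseline, threshold=5.0)
```

A rising `frequency` over successive windows would correspond to the "increase in spike frequency" that the method treats as a sign of failure.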

Regarding the measurement location, Patent Document 1 describes that a DPI device that performs measurement is arranged at a port of a switch through which a path with a high probability of failure or a path with concentrated flows passes.

Patent Document 1: JP2011-205301A

In the method for detecting a sign of system performance failure by the above-described baseline analysis, it is a precondition that a spike whose measurement result is out of the baseline can be detected. Therefore, the time interval of measurement by the probe tends to be narrow so that the spike can be detected without missing it. This is because, for example, if spikes that can be detected by finely measuring at intervals of seconds are measured at intervals of minutes, the measurement results are statistically averaged and the spikes cannot be seen.
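The averaging effect can be illustrated with a small numerical sketch. The sample values and the threshold are hypothetical and chosen only to show why a spike visible at one-second intervals disappears from a one-minute average.

```python
from statistics import mean

# Sixty hypothetical one-second response times (ms): steady 5 ms with one 50 ms spike.
per_second = [5.0] * 60
per_second[30] = 50.0

# A minute-interval measurement would report roughly the average over the minute.
per_minute = mean(per_second)

threshold = 10.0  # assumed outlier threshold in ms
spikes_seen_per_second = [v for v in per_second if v > threshold]
spike_seen_per_minute = per_minute > threshold
```

With these numbers the one-minute value is 5.75 ms, well under the threshold, while the one-second series contains one clear outlier.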

However, performing the measurement by the probe at fine time intervals means that the amount of resources used for the measurement, such as CPU (Central Processing Unit) and memory, and the amount of storage needed to hold the measurement results both increase.

Also, even if a spike can be detected, its cause may not be identified (isolated). For example, even if the response time to an I/O (Input/Output) request to the storage is measured and a spike in which the response time temporarily increases is detected, it may not be possible to isolate, from that single event alone, which resource in the storage was saturated.

Although it is difficult to isolate the cause from a single spike, it may be possible to do so by analyzing the pattern in which multiple spikes occur. For example, suppose that spikes occur in a plurality of I/O requests almost simultaneously, and that these I/O requests are processed by a common resource in the storage. If it is known in advance that these I/O requests share that resource, it can be determined that the spikes were very likely caused by the common resource. That is, the cause can be identified from the co-occurrence pattern of spikes.

This method can identify the resource that caused the spike with high accuracy. However, it presupposes that more spikes are detected, which requires placing a larger number of probes. In the storage example described above, this amounts to placing probes that measure at time intervals fine enough to detect spikes on most of the servers connected to the storage, on the order of hundreds to thousands.

As described above, a great deal of cost is required to arrange a large number of probes that measure at fine time intervals. On the other hand, in order to identify the cause of the spike in the co-occurrence relationship, a large number of probes that perform measurement at time intervals that can detect the spike are required.

An object of the present invention is to provide a technique for determining a decrease in performance of an IT system at a low cost.

A management computer according to an aspect of the present invention manages an information processing system that executes information processing using paths in which a plurality of resource elements are connected. The management computer includes: probe management means that, so that the number of monitored paths passing through each monitoring resource element (a resource element to be monitored) is equal to or greater than a predetermined minimum number of paths, selects, from among the paths that the probes can monitor, the path that passes through the largest number of uncovered monitoring resource elements (monitoring resource elements whose number of passing paths has not reached the minimum number of paths), sets the selected path as a monitoring path (a path to be monitored), and sets the probe that monitors the monitoring path as a monitoring probe; collection means for collecting the monitoring results of the monitoring probe; and statistical processing means for determining, by isolation using a co-occurrence pattern based on the monitoring results of the monitoring probe, the monitoring resource element that caused a performance degradation.

The present invention makes it possible to isolate resources that cause performance problems at the lowest possible cost.

FIG. 1 is a system configuration diagram of an information processing system according to a first embodiment.
FIG. 2 is a diagram for explaining a path configuration.
FIG. 3 is a diagram for explaining the schematic operation of the information processing system according to the first embodiment.
FIG. 4 is a diagram illustrating an example of a resource configuration table 30.
FIG. 5 is a diagram showing an example of a probe configuration table 40.
FIG. 6 is a diagram showing an example of a monitoring request table.
FIG. 7 is a diagram showing an example of a co-occurrence condition table.
FIG. 8 is a diagram showing an example of a spike history table.
FIG. 9 is a diagram showing an example of a resource performance history table.
FIG. 10 is a flowchart of a probe selection process.
FIG. 11 is a supplementary explanatory diagram for the flowchart of the probe selection process.
FIG. 12 is an overall flowchart of a probe selection process.
FIG. 13 is a detailed flowchart of step S13.
FIG. 14 is a detailed flowchart of step S23.
FIG. 15 is a supplementary explanatory diagram for the flowcharts.
FIG. 16 is a flowchart of the probe reselection process performed when the path configuration changes.
FIG. 17 is a supplementary explanatory diagram of the flowchart of FIG. 16.
FIG. 18 is a flowchart of the probe selection process performed when a monitoring target is added.
FIG. 19 is a diagram showing a resource selection screen.
FIG. 20 is a diagram showing a screen presenting a probe selection plan.
FIG. 21 is a diagram showing a monitoring result screen.
FIG. 22 is a diagram showing the resource configuration table 30 in a modification.
FIG. 23 is a flowchart of a resource set division process.
FIG. 24 is a system configuration diagram of a second embodiment.
FIG. 25 is a diagram showing the contents of the resource configuration table 30 in the second embodiment.
FIG. 26 is a diagram showing the probe configuration table 40 in the second embodiment.

Embodiments will be described with reference to the drawings.

<< First Embodiment >>
<System configuration>

FIG. 1 is a system configuration diagram of an information processing system according to the first embodiment. The management computer 1, server 19, and storage 12 are connected via a LAN 11. The management computer 1 collects information from the server 19 and the storage 12 via the LAN 11 and transmits operation instructions to the server 19 and the storage 12 via the LAN 11. The server 19 and the storage 12 are also connected by a SAN (Storage Area Network) 18. The server 19 transmits an I/O processing request to the storage 12 via the SAN 18, and the storage 12 processes the I/O request and returns a response to the server 19.

The management computer 1 includes a CPU 2, a memory 3, a computer storage 4, a display I/F 8, and an NW I/F 9. The display I/F 8 is connected to the display device 10, and the NW I/F 9 is connected to the LAN 11.

The computer storage 4 stores a probe management program 5, a collection program 6, and a statistical processing program 7. These programs are read into the memory 3 at the time of startup and executed by the CPU 2.

The memory 3 stores a resource configuration table 30, a probe configuration table 40, a monitoring request table 50, a co-occurrence condition table 60, a spike history table 70, and a resource performance history table 80. The resource configuration table 30 stores configuration information of infrastructure resources that are managed by the management computer 1.

The server 19 includes a CPU 20, a memory 21, a display I/F 24, an in-computer storage 25, an HBA (Host Bus Adapter) 26, and an NW I/F 27. The NW I/F 27 is connected to the LAN 11 and the HBA 26 is connected to the SAN 18.

The application 22 and the probe 23 are stored in the memory 21. These are programs that are read and executed by the CPU 20. The VOL 28 is a logical device (not shown) created by the storage 12. The server 19 recognizes this logical device as a disk area.

The application 22 requests the VOL 28 to read and write data. At this time, an I/O request is issued to the storage 12, which holds the actual data of the VOL 28. An I/O request issued from the application 22 is delivered from the HBA 26 to the storage 12 via the SAN 18. The storage 12 processes the I/O request and returns the result to the application 22 by following the same route in reverse.

The probe 23 measures the processing of this I/O request. For example, the probe measures the response time from when an I/O request is issued to the storage 12 until the result returns. The probe 23 also measures the number of I/O requests processed per second (IOPS). The probe 23 analyzes the data measured in this way by the above-described baseline monitoring technique, and detects spikes (measured values that deviate significantly from the average measured values). The detected spikes are collected by the collection program 6 and stored in the spike history table 70.

The probe 23 has a function of changing the measurement interval. That is, the probe 23 can perform the measurement described above at a relatively short interval on the order of seconds or at a relatively long interval on the order of minutes. It can also perform second-order and minute-order measurement simultaneously, or stop the second-order measurement and perform only the minute-order measurement. The measurement interval can be changed at the discretion of the probe 23 itself or at the direction of an external program, for example the probe management program 5 of the management computer 1.

The probe 23 described above can be implemented, for example, as an OS (not shown) driver that directly measures individual I/O requests, or as a program that periodically collects statistical information on the results of I/O request measurement performed by the OS.

The storage 12 includes a SW (switch) 13, a Port 14, a Processor 15, a Pool 16, and a Cache 17. The Port 14, Processor 15, Pool 16, and Cache 17 are mutually connected via the SW 13. The Port 14 is connected to the SAN 18. The Pool 16 is a storage area composed of a plurality of disks and stores data. One Pool 16 may be composed of a plurality of types of media (SSD, SAS HDD). An I/O request issued from the application 22 to the VOL 28 is processed by the Processor 15 through the Port 14. For example, when the I/O request is a data read, the Processor 15 checks whether the requested data is in the Cache 17 and, if not, acquires the data from the Pool 16 that stores it and returns the data to the application 22.

The terms used in the following explanation are defined with reference to FIG. 2, which is a diagram for explaining the path configuration.

FIG. 2A is an explanatory diagram of the path. A path is obtained by conceptually connecting the resources (Port 14, Processor 15, Pool 16, and Cache 17) in the storage 12 that are used by the VOL 28. Each of these resources is referred to as a resource element. I/O requests to the VOL 28 are processed using the resource elements on this path. The probe 23 monitors I/O requests; this is described as monitoring the path. The response time measurement described above is an example of path monitoring. A path monitored by a probe is referred to as a monitoring path.

FIG. 2B is an explanatory diagram of a path group. A path group is a group composed of paths that share a common sequence of resource elements. FIG. 2B illustrates a path from VOL1 (path 1) and a path from VOL2 (path 2). Path 1 and path 2 are identical except for Port, and belong to one path group. The shared portion of these paths is referred to as the common part or the common path. A portion that is not shared is referred to as the single part (in FIG. 2B, Port corresponds to the single part).

FIG. 2C is an explanatory diagram of a single intersection. As in FIG. 2B, there are two VOLs 28, and path 1 and path 2 extend from them, respectively. These paths intersect at the Processor. In this case, the Processor is described as a single intersection, and path 1 and path 2 are said to have a single intersection. A path passing through a resource element is referred to as a passing path of that element, and the number of such paths (the number of monitoring paths) is referred to as the number of passing paths. In the figure, the number of passing paths is 2 for the Processor and 1 for the other resource elements. When the number of passing paths of a resource element reaches the specified value, the resource element is said to be covered.
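The terms "number of passing paths", "covered", and "intersection" can be made concrete with a short sketch. The path contents below are hypothetical (modeled loosely on the situation in FIG. 2C, where two paths meet only at the Processor), and the function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical monitoring paths, listed in the order Port, Processor, Cache, Pool.
monitoring_paths = {
    "path1": ["PT1", "PR1", "CA1", "PO1"],
    "path2": ["PT2", "PR1", "CA2", "PO2"],
}


def passing_path_counts(paths):
    """Number of monitoring paths that pass through each resource element."""
    counts = Counter()
    for elements in paths.values():
        counts.update(set(elements))
    return counts


def covered_elements(paths, minimum_paths):
    """Resource elements whose number of passing paths reaches the minimum."""
    return {r for r, n in passing_path_counts(paths).items() if n >= minimum_paths}


def intersection(path_a, path_b):
    """Resource elements shared by two paths."""
    return set(path_a) & set(path_b)


counts = passing_path_counts(monitoring_paths)
shared = intersection(monitoring_paths["path1"], monitoring_paths["path2"])
```

Here `PR1` has two passing paths and every other element has one, so with a minimum of 2 only `PR1` is covered.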

<General operation>
Here, the operation of a program operating on the management computer 1 will be described. FIG. 3 is a diagram for explaining a schematic operation of the information processing system according to the present embodiment.

(1) First, the collection program 6 collects the configuration information of the resources inside the storage from the storage 12 and stores it in the resource configuration table 30. The collection program 6 collects the configuration information of the server 19 and the configuration information of the probe 23 and the VOL 28 from the probe 23 of the server 19 and stores it in the probe configuration table 40. Next, the probe management program 5 displays a screen for accepting a monitoring request from the administrator on the display device 10 via the display I/F 8 and stores the input from the administrator in the monitoring request table 50.

(2) to (3) Next, the probe management program 5 refers to the contents of the resource configuration table 30 and the probe configuration table 40, determines which probes 23 are to measure on the order of seconds, and instructs those probes 23 to do so. At the same time, the probe management program 5 creates rules (co-occurrence conditions) for identifying, from the combination of spikes measured by the probes 23, which resource in the storage 12 caused a spike, and stores them in the co-occurrence condition table 60.

(4) The probe 23 that has been instructed to monitor measures I/O requests at fine intervals and detects spikes. The collection program 6 collects the records of the detected spikes from the probe 23 and stores them in the spike history table 70. The collection program 6 also collects the performance information of each resource (measured on the order of minutes) from the storage 12 and stores it in the resource performance history table 80.

(5) The statistical processing program 7 analyzes the spike information stored in the spike history table 70 in accordance with the co-occurrence conditions stored in the co-occurrence condition table 60, identifies the resource that caused each spike, and records the result in the spike history table 70. The statistical processing program 7 also displays the result on the display device 10 to present it to the administrator.

<Table>
Next, the configuration of each table stored in the memory 3 of the management computer 1 will be described.

FIG. 4 is a diagram illustrating an example of the resource configuration table 30. The resource configuration table 30 stores configuration information of the resources in the storage 12. That is, for each storage 12 (uniquely identified by the storage ID 31), internal resources (uniquely identified by the resource ID 33) are grouped and stored by resource type 32. The resource type 32 includes, for example, Port, Processor, Cache, and Pool. In FIG. 4, the resource IDs 33 of Port 14 are written as PT1, PT2, ..., those of Processor 15 as PR1, PR2, ..., those of Cache 17 as CA1, CA2, ..., and those of Pool 16 as PO1, PO2, .... This notation is used in the following description.

FIG. 5 is a diagram showing an example of the probe configuration table 40. The probe configuration table 40 stores the monitoring contents of each probe 23. That is, it stores which probe 23 (identified by probe ID 41) operates on which server 19 (identified by server ID 43) and which VOL 28 (identified by VOL ID) it monitors. When the VOL is monitored on the order of seconds, the monitoring flag 42 stores "Y".

The probe configuration table 40 also stores information on the path 46 (identified by the path ID 44), that is, the resources in the storage 12 used by each VOL 28. In FIG. 5, the resource ID 48 (the same identifier as the resource ID in the resource configuration table 30) and the resource type 47 of each resource constituting a path 46 are stored. Further, a path group ID 49 is stored as attached information of each path 46. The path group ID 49 distinguishes the single part from the common part of the path 46 and gives the ID of the path group to which the common part belongs. When one path 46 includes both null and a non-null value (0 in the figure), as in the rows (401a, 401b) of FIG. 5, a resource whose path group ID 49 is null belongs to the single part, a resource whose path group ID 49 is non-null belongs to the common part, and the path belongs to path group 0.
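One way to picture the path portion of the probe configuration table 40 is as one record per resource on a path, with the path group ID deciding whether the resource is in the single part or the common part. The sketch below is an illustrative assumption: the class name, field names, and sample rows are hypothetical, and `None` stands in for the table's null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathResource:
    path_id: str                   # corresponds to path ID 44
    resource_type: str             # corresponds to resource type 47
    resource_id: str               # corresponds to resource ID 48
    path_group_id: Optional[int]   # corresponds to path group ID 49; None means null


# Hypothetical rows for one path whose Port is the single part.
rows = [
    PathResource("P1", "Port", "PT1", None),
    PathResource("P1", "Processor", "PR1", 0),
    PathResource("P1", "Cache", "CA1", 0),
    PathResource("P1", "Pool", "PO1", 0),
]


def split_path(table, path_id):
    """Return (single part, common part) resource IDs for one path."""
    own = [r for r in table if r.path_id == path_id]
    single = [r.resource_id for r in own if r.path_group_id is None]
    common = [r.resource_id for r in own if r.path_group_id is not None]
    return single, common


single_part, common_part = split_path(rows, "P1")
```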

FIG. 6 is a diagram illustrating an example of the monitoring request table 50. The monitoring request table 50 records the monitoring resources requested by the administrator and the monitoring certainty. For each request from the administrator (identified by a request ID 51), it stores the device designated for monitoring (identified by a device ID 52, which is a server ID 43 or a storage ID 31), the resource designated for monitoring (identified by a resource ID 53), and the minimum number of paths 54. To identify the causative resource from the co-occurrence of spikes, multiple paths that pass through the same resource are monitored. The minimum number of paths 54 is the minimum number of monitoring paths that must pass through the same resource, and it determines the certainty of monitoring: the more paths that monitor one resource, the more certain the isolation of the causative resource, and the fewer the paths, the less certain.

FIG. 7 is a diagram illustrating an example of the co-occurrence condition table 60. The co-occurrence condition table 60 stores path combination conditions for identifying the causative resource of a measured spike. That is, for each condition (identified by condition ID), it stores a path co-occurrence condition 63, the resource (identified by resource ID 62) that is presumed to be the cause of the spike when the co-occurrence condition is satisfied, and a condition creation time 64. For example, "P1 NOT (P2 & P3 & P4 & P5)" is stored in the co-occurrence condition 63 of the row 65a. This means that a spike has occurred in the path P1 but no spike has occurred in the paths P2 to P5. When the pattern of spike occurrence satisfies this condition, the cause is estimated to be saturation of the resource PT1 stored in the resource ID 62.
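A co-occurrence condition of the form shown above could be evaluated roughly as follows. This is a sketch under stated assumptions: the parser handles only the two textual shapes that appear in this description ("A & B" and "A NOT (B & C)"), the function names are hypothetical, and the NOT clause is interpreted as "no spike on any of the listed paths", following the explanation of row 65a.

```python
def parse_condition(text):
    """Split a condition string into (paths that must spike, paths that must not)."""
    required_part, _, excluded_part = text.partition(" NOT ")
    required = {p.strip() for p in required_part.split("&") if p.strip()}
    excluded = {p.strip() for p in excluded_part.strip("() ").split("&") if p.strip()}
    return required, excluded


def condition_holds(text, spiking_paths):
    """True when every required path spiked and none of the excluded paths did."""
    required, excluded = parse_condition(text)
    return required <= spiking_paths and not (excluded & spiking_paths)


port_condition = "P1 NOT (P2 & P3 & P4 & P5)"
only_p1 = condition_holds(port_condition, {"P1"})
p1_and_p3 = condition_holds(port_condition, {"P1", "P3"})
```

In this example `only_p1` is true and `p1_and_p3` is false, matching the reading that PT1 is suspected only when P1 spikes alone.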

FIG. 8 is a diagram illustrating an example of the spike history table 70. The spike history table 70 stores the spikes measured by the probe 23 and the analysis results, that is, the resources estimated to be the cause from the way the spikes occurred. For each spike (identified by spike ID 71), the occurrence time 72, the VOL on which it occurred (identified by VOL ID), and the response time 74 indicating the magnitude of the spike are stored. The table also stores the result of the analysis of the spike by the statistical processing program 7, that is, the resource estimated to be the cause of the spike (identified by resource ID 75) and the condition in the co-occurrence condition table 60 used for the estimation (identified by condition ID 76). For example, the row 77a indicates that the spike identified by the spike ID 0 matched the condition whose condition ID 76 is "1", and that the causative resource of the spike was therefore identified as PT1. A resource ID 75 and condition ID 76 of "null" (row 77b) indicate that the spike has not been analyzed yet. "Unknown cause" may also be recorded in the resource ID 75 and the condition ID 76.
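The way analysis results might be written back into spike records can be sketched as below. Several points are assumptions made for illustration: each record carries a path ID directly (the table itself stores a VOL ID), "almost simultaneously" is modeled as a fixed time window, conditions are pre-split into required and excluded path sets, and all names and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spike:
    spike_id: int
    time: float                          # occurrence time in seconds (assumed unit)
    path_id: str                         # assumed: path of the VOL that spiked
    resource_id: Optional[str] = None    # None means not analyzed yet
    condition_id: Optional[str] = None


# (condition ID, suspected resource, paths that must spike, paths that must not)
CONDITIONS = [
    ("1", "PT1", {"P1"}, {"P2", "P3", "P4", "P5"}),
    ("2", "PR2", {"P2", "P3"}, set()),
]


def analyze(spikes, conditions, window=1.0):
    """Fill in the suspected resource and matching condition for unanalyzed spikes."""
    for spike in spikes:
        if spike.resource_id is not None:
            continue
        # Paths that spiked within the assumed co-occurrence window.
        nearby = {s.path_id for s in spikes if abs(s.time - spike.time) <= window}
        for cond_id, resource, required, excluded in conditions:
            if spike.path_id in required and required <= nearby and not (excluded & nearby):
                spike.resource_id, spike.condition_id = resource, cond_id
                break
        else:
            spike.resource_id = spike.condition_id = "unknown cause"
    return spikes


history = analyze(
    [Spike(0, 10.0, "P1"), Spike(1, 50.0, "P2"), Spike(2, 50.3, "P3"), Spike(3, 90.0, "P4")],
    CONDITIONS,
)
```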

FIG. 9 is a diagram showing an example of the resource performance history table 80. The resource performance history table 80 records the performance of each resource in the storage 12, the access performance to the VOL 28 measured by the probe 23 (minute-order measurement records), and the like. Each row of the table records the resource whose performance was measured (identified by resource ID 81), the measurement time 82, the measured metric 83, and its value 84.

<Processing flow>
FIG. 10 is a flowchart of the probe selection process. FIG. 11 is a supplementary explanatory diagram for that flowchart.

The probe selection process that covers several resource elements designated by the administrator (for example, the monitoring request shown in 55b of FIG. 6) will be described. In FIG. 11, Processors 1 and 2 (hatched resource elements; hereinafter referred to as PR1 and PR2) are designated as monitoring targets. Note that all of the following processing is performed by the probe management program 5.

(S1) The resource element designated for monitoring is identified. If the resource element is directly specified as in the row 55b of the monitoring request table 50 (FIG. 6), nothing is done. When the VOL 28 of the specific server 19 is designated as in the row 55a, the probe configuration table 40 is referred to and the resource element used by the designated VOL 28 is specified.

(S2) Referring to the probe configuration table 40, a probe having a path including the specified resource element is specified, and a probe group is created for each resource element. For example, in FIG. 11, the probe group PG1 = {probe 1, probe 2} having a path passing through PR1 and the probe group PG2 = {probe 2, probe 3} of PR2 are obtained.

(S3) It is determined whether the number of passing paths of each resource element (PR1 and PR2) is equal to or greater than the minimum number of paths 54. If it is for every resource element, the process ends. If any falls short, the process proceeds to S4.

(S4) Select one resource element (for example, PR2) whose number of passing paths is insufficient.

(S5) From the probe group of the resource element selected in S4, select the probe with the largest overlap number. The overlap number is the number of probe groups that include the probe. For example, in FIG. 11, probe 2 (overlap number 2), which is included in both PG1 and PG2, is selected.

(S6) With reference to the probe configuration table 40, all the paths possessed by the probe selected in S5 are acquired. In FIG. 11, the path 1 and path 2 of the probe 2 are specified.

(S7) Among the paths acquired in S6, only the paths that form a single intersection with an already selected monitoring path at the specified resource element (PR1 or PR2) are retained. Note that a user or the like may have selected paths to be monitored in advance, so a monitoring path has usually already been selected at the start of processing.

(S8) The paths remaining after S7 are set as monitoring paths. The monitoring flag 42 in the rows of those paths in the probe configuration table 40 is changed to Y, and the number of passing paths of the specified resource element is updated. The probe management program 5 instructs the probes that monitor the selected paths to monitor at a fine time interval (second order). The probe management program 5 also updates the co-occurrence condition table 60 with the selected paths. For example, in FIG. 11, when the monitoring paths of Processor 2 are path 2 and path 3, a row is added to the co-occurrence condition table 60 with Processor 2 as its resource ID and "path 2 & path 3" as its co-occurrence condition 63. That is, PR2 is added to the co-occurrence condition table 60 of FIG. 7 with the co-occurrence condition P2 & P3. Thereafter, the processes from S3 to S8 are repeated until the condition of S3 is satisfied.

Next, probe selection processing when the administrator designates monitoring of all resources in the storage 12 will be described.

FIGS. 12 to 15 are used for the explanation. FIG. 12 is an overall flowchart of the probe selection process. FIG. 13 is a detailed flowchart of step S13. FIG. 14 is a detailed flowchart of step S23. FIG. 15 is a supplementary explanatory diagram for these flowcharts.

In the overall flow shown in FIG. 12, the resources in the storage 12 are counted for each resource type (Port, Processor, etc.), the monitoring paths and their probes 23 are selected in order from the resource type with the largest number of elements to the resource type with the smallest, and the probe configuration table 40 and the co-occurrence condition table 60 are updated.

FIG. 12 will be described.

(S10) Referring to the resource configuration table 30, the number of resources in the target storage 12 is totaled for each resource type. At this time, the covered resource element, that is, the resource for which the monitoring path and its probe 23 are determined in the previous cycle of the loop is excluded from the aggregation.

(S11) If the probe selection processing has been executed for all resource types, the process proceeds to S16; otherwise, the process proceeds to S12.

(S12) Select one resource type in descending order of the number of resource elements obtained in S10.

(S13) The monitoring paths and their probes are selected so that the resource elements of the resource type selected in S12 form the single part and the resource elements of the other resource types form the common part. This process is described in detail later.

(S14) The monitoring paths selected in S13 are recorded in the probe configuration table 40. That is, the monitoring flag 42 of each entry selected as a monitoring path is updated to Y.

(S15) Rows are created in the co-occurrence condition table 60 based on the monitoring paths selected in S13. This process is the same as the update of the co-occurrence condition table 60 in S8.

(S16) The probes 23 that have the monitoring paths selected in S13 are identified with reference to the probe configuration table 40, and those probes are instructed to monitor at a fine monitoring interval.
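The ordering logic of S10 to S12 can be sketched as follows. The resource list, the function names, the alphabetical tie-break, and the `newly_covered_by_type` argument (standing in for the effect of S13 to S15) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical excerpt of the resource configuration table: (resource type, resource ID).
RESOURCES = [
    ("Port", "PT1"), ("Port", "PT2"), ("Port", "PT3"), ("Port", "PT4"),
    ("Processor", "PR1"), ("Processor", "PR2"),
    ("Cache", "CA1"),
    ("Pool", "PO1"), ("Pool", "PO2"), ("Pool", "PO3"),
]


def next_resource_type(resources, covered, processed_types):
    """S10 and S12: count uncovered elements per type and pick the largest remaining type."""
    counts = Counter(rtype for rtype, rid in resources
                     if rid not in covered and rtype not in processed_types)
    if not counts:
        return None  # S11: nothing left to process
    # Ties are broken alphabetically (an assumption, not stated in the text).
    return min(counts, key=lambda t: (-counts[t], t))


def selection_order(resources, newly_covered_by_type):
    """Order in which resource types would be processed by the loop."""
    covered, processed, order = set(), set(), []
    while (rtype := next_resource_type(resources, covered, processed)) is not None:
        order.append(rtype)
        processed.add(rtype)
        # Stand-in for S13 to S15: elements that became covered in this cycle.
        covered |= newly_covered_by_type.get(rtype, set())
    return order


order = selection_order(RESOURCES, {"Port": {"CA1"}})
```

In this example the only Cache element becomes covered while Port is processed, so Cache drops out of later cycles.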

FIG. 13 is described below with reference to the supplementary explanatory diagram.

(S20) Referring to the resource configuration table 30, the internal resources of the storage 12 designated by the administrator are acquired. Next, the probe configuration table 40 is referred to, and the paths passing through those resources are acquired. Next, the common paths of these paths are identified, and path groups are created. At this time, the path groups are created so that the resource type selected in S12 forms the single part and the other resource types form the common part.

This will be described using an example. Here, it is assumed that Port is the resource type selected in S12. Path 1 is composed of the resource elements numbered 1, 1, 1, 1 in the order of Port, Processor, Cache, Pool. Similarly, path 2 is composed of the resource elements numbered 2, 1, 1, 1. Path 1 and path 2 pass through the same resource elements for every resource type other than Port (common path 1). Therefore, path 1 and path 2 belong to the same path group. Similarly, path 3 and path 4 belong to one path group, which differs from the path group of path 1 and path 2.

(S21) Among the common paths created in S20, those whose number of passing paths is less than a threshold are excluded. Here, the threshold is the value specified by the minimum number of paths 54 in the monitoring request table 50. For example, in the example shown in FIG. 15A, common path 1 and common path 2 are created, and each has two passing paths. This value is compared with the value specified by the minimum number of paths 54. This step excludes common paths with few passing paths. In general, the more passing paths there are, the more data is available for evaluating the co-occurrence condition, so the accuracy of the co-occurrence determination increases. A common path with many passing paths is also robust against a decrease in the number of passing paths when paths change because of a configuration change (described later). Common paths whose number of passing paths does not meet the criterion are therefore excluded at this point. Note that the value specified by the minimum number of paths 54 need not be used as the threshold as is. For example, the minimum number of paths 54 multiplied by a constant coefficient, such as 2, may be used as the threshold. Further, to prevent excessive exclusion of common paths, the processing from S21 to S23 may be repeated while the coefficient is gradually decreased.

(S22) For the common paths remaining after S20 and S21, combinations of common paths forming a single intersection are obtained. This will be described with reference to FIG. In this figure, path 1 and path 2 constitute common path 1, and path n and path m constitute common path n. These common paths are compared with each other to obtain a combination of common paths forming a single intersection. In FIG. 15B, the common path 1 and the common path 2 form a single intersection at Cache 1. At this time, Cache 1 is “covered” by common path 1 and common path m. In other words, a spike generated due to the tightness of Cache 1 occurs almost simultaneously in the paths (paths 1, 2, and n, m) that constitute the common path 1 and the common path m, and can thus be distinguished from spikes caused by other resources. Since Cache 1 is covered at this point, it becomes a resource element subject to the exclusion in S12.

(S23) Finally, the paths to be included in the common path selected in S22 are determined. In the example of FIG. 15B, this corresponds to selecting the paths that are actually designated as monitoring paths from among paths 1 and 2 included in the common path 1. However, since the number of passing paths is small in this example, both paths are selected as monitoring paths. The processing in this step will be described in detail with reference to FIG.
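The grouping of paths by their common part and the exclusion in S21 can be sketched as follows. This is only an illustrative sketch: the function names, the tuple layout of a path, and the element numbers of paths 3 to 5 are assumptions and are not taken from the embodiment.

```python
from collections import defaultdict

# Resource types in path order, following the example in the text.
RESOURCE_TYPES = ("Port", "Processor", "Cache", "Pool")


def group_by_common_part(paths, single_type):
    """Group paths that share every resource element except that of single_type."""
    idx = RESOURCE_TYPES.index(single_type)
    groups = defaultdict(list)
    for path_id, elements in paths.items():
        common_part = elements[:idx] + elements[idx + 1:]
        groups[common_part].append(path_id)
    return dict(groups)


def drop_small_groups(groups, min_paths, coefficient=1):
    """S21: keep only groups whose number of passing paths meets the threshold."""
    threshold = min_paths * coefficient
    return {part: ids for part, ids in groups.items() if len(ids) >= threshold}


# Paths 1 and 2 use the numbers given in the text; paths 3 to 5 are hypothetical.
paths = {
    "path1": (1, 1, 1, 1),
    "path2": (2, 1, 1, 1),
    "path3": (3, 2, 1, 1),
    "path4": (4, 2, 1, 1),
    "path5": (5, 3, 2, 1),
}
groups = group_by_common_part(paths, "Port")
kept = drop_small_groups(groups, min_paths=2)
```

With a minimum number of paths of 2, the group containing only the hypothetical path 5 is excluded, while the groups of paths 1 and 2 and of paths 3 and 4 remain.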

FIG. 14 shows processing for selecting monitoring paths from the paths belonging to a path group. In this process, points are added to the paths belonging to a path group according to several conditions, and paths with many points are selected until the number of monitoring paths in the path group is equal to or greater than a threshold value. As a result, the monitoring paths can be composed of favorable paths.

(S30) Points are added to completely duplicated paths. That is, points are added to a path when there are a plurality of paths whose passing resource elements completely overlap with it. For example, in FIG. 15C, the path 1 of the probe 1 passes through the resource elements numbered 1, 1, 1, 1 in the order of Port, Processor, Cache, and Pool. The resource elements through which the path 2 of the probe 2 passes completely overlap with those of the path 1. At this time, the path 1 and the path 2 are completely duplicated paths, and the overlap number is 2. Points are added to such paths. The number of points to be added may be a fixed value or may be proportional to the overlap number. The reason for adding points to completely duplicated paths is to make it easy to prepare an alternative path when the path configuration is changed.

(S31) A probe having a large number of paths belonging to any of the common paths is identified, and points are added to those paths. For example, in FIG. 15C, the probe 3 has a path 3 belonging to the common path 1 and a path 4 belonging to the common path 2. At this time, points are added to path 3 and path 4. The number of points to be added may be proportional to the number of such paths that one probe has. In the example of the probe 3, the number of points may be proportional to 2, the number of its paths (path 3 and path 4). This makes it possible to select paths so that the number of probes is reduced as much as possible.

(S32) Using the resource performance history table 80 shown in FIG. 9, points are added to paths with a large flow rate per path, for example, a large I/O amount per second (IOPS). That is, the resource ID 81 of the resource performance history table 80 is referred to, the path is obtained from the VOL described therein by referring to the probe management table 40, and points are added to the paths of VOLs with a large IOPS value 84 recorded in the resource performance history table 80. The number of points to be added may be a constant value or may be proportional to the flow rate. This is because a path carrying more I/O is more advantageous for detecting spikes.

(S33) The target path group for determining a monitoring path is selected. At this time, the path group having the smallest value of (number of path candidates − number of passing paths) is selected. Here, the number of path candidates refers to the number of paths that are not selected as monitoring paths of other path groups among the paths constituting the path group. The number of passing paths refers to the number of paths selected as monitoring paths among the paths constituting the path group. For example, in FIG. 15C, the common path 2 is composed of a path 4, a path 5, and a path 6 having a Processor, Cache, and Pool as common parts. Paths 4 to 6 pass through Ports 3, 4, and 5, respectively. Further, the path 7 belonging to the common path 3 passes through Port 5 in the same manner as the path 6. When the path 7 is selected as the monitoring path for the common path 3, the number of path candidates for the common path 2 is 2, which is obtained by subtracting 1 (for the path 7) from 3 (the paths 4 to 6). Here, if the path 4 has already been selected as the monitoring path for the common path 2, (the number of path candidates − the number of passing paths) of the common path 2 is 2 − 1 = 1.

(S34) The path with the most points is selected from the path candidates as the monitoring path.

(S35) It is determined whether the number of passing paths of all path groups is greater than or equal to the threshold. Here, the number of passing paths is the number of paths selected as monitoring paths among the paths constituting the path group. The threshold is the minimum number of paths 54 in the monitoring request table 50. If all the path groups satisfy this condition, the process proceeds to S36, and if not, the process returns to S33.

(S36) This step is performed when the condition shown in S35 is satisfied but not all the resource elements belonging to the single part have been covered yet. For example, in FIG. 15C, it is assumed that the path 4 and the path 6 are selected as the monitoring paths among the paths 4, 5, and 6 belonging to the common path 2, and the path 5 is not selected. At this time, Port 4 remains uncovered. In this step, first, such uncovered resource elements belonging to the single part are specified. Next, the path having the highest score is selected from the paths passing through each such resource element, and is set as a monitoring path of the path group to which the path belongs.
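The scoring of S30 to S32 and the selection loop of S33 to S35 can be sketched as follows. This is a simplified sketch under several assumptions: the point weights and the IOPS level `high_iops` are arbitrary, every path passed in is assumed to belong to some common path, and "path candidates" is read, following the FIG. 15C example, as the paths whose Port is not already used by a monitoring path of another path group. Path 8, all probe assignments, and the Processor numbers are hypothetical. S36 is omitted.

```python
from collections import Counter, defaultdict


def score_paths(elements_of, probe_of, iops, high_iops=1000):
    """S30-S32: add points to each path (all weights are assumptions)."""
    overlap = Counter(elements_of.values())   # paths sharing every element
    per_probe = Counter(probe_of.values())    # paths owned by each probe
    points = {}
    for path, elements in elements_of.items():
        score = 0
        if overlap[elements] > 1:             # S30: completely duplicated path
            score += overlap[elements]
        score += per_probe[probe_of[path]]    # S31: probe that owns many paths
        if iops.get(path, 0) >= high_iops:    # S32: large flow rate
            score += 1
        points[path] = score
    return points


def select_monitoring_paths(elements_of, points, min_paths):
    """S33-S35: greedily pick monitoring paths for every path group."""
    groups = defaultdict(list)                # common part -> member paths
    for path, elements in elements_of.items():
        groups[elements[1:]].append(path)     # element 0 is the Port (single part)
    selected = {group: [] for group in groups}

    def candidates(group):
        # Paths whose Port is not used by another group's monitoring path.
        used = {elements_of[p][0]
                for g, ps in selected.items() if g != group for p in ps}
        return [p for p in groups[group] if elements_of[p][0] not in used]

    def free(group):
        return [p for p in candidates(group) if p not in selected[group]]

    while True:
        pending = [g for g in groups
                   if len(selected[g]) < min_paths and free(g)]
        if not pending:                       # S35: all groups satisfied or exhausted
            return selected
        # S33: group with the smallest (path candidates - passing paths)
        group = min(pending, key=lambda g: len(candidates(g)) - len(selected[g]))
        # S34: candidate with the most points
        selected[group].append(max(free(group), key=lambda p: points[p]))


# (Port, Processor, Cache, Pool); Ports of paths 4 to 7 follow the text.
elements_of = {
    "path4": (3, 2, 1, 1), "path5": (4, 2, 1, 1), "path6": (5, 2, 1, 1),
    "path7": (5, 3, 1, 1), "path8": (6, 3, 1, 1),
}
probe_of = {"path4": "probe3", "path5": "probe4", "path6": "probe5",
            "path7": "probe5", "path8": "probe6"}
points = score_paths(elements_of, probe_of, iops={"path4": 1500})
monitoring = select_monitoring_paths(elements_of, points, min_paths=2)
```

In this example, the group of paths 7 and 8 has fewer candidates and is served first; once path 7 is selected, path 6 is no longer a candidate for the other group because it shares Port 5, so paths 4 and 5 are selected there.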

Instead of selecting monitoring paths, as described here, so that the number of monitoring paths passing through each monitored resource element is equal to or greater than the threshold, there is also a method of allocating monitoring paths to path groups so that the number of monitoring paths passing through each monitored resource element is as even as possible. This raises the minimum number of passing paths of the common parts, and thereby reduces the possibility that the number of passing paths falls below the threshold when the path configuration is changed.

Note that the path configuration may change due to the operation of the information processing system. For example, in order to reduce the load on a Pool whose performance is tight, a VOL that places data on the Pool may be moved to another Pool at the administrator's discretion. At this time, the Pool through which this VOL path passes changes. That is, the path configuration changes.

In this way, when a path whose passing resource element has changed is a monitoring path, it is necessary to reselect probes in accordance with the new path configuration. However, it is desirable to minimize the changes that reselection makes to the selection of monitoring paths and probes. This is because, if the selection of monitoring paths and probes is changed too much, the continuity of the performance data and spike records measured so far will be lost.

FIG. 16 is a flowchart of probe reselection processing for a path configuration change. The processing subject of this flow is the probe management program 5. FIG. 17 is a supplementary explanatory diagram of the flowchart of FIG. 16.

(S40) A configuration change of a monitoring path is received. The collection program 6 periodically collects configuration information and extracts the difference. The probe management program 5 receives this difference, refers to the monitoring flag 42 in the probe configuration table 40, and determines whether the changed path is a monitoring path. Here, it is assumed that the path 1 passing through the resource element r1 before the configuration change is changed to the path 2 passing through the resource element r2, which belongs to the same resource type as the resource element r1.

(S41) Referring to the probe configuration table 40, it is checked whether there is a path that completely duplicates the path 1.

(S42) Instead of the path 1, the completely duplicated path (path 3) acquired in S41 is set as a new monitoring path. At this time, the monitoring flag 42 of the path 3 is changed to Y, and every occurrence of the path 1 recorded in the co-occurrence condition 63 of the co-occurrence condition table 60 is changed to the path 3.

(S43) Referring to the path group ID 49 of the probe configuration table 40, it is checked whether or not the resource element r1 passed before the change belongs to a single part. If it belongs to a single part, the process proceeds to S44. If not, that is, if it belongs to a common part, the process proceeds to S48. FIG. 17A shows a configuration change in the case of a single part, and FIG. 17B shows a configuration change in the case of a common part.

(S44) Hereinafter, S44 to S47 describe the alternative path selection process when the changed part is a single part. Note that the supplementary explanatory diagram of FIG. First, in this step, a search is made for a path that passes through the resource element r1 passed before the change and that also passes through the common part of another path group that is already monitored. For example, assume that path 1 belonging to common path 1 is changed to path 2 in FIG. At this time, Port 1, through which path 1 passed, is changed to Port 2 in path 2, and there is no longer any path covering Port 1. Therefore, an alternative path that covers Port 1 must be selected. To this end, a path that passes through Port 1 and through a common path other than the common path 1 (one that is already monitored) is searched for (in FIG. 17C, one such path is path n).

(S45) If a path is found in S44, the process proceeds to S46, and if not, the process ends with no alternative path. Alternatively, the monitoring path may be selected according to the flow shown in FIG. 10 for the resource element r1.

(S46) When there are a plurality of paths searched in S44, the path belonging to the path group with the smallest number of passing paths is selected. This is to increase the minimum number of passing paths as described in S36.

(S47) The co-occurrence condition table 60 is updated according to the selection of S46.

(S48) Hereinafter, S48 to S410 describe the alternative path selection process when the changed part is a common part. Note that the supplementary explanatory diagram of FIG. First, in this step, it is checked whether or not (the number of passing paths − 1) of the path group to which the path 1, the path before the change, belongs exceeds the threshold value (explained in S3 and S21). If the threshold value is exceeded, selection of an alternative path is unnecessary and the process proceeds to S49; if not, the process proceeds to S410.

(S49) The co-occurrence condition 63 in the co-occurrence condition table 60 is updated. Specifically, path 1 is excluded from the conditions based on the path group to which path 1 belongs. For example, it is assumed that the path group is composed of five paths, path 1 to path 5. At this time, if the co-occurrence condition 63 contains a row including the condition based on this path group (path 1 & path 2 & path 3 & path 4 & path 5), path 1 is removed from this row, which is updated to (path 2 & path 3 & path 4 & path 5).

(S410) A path is transferred from a path group with a sufficient number of passing paths, and the number of passing paths is recovered to the threshold value or more. This will be described with reference to FIG. It is assumed that the path 1 that has passed through the common part Processor 1 is changed to the path 2. At this time, it is assumed that the number of passing paths of the common path 1 decreases by 1 and falls below the threshold. In this case, a single part (Port-n) through which a path (path m) belonging to the common path 1 passes is specified, and the path group (common path n) to which a path (path n) covering that single part belongs is identified. Among such path groups, a path (path n) belonging to the path group having the largest number of passing paths, that is, having the largest margin over the threshold value, is set as a new member of the common path 1.
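The updates to the co-occurrence condition in S42 and in S48 to S49 can be sketched as follows. The list representation of a condition and both function names are assumptions for illustration. The sufficiency check is written here as "at least the threshold", in line with S35 and S410; whether S48 intends a strict comparison is not settled by the text.

```python
def replace_with_duplicate(condition, old_path, duplicate_path):
    """S42: substitute a completely duplicated path for the changed path."""
    return [duplicate_path if p == old_path else p for p in condition]


def drop_from_common_part(condition, changed_path, threshold):
    """S48/S49: remove the changed path if enough passing paths remain.

    Returns the updated condition, or None when the path group would fall
    below the threshold and a path must be transferred instead (S410).
    """
    remaining = [p for p in condition if p != changed_path]
    if len(remaining) >= threshold:   # assumption: "at least the threshold"
        return remaining
    return None


# S42: path 3 completely duplicates path 1 and takes its place.
swapped = replace_with_duplicate(["path1", "path2"], "path1", "path3")

# S49: the five-path example from the text, with a hypothetical threshold of 3.
group = ["path1", "path2", "path3", "path4", "path5"]
updated = drop_from_common_part(group, "path1", threshold=3)

# With a hypothetical threshold of 5, the group is too small and S410 applies.
needs_transfer = drop_from_common_part(group, "path1", threshold=5)
```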

In the operation of the information processing system, the administrator may add resources to be monitored. Next, a process of additionally selecting a monitoring path and the probe 23 when this monitoring target resource is newly added will be described.

FIG. 18 is a flowchart of probe selection processing when a monitoring target is added. The subject of this processing is the probe management program 5. The probe management program 5 receives information on the newly designated resource element and checks whether there is a co-occurrence condition corresponding to the resource element. If there is a co-occurrence condition, the probe management program 5 instructs the probe 23 to monitor the paths constituting the co-occurrence condition; if not, it newly selects a path and a probe for monitoring the resource element.

(S50) Receives information on resource elements for which monitoring is newly specified.

(S51) The co-occurrence condition table 60 is searched for a row corresponding to the resource element designated for monitoring. If not, the process proceeds to S52, and if present, the process proceeds to S53.

(S52) The monitoring path and probe 23 for the resource element designated for monitoring are selected. This selection may use the method of selecting with individual resource elements as starting points (shown in FIG. 9), or the method of selecting monitoring paths and probes 23 for all resource elements in the storage 12 including the designated resource element (shown in FIG. 12).

(S53) The co-occurrence condition 63 of the line searched in S51 is acquired, and the monitoring flag 42 of the path constituting the co-occurrence condition is updated to Y. At the same time, the probe having the path is instructed to monitor the path.
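The flow of S50 to S53 can be sketched as follows. The dictionary layouts standing in for the co-occurrence condition table 60 and the monitoring flag 42, and the callback `select_new`, are assumptions for illustration only.

```python
def add_monitoring_target(resource, cooccurrence_table, monitoring_flags, select_new):
    """S50-S53: reuse an existing co-occurrence condition or select anew."""
    paths = cooccurrence_table.get(resource)      # S51: search for a matching row
    if paths is None:
        return select_new(resource)               # S52: no row, select new paths
    for path in paths:                            # S53: turn monitoring on
        monitoring_flags[path] = "Y"
    return list(paths)


# Hypothetical table contents.
table = {"Port1": ["path1", "path2"]}
flags = {"path1": "N", "path2": "N", "path3": "N"}

reused = add_monitoring_target("Port1", table, flags, select_new=lambda r: [])
fresh = add_monitoring_target("Port9", table, flags, select_new=lambda r: ["path3"])
```

In the first call a row exists, so the flags of path 1 and path 2 are set to Y; in the second call no row exists and the selection is delegated to the callback.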

<Screen>
Hereinafter, screen display in the present embodiment will be described with reference to FIGS. 19 to 21. The management computer 1 exchanges information with the administrator via these screens.

FIG. 19 is a diagram showing a resource selection screen. On this screen, the administrator designates a resource that requires monitoring at a fine time interval. In FIG. 19, it is assumed that the storage 1 has already been selected from the plurality of storages 12 managed by the management computer 1.

The resource selection screen includes a server list 190 and a resource list 191. In each row of the server list 190, server information 192 related to the storage 1 is displayed. The administrator designates a server for which detailed monitoring is required with a check box at the left end. The administrator can also select all servers by checking the all selection check box 193. Also, the administrator changes the value of the minimum number of paths 194 as appropriate.

In each row of the resource list 191, a resource in the storage 1 is displayed. As with the server list 190, the administrator can select in this list the resources for which detailed monitoring is required. Further, the minimum number of paths 197 can be set for each resource element, and the administrator can set and change the value of the minimum number of paths 197 for each resource for which detailed monitoring is required.

When the administrator selects the OK button 198, the input content is sent to the probe management program 5. The probe management program 5 stores the input content in the monitoring request table 50. Thereafter, the probe management program 5 starts probe selection calculation.

FIG. 20 is a diagram showing a probe selection proposal presentation screen. This screen is a screen for presenting the probe selection result calculated by the probe management program 5 so as to satisfy the monitoring request of the administrator. This screen includes a probe summary, a cover resource summary 202, and a resource-specific monitoring path configuration 203.

In the probe summary, the number of probes 200 required for monitoring and the number 201 of monitoring paths are displayed. The number 201 of monitoring paths required for monitoring is obtained by referring to the probe management table 40 and totaling the monitoring paths that pass through the resources of the storage 1. The number of probes 200 required for monitoring is obtained by counting the number of probes having those monitoring paths.

In the cover resource summary 202, the number of resources for each resource type of the storage 1 and the number of resources covered by the current probe selection (the number of cover resources) are presented.

In the monitoring path configuration by resource 203, the monitoring path configuration corresponding to each monitoring target resource is displayed in each row. Each row displays a co-occurrence condition corresponding to the monitoring target resource, a monitoring path or a monitoring path group constituting the co-occurrence condition, and the number of paths (the number of passing paths). Such information can be acquired from the co-occurrence condition table 60. Further, as supplementary information, IOPS measurement data representing the flow rate of these paths is displayed together.

If there is no problem with the presented probe selection plan, the administrator presses the OK button 204 and approves the selection plan. If there is a problem, the Cancel button 205 is pressed to return to the resource selection screen of FIG.

FIG. 21 is a diagram showing a monitoring result screen. On this screen, the management computer 1 totals and statistically processes the spikes measured by the probe 23, and the result of extracting spikes caused by the resources designated for monitoring is displayed. Here, the administrator reads a spike increase tendency of a resource from the displayed spike history, and determines that a sign of performance failure appears in the resource. In the screen of FIG. 21, Port 1 of the storage 1 has already been selected.

The monitoring result screen is composed of spike statistics 210 and spike history 211. The spike statistics 210 displays spike statistical information for one week related to the currently selected resource (Port 1 in FIG. 21). In the spike statistics 210, (a) the number of spikes measured, (b) the number of spikes attributed to other resources, (c) the number of spikes attributed to Port1, and the previous week ratio of (c) are displayed.

Hereinafter, assuming that the co-occurrence condition 63 corresponding to Port 1 stored in the co-occurrence condition table 60 is “path 1 NOT (path 2)”, the calculation method of (a) to (c) will be described. The value of (a) is obtained as follows. That is, the statistical processing program 7 counts, from the spike history table 70, the number of rows corresponding to the VOL related to path 1, that is, the number of spikes measured in path 1. This value is displayed in (a). The value of (b) is obtained by counting, among the rows obtained in the calculation of (a), the number of rows whose cause resource ID 75 is a resource other than Port 1. The value of (c) is obtained by the calculation (a) − (b). From these values, in particular from (c) the number of spikes caused by Port 1 and its previous week ratio, the administrator can read the increase in spikes caused by Port 1 and determine that the performance of Port 1 is tight.
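The calculation of (a) to (c) can be sketched as follows. The row layout standing in for the spike history table 70 (keys `path` and `cause`) and the cause names other than Port1 are assumptions for illustration.

```python
def spike_statistics(spike_rows, path, resource):
    """Return (a) spikes measured, (b) spikes due to others, (c) spikes due to resource."""
    rows = [r for r in spike_rows if r["path"] == path]
    a = len(rows)                                         # (a) all spikes on the path
    b = sum(1 for r in rows if r["cause"] != resource)    # (b) other cause resources
    return a, b, a - b                                    # (c) = (a) - (b)


# Hypothetical history: six spikes on path 1, two of them caused by other resources.
history = [
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Processor1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Cache1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path2", "cause": "Processor1"},
]
a, b, c = spike_statistics(history, "path1", "Port1")
```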

The spike history 211 is a graph corresponding to the numerical values shown in the spike statistics 210. In this graph, the I/O response time measured in path 1 is recorded. It can also be seen from this graph that a total of 6 spikes were measured. Among these spikes, the spikes 212 are caused by Port 1, and the spikes indicated by dotted lines are the spikes 213a and 213b caused by other resources. As a result, the administrator can easily read from the graph that the spikes 213a and 213b are not caused by Port 1.

In the above description of the present embodiment, the resource and the resource element have been described as one-to-one, but this is not necessarily required. A plurality of resources may be handled as one resource element, or one resource may be divided into a plurality of resource elements. Below, such a modification is described.

FIG. 22 is a diagram showing a resource configuration table 30 in the modification. FIG. 22 is different from the resource configuration table 30 shown in FIG. 4 in that information about attributes of resources (attribute 34 and attribute value 35) is added. A row 36a in FIG. 22 is information on PT1 which is a Port resource. The row 36a records that PT1 belongs to the trunk TR1, which is a group of Ports, as attribute information of PT1. For Ports belonging to the same trunk, traffic is automatically distributed according to the load. In this case, a trunk that is a set of a plurality of resources (Ports) can be handled as one resource element.

Similarly, the row 36b is information on PR1, which is a processor resource, and indicates that PR1 belongs to a processor group called PRG1. In the processor group, processing is automatically distributed among the processors belonging to the processor group in accordance with the processor load, as with the trunk described above. In this case, as with the trunk, the processor group can be handled as one resource element.

In addition, the row 36c stores information on PO1, which is a Pool resource. From the attribute information (attribute 34 and attribute value 35) in the row 36c, it can be seen that PO1 is composed of a plurality of media (SSD and SAS) having different processing speeds. In such a Pool, performance characteristics such as response time vary greatly depending on the storage medium of data requested by the I / O to be processed. Such a Pool can be regarded as having a plurality of resource elements having different performance characteristics for each medium. Therefore, PO1 composed of SSD and SAS may be divided into SSD resource elements and SAS resource elements.

FIG. 23 is a flowchart of resource set / division processing. FIG. 23 shows a process in which the probe management program 5 refers to the resource configuration table 30 to combine a plurality of resources into one resource element or divide one resource into a plurality of resource elements.

(S60) With reference to the attribute 34 and the attribute value 35, if a plurality of resources belong to one group where the load is distributed, the resource belonging to the same group is set as one resource element. In the above example, the load-distributed group corresponds to a Port trunk or a processor group.

(S61) If the resource attribute information (attribute 34 and attribute value 35) indicates that the resource is composed of several resources having different performance characteristics, the resource is divided into resource elements for each performance characteristic.
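The combination of S60 and the division of S61 can be sketched as follows. The dictionary keys `group` and `media`, the naming of divided elements, and the resource PT2 are assumptions for illustration; the resource configuration table 30 itself expresses this information through the attribute 34 and the attribute value 35.

```python
def to_resource_elements(resources):
    """Map resource elements to the resources they are built from (S60/S61)."""
    elements = {}
    for res in resources:
        if "group" in res:
            # S60: resources in one load-distributed group form one element.
            elements.setdefault(res["group"], []).append(res["id"])
        elif "media" in res:
            # S61: a resource with differing media is divided per medium.
            for medium in res["media"]:
                elements[f"{res['id']}-{medium}"] = [res["id"]]
        else:
            elements[res["id"]] = [res["id"]]     # one-to-one by default
    return elements


resources = [
    {"id": "PT1", "group": "TR1"},
    {"id": "PT2", "group": "TR1"},          # hypothetical second Port in the trunk
    {"id": "PR1", "group": "PRG1"},
    {"id": "PO1", "media": ["SSD", "SAS"]},
]
elements = to_resource_elements(resources)
```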

As described above, according to this embodiment, until the number of paths passing through each resource element to be monitored (monitoring resource element) becomes equal to or greater than the predetermined minimum number of paths, a path is selected from the paths that can be monitored by the probe that monitors the path passing through the largest number of monitoring resource elements whose number of passing paths has not reached the minimum number of paths (uncovered monitoring resource elements). Therefore, as many monitoring resource elements as possible are monitored by each selected probe, monitoring paths are selected until the minimum number of paths needed to isolate the monitoring resource element that caused a performance degradation is secured, and measurement of system performance is realized at low cost.

In addition, a predetermined rule is introduced for path selection; that is, a path is selected according to a predetermined rule from the paths that can be monitored by the probe that monitors the path passing through the most uncovered monitoring resource elements. Therefore, it is possible to set more preferable monitoring paths.

In addition, since priority is given to paths for which a monitoring resource element forms a single intersection, it becomes easier to isolate the managed resource from other managed resources, and easier to identify the managed resource element that caused the performance degradation.

In addition, since priority is given to paths used for processing with a large amount of processing, for example, by adding points to paths with a large flow rate per path such as a large I/O amount per second (IOPS), paths in which performance degradation such as spikes is likely to occur are monitored with priority. This makes it easier to detect performance degradation such as spikes and to identify the managed resource that caused the performance degradation.

In addition, since paths are selected so that, for each monitoring resource element, the total number of selected paths passing through it is equal to or greater than a predetermined value, the number of monitored paths containing each monitoring resource element is increased, which makes it easier to isolate the managed resource elements.

Alternatively, if paths are selected so that the total number of selected paths passing through each monitoring resource element is equal across the monitoring resource elements, variation in the ease of isolation among the managed resource elements is eliminated.

In addition, since paths are selected with priority given to completely duplicated paths, when the path configuration of a selected path (the resources through which the path passes) is changed, a path to be monitored in place of that path can be prepared easily. When the monitoring resource element through which a monitoring path actually passes is changed, if there is a completely duplicated path, it is used as the monitoring path in place of the original monitoring path, so the monitoring path can be switched easily.

In addition, when the monitoring resource element through which a monitoring path passes is changed and there is no completely duplicated path of that monitoring path, then, among the paths that pass through the monitoring resource element included in the monitoring path and that belong to other groups of monitoring paths, the path belonging to the group with the smallest number of monitoring paths is used as the monitoring path. As a result, an alternative monitoring path is secured and, at the same time, the distinguishability of other monitoring resource elements can be improved.

<< Second Embodiment >>

The management computer 1 in the first embodiment is intended to obtain a minimum probe necessary for specifying the cause when the access performance of the storage system deteriorates. The management computer 1 in the second embodiment is obtained by replacing the target from a storage system with an application.

Just as the storage system in the first embodiment is composed of several resources, the application is also composed of several program processes. These program processes correspond to resources in the first embodiment. The program process is, for example, a process in a program module or a database table (or access process to the database table).

The application provides some service to application users. For example, if the application is a Web search system, the service returns to the user a Web page that matches a specific keyword. The user designates a service and sends a processing request (service request) to the application. The application executes the request and returns the result to the user.

In the first embodiment, the probe 23 measures the response time of I/O requests from the server to the storage system, whereas the probe 23 in the second embodiment measures the response time from a service request until the application returns the result to the user. The service in the second embodiment corresponds to the path in the first embodiment. In the first embodiment, a path is composed of a series of resources that process the I/O requests sent to the storage system through the path. Similarly, the service corresponding to the path in the second embodiment is composed of a series of application program processes that process a service request from the user.

In summary, the second embodiment is similar to the first embodiment, with the monitoring target changed from a storage system to an application. A resource in the first embodiment corresponds to a program process, a path corresponds to a service, and the response time monitored by the probe 23 is the service response time.

FIG. 24 is a system configuration diagram according to the second embodiment. As can be seen at a glance, most of the parts in FIG. 24 overlap with those of the first embodiment shown in FIG. Therefore, here, a description will be given mainly of the difference.

The application to be monitored in this embodiment includes the IP switch 102, the Web server 103, and the database server 106. These are connected to the management computer 1 via the LAN 11. They are also connected to one another by a business LAN 101, which is a separate network from the LAN 11, and can communicate with each other. The Web server 103 and the database server 106 are ordinary computers each provided with a CPU, a memory, and a permanent storage device such as an HDD. A Web program 104 that is a part of the application operates on the Web server 103. The Web server 103 includes a large number of program modules 105 that constitute the Web program 104.

The database server 106 operates a database program 107 that is also a part of the application. The database server 106 also has a database table 108 in which application data is stored.

The service request sent from the application user first enters the IP switch 102 through the business LAN 101. The IP switch 102 sends it to the Web server 103. In the Web server 103, the Web program 104 receives the service request, reads the program module 105 related to the service request, and executes predetermined processing. If data held by the application is necessary for the predetermined processing, the Web program 104 further transmits a service request to the database server 106. In the database server 106, the database program 107 receives this, executes predetermined data processing on the database table 108 related to the service request, and returns the result to the requesting Web program 104. The Web program 104 further executes predetermined processing and returns the result to the user via the IP switch 102.

The service monitoring server 100 is a computer that includes a CPU, a memory, and a storage device such as an HDD, and executes programs. On the service monitoring server 100, a probe 23, which is a kind of program, operates. The service monitoring server 100 is connected to the IP switch 102. The IP switch 102 copies the service request packets passing through the business LAN 101 and the application's response packets to those service requests, and transmits the copied packets to the service monitoring server 100. The probe 23 calculates and records the response time for each service from the time difference between these service request and response packets.

The probe 23 calculates the response time of the service and monitors the value. When a spike is detected in the response time, the service request in which the spike occurred, the detected time, and the response time are recorded. The recorded contents are periodically collected by the collection program 6 operating on the management computer 1 and stored in the spike history table 70.
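As a rough illustration of the recording step, the sketch below treats a spike as a response time that exceeds a per-service baseline by more than a fixed threshold. The embodiment does not specify the detection criterion at this point, so the baseline dictionary, the threshold, and the record layout are all assumptions made for the example.

```python
def detect_spikes(measurements, baseline, threshold):
    """Return spike records for measurements that deviate from the baseline.

    measurements: iterable of (service, detected_time, response_time)
    baseline:     dict mapping service -> normal response time (assumed given)
    threshold:    allowed deviation above the baseline, in seconds (assumed)
    """
    spikes = []
    for service, detected_time, response_time in measurements:
        normal = baseline.get(service)
        if normal is None:
            continue  # no baseline for this service, so nothing to compare
        if response_time - normal > threshold:
            # Record the service, the detected time, and the response time.
            spikes.append(
                {
                    "service": service,
                    "time": detected_time,
                    "response_time": response_time,
                }
            )
    return spikes
```

Records of this shape correspond to what the collection program 6 would gather periodically and store in the spike history table 70.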

The process of calculating the response time by collating the duplicated packets is a high-load process that consumes a large amount of CPU and memory. Therefore, the probe 23 has a function of limiting the services for which the response time is calculated. This reduces the consumption of CPU and memory. The management computer 1 instructs the probe 23 which services to target.

Next, each table on the management computer 1 will be described. Here, a description will be given of portions where the contents of the table are different from those of the first embodiment.

FIG. 25 is a diagram showing the contents of the resource configuration table 30 in the second embodiment.

The resource configuration table 30 in the first embodiment stores the resource configuration inside the storage. In the second embodiment, for each server (server ID 31) constituting an application, the program processing corresponding to the resources operating on that server is stored in the resource configuration table 30. For each resource, a unique identifier (resource ID 33) is stored for each resource type 32. For example, FIG. 25 shows that resources PM1, PM2, PM3, ... whose resource type is program module exist on a server called Web-Sv1 that constitutes an application.

FIG. 26 is a diagram showing a probe configuration table 40 in the second embodiment.

The probe configuration table 40 stores configuration information of the probe 23, configuration information of the services, and monitoring information of the probe 23. The configuration information of the probe 23 includes the identifier of the probe 23 (probe ID 41) and the service monitoring server 100 (server ID 43) on which the probe 23 is operating. The service configuration information includes a service identifier 410, the resources such as program modules used by the service (service 46, resource type 47, resource ID, path group ID 49), the service URL 411 (when the application is a Web application), and the like. The monitoring information of the probe 23 includes the presence or absence of service monitoring by the probe 23 (monitoring flag 42).
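To make the table layout concrete, the following sketch models one row of the probe configuration table and derives the set of resources used by each service. The field names, the `ProbeConfigRow` type, and the helper `resources_by_service` are illustrative assumptions. The path group ID column is omitted for brevity, and FIG. 26 itself is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProbeConfigRow:
    """One row of the probe configuration table (illustrative layout)."""
    probe_id: str
    server_id: str        # service monitoring server running the probe
    service_id: str
    resource_type: str    # e.g. "program module" or "database table"
    resource_id: str
    service_url: str
    monitoring: bool = False   # monitoring flag


def resources_by_service(rows):
    """Group the resource IDs used by each service."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row.service_id].add(row.resource_id)
    return dict(mapping)
```

A mapping of this form plays the role that the path-to-resource relationship plays in the first embodiment: each service corresponds to a path, and the resources it uses correspond to the resource elements on that path.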

The service configuration information may be input manually by the administrator based on the application design information, or the collection program 6 may collect and analyze the traces and logs output by the application during execution and input the result.

Based on the information in these tables, the probe management program 5 selects the minimum set of services necessary to identify performance degradation of the resources (program modules and database tables), and instructs the probe 23 to limit response-time monitoring to those services. The method for selecting the minimum set of services is the same as the method for selecting the minimum set of probes in the first embodiment. Therefore, the processing flow described in the first embodiment can be applied as it is, with terms such as "path" replaced by the corresponding terms of the second embodiment.
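The selection can be sketched as a simple greedy procedure: repeatedly pick the service that uses the largest number of resources not yet covered by the minimum number of monitored services. This is a simplified illustration under stated assumptions, not the processing flow of the first embodiment. The function name `select_services`, the `min_paths` parameter, and the alphabetical tie-break are hypothetical, and the additional selection rules of the first embodiment are not reproduced.

```python
def select_services(service_resources, min_paths=1):
    """Greedily pick services until every resource is covered min_paths times.

    service_resources: dict mapping service -> set of resources it uses
    Returns the list of selected services in selection order.
    """
    all_resources = set().union(*service_resources.values())
    coverage = {r: 0 for r in all_resources}   # selected services per resource
    remaining = dict(service_resources)
    selected = []
    while remaining:
        uncovered = {r for r, n in coverage.items() if n < min_paths}
        if not uncovered:
            break  # every resource has reached the minimum count
        # Pick the service that uses the most uncovered resources; ties are
        # broken by service name so that the result is deterministic.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining service can improve coverage
        selected.append(best)
        for r in remaining.pop(best):
            coverage[r] += 1
    return selected
```

With `min_paths` greater than one, each resource is observed through several services, which is what allows a degraded resource to be narrowed down from the services whose spikes occur together.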

The embodiments described above are examples for explaining the present invention and are not intended to limit the scope of the present invention to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS 1 ... Management computer, 10 ... Display apparatus, 100 ... Service monitoring server, 101 ... Business LAN, 102 ... IP switch, 103 ... Web server, 104 ... Web program, 105 ... Program module, 106 ... Database server, 107 ... Database program, 108 ... Database table, 11 ... LAN, 12 ... Storage, 13 ... SW, 14 ... Port, 15 ... Processor, 16 ... Pool, 17 ... Cache, 18 ... SAN, 19 ... Server, 2 ... CPU, 20 ... CPU, 21 ... memory, 25 ... computer storage, 26 ... HBA, 30 ... resource configuration table, 4 ... computer storage

Claims

A management computer that manages an information processing system that executes information processing by a path in which a plurality of resource elements are connected, the management computer comprising:
a probe management means that, so that the number of paths passing through each monitoring resource element, which is a resource element to be monitored, is equal to or greater than a predetermined minimum number of paths, selects a path from among the paths that can be monitored by the probe that, among the probes monitoring paths passing through monitoring resource elements, monitors the path passing through the largest number of uncovered monitoring resource elements, which are monitoring resource elements for which the number of passing paths has not reached the minimum number of paths, sets the selected path as a monitoring path, which is a path to be monitored, and sets the probe that monitors the monitoring path as a monitoring probe, which is a probe to be monitored;
a collecting means that collects monitoring results from the monitoring probe; and
a statistical processing means that determines, based on co-occurrence patterns in the monitoring results of the monitoring probe, the monitoring resource element that has caused a performance degradation.
The management computer according to claim 1, wherein the probe management means selects a path according to a predetermined rule from among the paths that can be monitored by the probe that monitors the path passing through the largest number of uncovered monitoring resource elements.
The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path from which a set of paths having the monitoring resource element as their single intersection can be selected.
The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path used for processing with a large processing volume.
The management computer as recited above, wherein the probe management means selects the path according to the rule that the total number of monitoring resource elements through which the selected paths pass is, for each monitoring resource element, greater than or equal to a predetermined value.
The management computer as recited above, wherein the probe management means selects the path according to the rule that the total number of monitoring resource elements through which the selected paths pass is equal for each monitoring resource element.
The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path for which there exists another path whose included monitoring resource elements completely overlap with its own.
The management computer according to claim 1, wherein, when the monitoring resource elements through which a monitoring path passes are changed, the probe management means uses, as a monitoring path in place of that monitoring path, a path that includes monitoring resources completely overlapping the monitoring resources that the monitoring path included before the change.
The management computer according to claim 8, wherein, when the monitoring resource elements through which a monitoring path passes are changed and there is no path that includes monitoring resources completely overlapping the monitoring resources included in the monitoring path before the change, the probe management means uses, as a monitoring path in place of that monitoring path, a path selected from among the paths passing through a monitoring resource element included in the monitoring path, namely a path that passes through the same monitoring path as the group that, among the groups of monitoring paths passing through other monitoring resources, contains the smallest number of monitoring paths.
A management method for managing an information processing system that executes information processing by a path in which a plurality of resource elements are connected, wherein:
a probe management means, so that the number of paths passing through each monitoring resource element, which is a resource element to be monitored, is equal to or greater than a predetermined minimum number of paths, selects a path from among the paths that can be monitored by the probe that, among the probes monitoring paths passing through monitoring resource elements, monitors the path passing through the largest number of uncovered monitoring resource elements, which are monitoring resource elements for which the number of passing paths has not reached the minimum number of paths, sets the selected path as a monitoring path, which is a path to be monitored, and sets the probe that monitors the monitoring path as a monitoring probe, which is a probe to be monitored;
a collecting means collects monitoring results from the monitoring probe; and
a statistical processing means determines, by isolation using co-occurrence patterns based on the monitoring results of the monitoring probe, the monitoring resource element that has caused a performance degradation.
The management method according to claim 10, wherein the probe management means selects a path according to a predetermined rule from among the paths that can be monitored by the probe that monitors the path passing through the largest number of uncovered monitoring resource elements.
The management method according to claim 11, wherein the probe management means selects the path according to the rule that priority is given to a path from which a set of paths having the monitoring resource element as their single intersection can be selected.