WO2015145676A1 - Supervisor computer and supervising method - Google Patents

Supervisor computer and supervising method

Info

Publication number
WO2015145676A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
monitoring
probe
paths
resource
Prior art date
Application number
PCT/JP2014/058918
Other languages
French (fr)
Japanese (ja)
Inventor
峰義 増田
裕教 江丸
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2014/058918
Publication of WO2015145676A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/64: Hybrid switching systems
    • H04L12/6418: Hybrid transport

Definitions

  • the present invention relates to a technique for measuring the performance of an IT system.
  • a method based on baseline analysis is often used as a method for detecting a sign of a performance failure in an IT (Information Technology) system.
  • A program (probe) for measuring the performance of the IT system is installed in the IT system, and the result measured by the probe is compared with the measurement result at normal time (the baseline).
  • Using a predetermined threshold value, it can be determined whether or not the measurement result deviates from the baseline, and whether or not it deviates greatly.
  • It is determined that there is a sign of a performance failure when the measurement result of the probe deviates greatly from the baseline, or when the frequency with which the probe measurement result temporarily shows an outlier outside the baseline (hereinafter referred to as a spike) increases.
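The baseline comparison described above can be sketched as follows. This is a minimal illustration, not the method of the publication: the k-sigma rule, the function name `find_spikes`, and the sample values are all assumptions, since the text only speaks of "a predetermined threshold value".

```python
from statistics import mean, stdev

def find_spikes(samples, baseline, k=3.0):
    """Return indices of samples that deviate from the baseline mean
    by more than k baseline standard deviations (assumed rule)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, x in enumerate(samples) if abs(x - mu) > k * sigma]

# Hypothetical response times in milliseconds.
baseline = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
samples = [10.0, 9.5, 42.0, 10.5]
spike_indices = find_spikes(samples, baseline)
```

Only the third sample is far enough from the baseline to be reported as a spike.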
  • Patent Document 1 describes that a DPI device that performs measurement is arranged at a port of a switch through which a path with a high probability of failure or a path with concentrated flows passes.
  • the time interval of measurement by the probe tends to be narrow so that the spike can be detected without missing it. This is because, for example, if spikes that can be detected by finely measuring at intervals of seconds are measured at intervals of minutes, the measurement results are statistically averaged and the spikes cannot be seen.
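The effect of the measurement interval can be shown with a small numeric sketch. The values, the 20 ms threshold, and the helper name `window_averages` are assumptions chosen for illustration: one 65 ms outlier among sixty per-second samples is obvious at second granularity but disappears in the one-minute average.

```python
def window_averages(values, window):
    """Average consecutive chunks of `window` samples."""
    out = []
    for i in range(0, len(values), window):
        chunk = values[i:i + window]
        out.append(sum(chunk) / len(chunk))
    return out

THRESHOLD_MS = 20.0          # assumed spike threshold
per_second = [5.0] * 60      # one minute of per-second response times
per_second[30] = 65.0        # a single one-second spike

seen_at_seconds = any(v > THRESHOLD_MS for v in per_second)
seen_at_minutes = any(v > THRESHOLD_MS for v in window_averages(per_second, 60))
```

Here the minute average is 6.0 ms, so the spike is visible only in the per-second series.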
  • A shorter measurement interval means that more resources, such as CPU (Central Processing Unit) time and memory, are consumed by the measurement and by the location where the measurement results are stored.
  • In addition, the cause of a spike may not be identifiable (isolated). For example, even if the response time to an I/O (Input/Output) request to the storage is measured and a spike in which the response time temporarily increases is detected, that single phenomenon alone may not be enough to isolate which resource in the storage is under strain.
  • This method can identify the resource that caused the spike with high accuracy.
  • To do so, it is necessary to detect more spikes, which requires placing a larger number of probes.
  • This amounts to placing probes that measure at time intervals fine enough to detect short spikes on most of the servers connected to the storage, on the order of hundreds to thousands of probes.
  • An object of the present invention is to provide a technique for determining a decrease in performance of an IT system at a low cost.
  • A management computer according to one aspect manages an information processing system that executes information processing using paths in which a plurality of resource elements are connected.
  • The management computer includes probe management means. A monitoring resource element (a resource element to be monitored) for which the number of passing paths has not reached the minimum number of paths is treated as an uncovered monitoring resource element.
  • From the paths that can be monitored, the probe management means selects the path that passes through the most uncovered monitoring resource elements, sets the selected path as a monitoring path (a path to be monitored), and sets the probe that monitors the monitoring path as a monitoring probe.
  • The management computer also includes statistical processing means that, based on the monitoring results of the monitoring probes, determines the monitoring resource element that caused a performance degradation by isolation based on a co-occurrence pattern.
  • the present invention makes it possible to isolate resources that cause performance problems at the lowest possible cost.
  • FIG. 1 is a system configuration diagram of an information processing system according to a first embodiment.
  • FIG. 2 is a diagram for explaining paths.
  • FIG. 4 is a diagram illustrating an example of the resource configuration table 30.
  • FIG. 5 is a diagram showing an example of the probe configuration table 40.
  • FIG. 6 is a diagram showing an example of the monitoring request table 50.
  • FIG. 15 is a supplementary explanatory diagram of the probe selection flowcharts.
  • FIG. 16 is a flowchart of the probe reselection process for a change in the path configuration.
  • FIG. 17 is a supplementary explanatory diagram of the flowchart of FIG. 16.
  • FIG. 18 is a flowchart of the probe selection process when a monitoring target is added.
  • FIG. 19 is a diagram showing a resource selection screen.
  • Further figures show the presentation screen of a probe selection plan and the monitoring result screen.
  • FIG. 1 is a system configuration diagram of an information processing system according to the first embodiment.
  • the management computer 1, server 19, and storage 12 are connected via a LAN 11.
  • the management computer 1 collects information from the server 19 and the storage 12 via the LAN 11 and transmits operation instructions to the server 19 and the storage 12 via the LAN 11.
  • the server 19 and the storage 12 are also connected by a SAN (Storage Area Network) 18.
  • the server 19 transmits an I / O processing request to the storage 12 via the SAN 18, and the storage 12 processes the I / O request and returns a response to the server 19.
  • the management computer 1 includes a CPU 2, a memory 3, a computer storage 4, a display I / F 8, and an NW I / F 9.
  • the display I / F 8 is connected to the display device 10, and the NW I / F 9 is connected to the LAN 11.
  • the computer storage 4 stores a probe management program 5, a collection program 6, and a statistical processing program 7. These programs are read into the memory 3 at the time of startup and executed by the CPU 2.
  • the memory 3 stores a resource configuration table 30, a probe configuration table 40, a monitoring request table 50, a co-occurrence condition table 60, a spike history table 70, and a resource performance history table 80.
  • the resource configuration table 30 stores configuration information of infrastructure resources that are managed by the management computer 1.
  • the server 19 includes a CPU 20, a memory 21, a display I / F 24, an in-computer storage 25, an HBA (Host Bus Adapter) 26, and an NW I / F 27.
  • the NW I / F 27 is connected to the LAN 11 and the HBA 26 is connected to the SAN 18.
  • The application 22 and the probe 23 are stored in the memory 21. These are programs, which are read and executed by the CPU 20.
  • the VOL 28 is a logical device (not shown) created by the storage 12. From the server 19, this logical device is recognized as a disk area.
  • The application 22 requests the VOL 28 to read or write data. At this time, an I/O request is issued to the storage 12 that holds the data entity of the VOL 28. An I/O request issued from the application 22 is delivered from the HBA 26 to the storage 12 through the SAN 18. The storage 12 processes the I/O request and returns the result to the application 22 along the same route in reverse.
  • Probe 23 measures the processing of this I / O request. For example, the probe measures the response time from when an I / O request is issued to the storage 12 until the result returns. The probe 23 also measures the number of I / O requests (IOPS) processed per second. The probe 23 analyzes the data measured in this way by the above-described baseline monitoring technique, and detects spikes (measured values that deviate significantly from the average measured values). The detected spikes are collected by the collection program 6 and stored in the spike history table 70.
  • the probe 23 has a function of changing the measurement interval. That is, the probe 23 can perform the measurement described above at a relatively short time interval of the order of seconds, or can be performed at a relatively long time interval of the order of minutes. It is also possible to simultaneously measure the second order and the minute order, stop the second order measurement, and perform only the minute order measurement. Such a change in the measurement interval can be performed by the determination of the probe 23 itself, or can be changed by the determination of an external program, for example, the probe management program 5 of the management computer 1.
  • The probe 23 can be implemented, for example, as an OS (not shown) driver that directly measures individual I/O requests, or as a program that periodically collects the statistical information that the OS records when it measures I/O requests.
  • The storage 12 includes a SW (switch) 13, a Port 14, a Processor 15, a Pool 16, and a Cache 17. The Port 14, Processor 15, Pool 16, and Cache 17 are mutually connected via the SW 13. The Port 14 is connected to the SAN 18.
  • Pool 16 is a storage area composed of a plurality of disks and stores data.
  • One Pool 16 may be composed of a plurality of types of media (SSD, SAS HDD).
  • An I/O request issued from the application 22 to the VOL 28 is processed by the Processor 15 through the Port 14. For example, when the I/O request is a data read, the Processor 15 checks whether the requested data is in the Cache 17 and, if not, acquires the data from the Pool 16 that stores it and returns the data to the application 22.
  • FIG. 2 is a diagram for explaining the path configuration.
  • FIG. 2A is an explanatory diagram of the path.
  • A path is obtained by conceptually connecting the resources (Port 14, Processor 15, Pool 16, and Cache 17) in the storage 12 that are used by the VOL 28. Each of these resources is referred to as a resource element.
  • I / O requests to the VOL 28 are processed using resource elements on this path.
  • the probe 23 monitors an I / O request. This is described as monitoring the path.
  • the response time measurement described above is an example of path monitoring.
  • a path monitored by the probe is referred to as a monitoring path.
  • FIG. 2B is an explanatory diagram of a path group.
  • a path group is a group composed of paths having a common part of resource elements.
  • FIG. 2B illustrates a path from VOL1 (path 1) and a path from VOL2 (path 2).
  • Path 1 and path 2 are common except for Port, and belong to one path group.
  • a common part of this path is referred to as a common part or a common path.
  • a portion that is not a common portion is described as a single portion (Port corresponds to a single portion in FIG. 2B).
  • FIG. 2C is an explanatory diagram of a single intersection.
  • In FIG. 2C, there are two VOLs 28, and path 1 and path 2 start from them respectively. These paths intersect at the Processor.
  • In this case, the Processor is referred to as a single intersection.
  • Path 1 and path 2 are said to have a single intersection.
  • A monitoring path passing through a resource element is referred to as a passing path, and the number of such monitoring paths is referred to as the number of passing paths.
  • In this example, the number of passing paths of the Processor is 2, and that of each of the other resource elements is 1. When the number of passing paths of a resource element reaches or exceeds the specified value, the resource element is said to be covered.
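The counting of passing paths and the notion of coverage can be sketched as below. The resource IDs and the `>=` comparison against the minimum number of paths are assumptions for illustration; the two paths mirror the example in which only the Processor is shared.

```python
from collections import Counter

def passing_path_counts(monitoring_paths):
    """Count, per resource element, how many monitoring paths pass through it."""
    counts = Counter()
    for elements in monitoring_paths.values():
        counts.update(set(elements))  # a path counts once per element
    return counts

def covered(counts, min_paths):
    """Resource elements whose passing-path count reaches min_paths (assumed rule)."""
    return {r for r, n in counts.items() if n >= min_paths}

# Hypothetical IDs: two paths that intersect only at Processor PR1.
monitoring_paths = {
    "path1": ["PT1", "PR1", "CA1", "PO1"],
    "path2": ["PT2", "PR1", "CA2", "PO2"],
}
counts = passing_path_counts(monitoring_paths)
covered_elements = covered(counts, min_paths=2)
```

With a minimum of two paths, only `PR1` is covered; every other element has a single passing path.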
  • FIG. 3 is a diagram for explaining a schematic operation of the information processing system according to the present embodiment.
  • the collection program 6 collects the configuration information of the resources inside the storage from the storage 12 and stores them in the resource configuration table 30.
  • the collection program 6 collects the configuration information of the server 19 and the configuration information of the probe 23 and the VOL 28 from the probe 23 of the server 19 and stores them in the probe configuration table 40.
  • the probe management program 5 displays a screen for accepting a monitoring request from the administrator on the display device 10 via the display I / F 8 and stores the input from the administrator in the monitoring request table 50.
  • The probe management program 5 refers to the contents stored in the resource configuration table 30 and the probe configuration table 40, determines which probes 23 are to measure at second-order intervals, and instructs those probes 23 to do so. At the same time, the probe management program 5 creates rules (co-occurrence conditions) for identifying which resource in the storage 12 is the cause of a spike from the combination of spikes measured by the probes 23, and stores them in the co-occurrence condition table 60.
  • the probe 23 instructed to monitor finely measures the I / O request and detects a spike.
  • the collection program 6 collects spike records detected from the probe 23 and stores them in the spike history table 70. Further, the collection program 6 collects the performance information of each resource (measured in the minute order) from the storage 12 and stores it in the resource performance history table 80.
  • The statistical processing program 7 analyzes the spike information stored in the spike history table 70 in accordance with the co-occurrence conditions stored in the co-occurrence condition table 60, identifies the resource causing the spike, and records the result in the spike history table 70.
  • the statistical processing program 7 displays the result on the display device 10 and presents it to the administrator.
  • FIG. 4 is a diagram illustrating an example of the resource configuration table 30.
  • the resource configuration table 30 stores resource configuration information in the storage 12. That is, for each storage 12 (uniquely identified by the storage ID 31), internal resources (uniquely identified by the resource ID 33) are grouped and stored for each resource type 32.
  • The resource type 32 includes, for example, Port, Processor, Cache, and Pool. In FIG. 4, the resource IDs 33 of the Ports 14 are written as PT1, PT2, ..., those of the Processors 15 as PR1, PR2, ..., those of the Caches 17 as CA1, CA2, ..., and those of the Pools 16 as PO1, PO2, .... This notation is used in the following description.
  • FIG. 5 is a diagram showing an example of the probe configuration table 40.
  • The probe configuration table 40 stores the monitoring contents of each probe 23, that is, which probe 23 (identified by probe ID 41) operates on which server 19 (identified by server ID 43) and which VOL 28 (identified by VOL ID) it monitors.
  • When a VOL is monitored at second-order intervals, the monitoring flag 42 stores “Y”.
  • the probe configuration table 40 stores information on the path 46 (identified by the path ID 44), that is, resources in the storage 12 used by each VOL 28.
  • the resource ID 48 of the resource configuring each path 46 (the same identifier as the resource ID of the resource configuration table 30) and the resource type 47 are stored.
  • a path group ID 49 is stored as attached information of each path 46.
  • the path group ID 49 indicates information for distinguishing the single part / common part of the path 46 and the ID of the path group to which the common part belongs.
  • When one path 46 contains both null and non-null values (0 in the figure) of the path group ID 49, as in rows 401a and 401b of FIG. 5, a resource whose value is null belongs to the single part.
  • A resource whose value is non-null belongs to the common part, and the path belongs to path group 0.
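How the path group ID separates the single part from the common part can be sketched as follows. The row layout (a list of dictionaries with `resource_type`, `resource_id`, and `path_group_id` keys) and the sample IDs are assumptions; only the null versus non-null rule comes from the text.

```python
def split_path(rows):
    """Split one path's rows into single-part and common-part resource IDs."""
    single = [r["resource_id"] for r in rows if r["path_group_id"] is None]
    common = [r["resource_id"] for r in rows if r["path_group_id"] is not None]
    group_ids = {r["path_group_id"] for r in rows if r["path_group_id"] is not None}
    return single, common, group_ids

# Hypothetical rows for one path; None stands for the null path group ID.
rows = [
    {"resource_type": "Port", "resource_id": "PT1", "path_group_id": None},
    {"resource_type": "Processor", "resource_id": "PR1", "path_group_id": 0},
    {"resource_type": "Cache", "resource_id": "CA1", "path_group_id": 0},
    {"resource_type": "Pool", "resource_id": "PO1", "path_group_id": 0},
]
single, common, group_ids = split_path(rows)
```

In this sample the Port is the single part, the remaining three resources form the common part, and the path belongs to path group 0.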
  • FIG. 6 is a diagram illustrating an example of the monitoring request table 50.
  • The monitoring request table 50 stores the content of each administrator request (identified by a request ID 51): the device designated for monitoring (identified by a device ID 52, which is a server ID 43 or a storage ID 31), the resource designated for monitoring (identified by a resource ID 53), and the minimum number of paths 54.
  • The minimum number of paths 54 is the minimum number of monitoring paths that must pass through the same resource.
  • the minimum number of paths 54 determines the certainty of monitoring. If a large number of paths are set, the certainty of identifying the causative resource increases, and if it is set small, the certainty decreases. That is, as the number of paths for monitoring one resource increases, the certainty of the cause resource separation increases.
  • FIG. 7 is a diagram illustrating an example of the co-occurrence condition table 60.
  • The co-occurrence condition table 60 stores path combination conditions for identifying the cause resource of a measured spike. That is, for each condition (identified by condition ID), it stores a path co-occurrence condition 63, the resource (identified by resource ID 62) that is presumed to be the cause of the spike when the co-occurrence condition is satisfied, and a condition creation time 64. For example, “P1 NOT (P2 & P3 & P4 & P5)” is stored in the co-occurrence condition 63 of row 65a. This means that a spike has occurred in path P1 but no spike has occurred in paths P2 to P5. When the pattern of spike occurrence satisfies this condition, the cause is estimated to be strain on the resource PT1 stored in the resource ID 62.
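Evaluating such conditions against the set of paths that spiked at about the same time can be sketched as below. The class and field names are hypothetical, and the condition is stored as two path sets rather than parsed from the string form; the reading of “P1 NOT (P2 & P3 & P4 & P5)” as "spike on P1 and on none of P2 to P5" follows the explanation in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoOccurrenceCondition:
    condition_id: int
    resource_id: str                  # resource presumed to be the cause
    must_spike: frozenset             # paths that must all show a spike
    must_not_spike: frozenset = frozenset()  # paths that must show no spike

    def matches(self, spiking_paths):
        return (self.must_spike <= spiking_paths
                and not (self.must_not_spike & spiking_paths))

def estimate_cause(conditions, spiking_paths):
    """Return the resource ID of the first matching condition, or None."""
    for cond in conditions:
        if cond.matches(spiking_paths):
            return cond.resource_id
    return None

conditions = [
    CoOccurrenceCondition(1, "PT1", frozenset({"P1"}),
                          frozenset({"P2", "P3", "P4", "P5"})),
    CoOccurrenceCondition(2, "PR2", frozenset({"P2", "P3"})),
]
cause = estimate_cause(conditions, {"P1"})
```

A spike seen only on P1 is attributed to PT1, spikes on both P2 and P3 to PR2, and a combination that matches no row yields no estimate.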
  • FIG. 8 is a diagram illustrating an example of the spike history table 70.
  • The spike history table 70 stores the spikes measured by the probe 23 and the analysis results, that is, the resources estimated from the occurrence of the spikes. For each spike (identified by spike ID 71), it stores the occurrence time 72, the VOL on which the spike occurred (identified by VOL ID), and the response time 74 indicating the magnitude of the spike. The table also stores the result of the analysis of these spikes by the statistical processing program 7, that is, the resource estimated to be the cause of the spike (identified by resource ID 75) and the condition of the co-occurrence condition table 60 that was used for the estimation (identified by condition ID 76).
  • For example, the row for spike ID 0 indicates that the cause resource of the spike was identified as PT1 because the spike matched the condition whose condition ID 76 is “1”.
  • the resource ID 75 and the condition ID 76 being “null” indicate that the spike has not been analyzed yet.
  • “unknown cause” may be recorded in the resource ID 75 and the condition ID 76.
  • FIG. 9 is a diagram showing an example of a resource performance history table.
  • In the resource performance history table 80, the performance of each resource in the storage 12, the access performance to the VOL 28 measured by the probe 23 (measurement records in minute order), and the like are recorded.
  • Specifically, the resource whose performance was measured (identified by resource ID 81), the measurement time 82, the measured metric 83, and its value 84 are recorded.
  • FIG. 10 is a flowchart of the probe selection process.
  • FIG. 11 is a supplementary explanatory diagram of the flowchart of FIG.
  • First, a probe selection process that covers several resource elements specified by the administrator (for example, the monitoring request shown in 55b of FIG. 6) will be described.
  • Here, Processors 1 and 2 (the hatched resource elements; hereinafter referred to as PR1 and PR2) are designated as monitoring targets. Note that the processing entity in all of the following steps is the probe management program 5.
  • The overlap number is the number of probe groups that include the probe. For example, in FIG. 11, the probe 2 (overlap number 2), which is included in both PG1 and PG2, is selected.
  • (S7) Among the paths acquired in S6, only the already selected monitoring paths and the paths that have a single intersection at the specified resource element (PR1 or PR2) are kept. Note that a user or the like may have previously selected paths to be monitored, so there are usually monitoring paths already selected at the start of processing.
  • the path remaining in S7 is set as a monitoring path.
  • the monitoring flag 42 in the row of the same path stored in the probe configuration table 40 is changed to Y.
  • the number of passing paths of the specified resource element is updated.
  • the probe management program 5 instructs the probe selected to monitor the path to monitor at a fine time interval (second order).
  • The probe management program 5 updates the co-occurrence condition table 60 with the selected paths. For example, in FIG. 11, when the monitoring paths of Processor 2 are set to path 2 and path 3, a row is added to the co-occurrence condition table 60 whose resource ID is Processor 2 and whose co-occurrence condition 63 is “path 2 & path 3”. That is, PR2 is added to the co-occurrence condition table 60 of FIG. 7 with the co-occurrence condition P2 & P3. Thereafter, the processes from S3 to S8 are repeated until the condition of S3 is satisfied.
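The overall idea of this selection, as stated in the summary (repeatedly pick the monitorable path that passes through the most uncovered monitoring resource elements), can be sketched as a greedy loop. This is an assumption-laden illustration and does not reproduce steps S1 to S8, most of which are not present in this text; the function name, the tie-breaking by path ID, and the sample paths are hypothetical.

```python
def select_monitoring_paths(paths, targets, min_paths):
    """Greedy sketch: pick paths until every target element has min_paths passing paths."""
    selected = []
    counts = {r: 0 for r in targets}
    remaining = dict(paths)
    while True:
        uncovered = {r for r, n in counts.items() if n < min_paths}
        if not uncovered:
            break
        best_id, best_gain = None, 0
        for pid in sorted(remaining):          # deterministic tie-breaking
            gain = len(uncovered & set(remaining[pid]))
            if gain > best_gain:
                best_id, best_gain = pid, gain
        if best_id is None:
            break                              # no remaining path helps
        for r in set(remaining.pop(best_id)):
            if r in counts:
                counts[r] += 1
        selected.append(best_id)
    return selected, counts

# Hypothetical paths; PR1 and PR2 are the designated monitoring targets.
paths = {
    "P1": ["PT1", "PR1", "CA1", "PO1"],
    "P2": ["PT2", "PR2", "CA1", "PO1"],
    "P3": ["PT3", "PR2", "CA2", "PO2"],
    "P4": ["PT4", "PR1", "CA2", "PO2"],
}
selected, counts = select_monitoring_paths(paths, {"PR1", "PR2"}, min_paths=2)
```

With a minimum of two paths per Processor, all four sample paths end up selected, and PR2 is covered by P2 and P3, which corresponds to a condition of the form P2 & P3.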
  • probe selection processing when the administrator designates monitoring of all resources in the storage 12 will be described.
  • FIG. 12 is an overall flowchart of the probe selection process.
  • FIG. 13 is a detailed flowchart of step S13.
  • FIG. 14 is a detailed flowchart of step S23.
  • FIG. 15 is a supplementary explanatory diagram of the flowcharts of FIGS.
  • The number of resource elements in the storage 12 is counted for each resource type (Port, Processor, etc.), and monitoring paths and their probes 23 are determined in order from the resource type with the most elements to the resource type with the fewest. The probe configuration table 40 and the co-occurrence condition table 60 are then updated.
  • FIG. 12 will be described.
  • The monitoring paths selected in S13 are recorded in the probe configuration table 40. That is, the monitoring flag 42 of each entry selected as a monitoring path is updated to Y.
  • The probes 23 that have the monitoring paths selected in S13 are identified with reference to the probe configuration table 40, and those probes are instructed to monitor at a fine monitoring interval.
  • FIG. 13 will be described with reference to the supplementary explanatory diagram of FIG.
  • (S20) Referring to the resource configuration table 30, the internal resources of the storage 12 designated by the administrator are acquired. Next, the probe configuration table 40 is referred to and the paths passing through those resources are acquired. Next, the common paths of these paths are identified and path groups are created. At this time, the path groups are created so that the resource type selected in S12 is the single part and the other resource types are the common part.
  • Port is the resource type selected in S12.
  • the path 1 is composed of resource elements whose numbers are 1, 1, 1, 1 in the order of Port, Processor, Cache, Pool.
  • the path 2 is composed of resource elements having numbers 2, 1, 1, 1.
  • Path 1 and path 2 pass through the same resource element with a resource type other than Port (common path 1). Therefore, path 1 and path 2 belong to the same path group.
  • the path 3 and the path 4 belong to the same path group different from the path groups of the path 1 and the path 2.
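The grouping in S20 can be sketched as grouping paths by everything except the resource type chosen as the single part. The function name and dictionary layout are assumptions; the element numbers of path 1 and path 2 follow the text, while those of path 3 and path 4 are hypothetical values chosen only so that they form a second group.

```python
from collections import defaultdict

RESOURCE_TYPES = ("Port", "Processor", "Cache", "Pool")

def build_path_groups(paths, single_type):
    """Group paths that share the same elements for every type except single_type."""
    groups = defaultdict(list)
    for path_id, elements in paths.items():
        common = tuple((t, elements[t]) for t in RESOURCE_TYPES if t != single_type)
        groups[common].append(path_id)
    return dict(groups)

paths = {
    "path1": {"Port": 1, "Processor": 1, "Cache": 1, "Pool": 1},
    "path2": {"Port": 2, "Processor": 1, "Cache": 1, "Pool": 1},
    "path3": {"Port": 3, "Processor": 2, "Cache": 2, "Pool": 2},  # hypothetical
    "path4": {"Port": 4, "Processor": 2, "Cache": 2, "Pool": 2},  # hypothetical
}
groups = build_path_groups(paths, single_type="Port")
```

Path 1 and path 2 fall into one group because they differ only in the Port, and path 3 and path 4 form a separate group.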
  • the common paths created in S20 are excluded if the number of passing paths is less than the threshold.
  • the threshold value is a numerical value designated by the minimum number of paths 54 in the monitoring request table 50.
  • the number of passing paths is 2, respectively. This value is compared with the numerical value designated by the minimum number of paths 54.
  • common paths with a small number of passing paths are excluded.
  • As the number of passing paths increases, more data is available for evaluating the co-occurrence condition, so the accuracy of the co-occurrence determination increases. A larger number also makes the selection robust against a decrease in the number of passing paths when a path is changed by a configuration change (described later).
  • the numerical value specified by the minimum number of paths 54 may not be used as it is for the threshold value.
  • a value obtained by multiplying the minimum number of paths 54 by a constant coefficient, for example, 2 may be used as the threshold value.
  • the processing from S21 to S23 may be looped while the coefficient is gradually decreased.
  • FIG. 14 shows processing for selecting a monitoring path from paths belonging to a path group.
  • Points are added to the paths belonging to a path group according to conditions, and paths with many points are selected so that the number of monitoring paths in the path group is equal to or greater than a threshold value.
  • In this way, the monitoring paths can be made up of favorable paths.
  • Points are added to completely duplicated paths, that is, to paths for which other paths pass through exactly the same resource elements.
  • For example, the path 1 of the probe 1 passes through the resource elements numbered 1, 1, 1, 1 in the order of Port, Processor, Cache, and Pool.
  • The resource elements through which the path 2 of the probe 2 passes completely overlap with those of the path 1.
  • The path 1 and the path 2 are therefore completely duplicated paths, and their overlap number is 2.
  • the number of points to be added may be a fixed value or may be proportional to the overlap number. The reason for adding points to the completely duplicated path is to make it possible to easily prepare an alternative path when the path configuration is changed.
  • a probe having a large number of paths belonging to any of the common paths is identified, and points are added to those paths.
  • the probe 3 has a path 3 belonging to the common path 1 and a path 4 belonging to the common path 2.
  • Points are therefore added to path 3 and path 4.
  • The number of points to be added may be proportional to the number of paths that one probe has.
  • In this example, the points may be proportional to 2, the number of paths (path 3 and path 4) that the probe 3 has. This makes it possible to select paths so that the number of probes is reduced as much as possible.
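The two point-adding rules can be sketched as follows. The scoring weights are assumptions (points equal to the overlap number, plus points equal to the number of paths the probe has), as are the function name and the sample data; the text only says the points may be fixed or proportional.

```python
from collections import Counter

def score_paths(paths, probe_of):
    """paths: path_id -> tuple of resource elements; probe_of: path_id -> probe_id."""
    overlap = Counter(paths.values())                 # identical element tuples
    per_probe = Counter(probe_of[p] for p in paths)   # paths owned by each probe
    scores = {}
    for pid, elems in paths.items():
        score = 0
        if overlap[elems] > 1:                  # completely duplicated path
            score += overlap[elems]
        if per_probe[probe_of[pid]] > 1:        # probe owning several paths
            score += per_probe[probe_of[pid]]
        scores[pid] = score
    return scores

# Hypothetical data: path1/path2 are complete duplicates; probe3 owns two paths.
paths = {
    "path1": ("PT1", "PR1", "CA1", "PO1"),
    "path2": ("PT1", "PR1", "CA1", "PO1"),
    "path3": ("PT2", "PR1", "CA1", "PO1"),
    "path4": ("PT3", "PR2", "CA2", "PO2"),
    "path5": ("PT4", "PR2", "CA2", "PO2"),
}
probe_of = {"path1": "probe1", "path2": "probe2",
            "path3": "probe3", "path4": "probe3", "path5": "probe4"}
scores = score_paths(paths, probe_of)
```

Under these assumed weights, the duplicated paths and the two paths of probe 3 each receive 2 points, while the unrelated path 5 receives none.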
  • (S33) Select the target path group for determining the monitoring path.
  • The path group having the smallest value of (number of path candidates − number of passing paths) is selected.
  • the number of path candidates refers to the number of paths that are not selected as monitoring paths of other path groups among the paths constituting the path group.
  • the number of passing paths refers to the number of paths selected as monitoring paths among the paths constituting the path group.
  • the common path 2 is composed of a path 4, a path 5, and a path 6 having a processor, cache, and pool as common parts. Paths 4 to 6 pass through Ports 3, 4, and 5, respectively. Further, the path 7 belonging to the common path 3 passes through Port 5 in the same manner as the path 6.
  • The number of path candidates for the common path 2 is 2, which is obtained by subtracting 1 (for the path 7) from 3 (the paths 4 to 6).
  • If the path 4 has already been selected as a monitoring path for the common path 2,
  • the value of (number of path candidates − number of passing paths) for the common path 2 is 2 − 1 = 1.
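The worked numbers above can be reproduced with a short sketch. It rests on one reading of the text, which is an assumption: a path stops being a candidate when its single-part element (here a Port) is already passed by a monitoring path selected for another path group. The function name and argument layout are hypothetical.

```python
def selection_slack(group_paths, single_part_of, selected_here, selected_elsewhere):
    """Return (number of path candidates) - (number of passing paths) for one group."""
    taken = {single_part_of[p] for p in selected_elsewhere}
    candidates = [p for p in group_paths if single_part_of[p] not in taken]
    passing = [p for p in group_paths if p in selected_here]
    return len(candidates) - len(passing)

# Common path 2 consists of paths 4 to 6; path 7 (common path 3) shares Port 5 with path 6.
single_part_of = {"path4": "Port3", "path5": "Port4",
                  "path6": "Port5", "path7": "Port5"}
slack = selection_slack(
    group_paths=["path4", "path5", "path6"],
    single_part_of=single_part_of,
    selected_here={"path4"},
    selected_elsewhere={"path7"},
)
```

This gives 2 candidates minus 1 passing path, that is, 1, matching the example; the path group with the smallest such value is handled first.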
  • (S35) It is determined whether the number of passing paths of all path groups is greater than or equal to the threshold.
  • the number of passing paths is the number of paths selected as monitoring paths among the paths constituting the path group.
  • the threshold is the minimum number of paths 54 in the monitoring request table 50. If all the path groups satisfy this condition, the process proceeds to S36, and if not, the process returns to S33.
  • (S36) This step is performed when the condition shown in S35 is satisfied but not all the resource elements belonging to a single part have been covered yet. For example, in FIG. 15C, it is assumed that the path 4 and the path 6 are selected as the monitoring paths among the paths 4, 5, and 6 belonging to the common path 2, and the path 5 is not selected. At this time, Port 4 remains uncovered. In this step, the uncovered resource elements belonging to a single part are first identified in this way. Next, the path having the highest score is selected from the paths passing through such a resource element, and is set as a monitoring path of the path group to which the path belongs.
  • By assigning paths to path groups and selecting monitoring paths in this way, the number of monitoring paths passing through each monitored resource element becomes equal to or greater than the threshold, so that no resource to be monitored is left uncovered.
  • the path configuration may change due to the operation of the information processing system. For example, in order to reduce the load on a Pool whose performance is tight, a VOL that places data on the Pool may be moved to another Pool at the administrator's discretion. At this time, the Pool through which this VOL path passes changes. That is, the path configuration changes.
  • FIG. 16 is a flowchart of probe reselection processing for a path configuration change.
  • the processing subject of this flow is the probe management program 5.
  • FIG. 17 is a supplementary explanatory diagram of the flowchart of FIG. 16.
  • the monitoring path configuration change is received.
  • the collection program 6 periodically collects configuration information and extracts the difference.
  • the probe management program 5 receives this difference, refers to the monitoring flag 42 in the probe configuration table 40, and determines whether the changed path is a monitoring path.
  • Assume that the path 1, which passed through the resource element r1 before the configuration change, is changed to the path 2, which passes through the resource element r2 belonging to the same resource type as the resource element r1.
  • FIG. 17A shows a configuration change in which the changed resource element belongs to a single part.
  • FIG. 17B shows a configuration change in which the changed resource element belongs to a common part.
  • The co-occurrence condition 63 in the co-occurrence condition table 60 is updated. Specifically, path 1 is excluded from the conditions based on the path group to which path 1 belongs. For example, assume that the path group is composed of five paths, path 1 to path 5. If the co-occurrence condition 63 has a row containing the condition based on this path group (path 1 & path 2 & path 3 & path 4 & path 5), path 1 is removed from this row, which is updated to (path 2 & path 3 & path 4 & path 5).
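The row update can be sketched with the condition held as a list of path names. Storing the condition as a list and the helper names are assumptions; the example data is the five-path group from the text.

```python
def remove_path(condition_paths, removed):
    """Return the condition's paths without the path whose configuration changed."""
    return [p for p in condition_paths if p != removed]

def format_condition(paths):
    """Render the paths in the 'a & b & c' form used by co-occurrence condition 63."""
    return " & ".join(paths)

before = ["path 1", "path 2", "path 3", "path 4", "path 5"]
after = remove_path(before, "path 1")
updated = format_condition(after)
```

The updated row reads `path 2 & path 3 & path 4 & path 5`.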
  • A path is transferred from a path group that has a sufficient number of passing paths, so that the number of passing paths recovers to the threshold value or more. This is described with reference to the figure. Assume that the path 1, which passed through the common part Processor 1, is changed to the path 2. At this time, the number of passing paths of the common path 1 decreases by 1 and falls below the threshold. Therefore, the single part (Port-n) through which a path (path m) belonging to the common path 1 passes is identified, and the path group (common path n) to which a path (path n) covering that single part belongs is identified. Among such path groups, a path (path n) belonging to the path group that has the largest number of passing paths, that is, the largest difference from the threshold value, is set as a new member of the common path 1.
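Choosing the donor path group can be sketched as picking the candidate with the largest surplus over the threshold. The function name, the group labels, and the rule that a donor must stay at or above the threshold after giving up one path are assumptions made for this illustration.

```python
def pick_donor_group(passing_counts, candidate_groups, threshold):
    """Pick the candidate group with the largest surplus of passing paths, if any."""
    eligible = [g for g in candidate_groups if passing_counts[g] > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda g: passing_counts[g] - threshold)

# Hypothetical passing-path counts of the groups that cover the relevant single parts.
passing_counts = {"common2": 3, "common3": 5, "common4": 2}
donor = pick_donor_group(passing_counts, ["common2", "common3", "common4"], threshold=2)
```

Here `common3` has the largest difference from the threshold and is chosen; if no group has a surplus, no donor is returned.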
  • the administrator may add resources to be monitored. Next, a process of additionally selecting a monitoring path and the probe 23 when this monitoring target resource is newly added will be described.
  • FIG. 18 is a flowchart of probe selection processing when a monitoring target is added.
  • the subject of this processing is the probe management program 5.
  • the probe management program 5 receives information on the newly designated resource element and checks whether there is a co-occurrence condition corresponding to the resource element. If there is, it instructs the probe 23 to monitor the paths constituting the co-occurrence condition; if not, a path and a probe for monitoring the resource element are newly selected.
  • the monitoring path and probe 23 of the resource element for which monitoring is specified are selected.
  • This selection method may be a method of selecting with individual resource elements as starting points (shown in FIG. 9), or a method of selecting monitoring paths and probes 23 for all resource elements in the storage 12 that includes the specified resource element (shown in FIG. 12).
  • FIG. 19 is a diagram showing a resource selection screen. On this screen, the administrator designates a resource that requires monitoring at a fine time interval. In FIG. 19, it is assumed that the storage 1 has already been selected from the plurality of storages 12 managed by the management computer 1.
  • the resource selection screen includes a server list 190 and a resource list 191.
  • server information 192 related to the storage 1 is displayed.
  • the administrator designates a server for which detailed monitoring is required with a check box at the left end.
  • the administrator can also select all servers by checking the all selection check box 193. Also, the administrator changes the value of the minimum number of paths 194 as appropriate.
  • in each line of the resource list 191, a resource in the storage 1 is displayed. As in the server list 190, the administrator can select in this list a resource for which detailed monitoring is required. Further, the minimum number of paths 197 can be set for each resource element, and the administrator can set and change its value for each resource element for which detailed monitoring is required.
  • the input content is sent to the probe management program 5.
  • the probe management program 5 stores the input content in the monitoring request table 50. Thereafter, the probe management program 5 starts probe selection calculation.
  • FIG. 20 is a diagram showing a probe selection proposal presentation screen.
  • This screen is a screen for presenting the probe selection result calculated by the probe management program 5 so as to satisfy the monitoring request of the administrator.
  • This screen includes a probe summary, a cover resource summary 202, and a resource-specific monitoring path configuration 203.
  • the number of probes 200 required for monitoring and the number 201 of monitoring paths are displayed.
  • the number 201 of monitoring paths required for monitoring is obtained by referring to the probe management table 40 and totaling the monitoring paths that pass through the resources of the storage 1.
  • the number of probes 200 required for monitoring is obtained by counting the number of probes having those monitoring paths.
  • in the cover resource summary 202, the number of resources for each resource type of the storage 1 and the number of resources covered by the current probe selection (the number of cover resources) are presented.
  • the monitoring path configuration corresponding to each monitoring target resource is displayed in each row.
  • Each row displays a co-occurrence condition corresponding to the monitoring target resource, a monitoring path or a monitoring path group constituting the co-occurrence condition, and the number of paths (the number of passing paths).
  • Such information can be acquired from the co-occurrence condition table 60. Further, as supplementary information, IOPS measurement data representing the flow rate of these paths is displayed together.
  • the administrator presses the OK button 204 and approves the selection plan. If there is a problem, the Cancel button 205 is pressed to return to the resource selection screen of FIG.
  • FIG. 21 is a diagram showing a monitoring result screen.
  • the management computer 1 totals and statistically processes the spikes measured by the probe 23, and the result of extracting spikes caused by the resources designated for monitoring is displayed.
  • the administrator reads a spike increase tendency of a resource from the displayed spike history, and determines that a sign of performance failure appears in the resource.
  • Port 1 of the storage 1 has already been selected.
  • the monitoring result screen is composed of spike statistics 210 and spike history 211.
  • the spike statistics 210 displays spike statistical information for one week related to the currently selected resource (Port 1 in FIG. 21). In the spike statistics 210, (a) the number of spikes measured, (b) the number of spikes attributed to other resources, (c) the number of spikes attributed to Port1, and the previous week ratio of (c) are displayed.
  • the value of (a) is obtained as follows. The statistical processing program 7 counts, in the spike history table 70, the number of rows corresponding to the VOL related to path 1, that is, the number of spikes measured on path 1. This value is displayed in (a).
  • the value of (b) is obtained by aggregating the number of rows whose cause resource ID 75 is other than Port1, that is, other resources that are not Port1, among the rows obtained by the calculation of (a).
  • the value of (c) is obtained by the formula (a) - (b). From these values, in particular (c) the number of spikes caused by Port1 and its ratio to the previous week, the administrator can read an increase in spikes caused by Port1 and determine that the performance of Port1 is tight.
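The three values shown in the spike statistics 210 can be illustrated with a short Python sketch. The row layout (`path` and `cause` keys) and the sample rows are hypothetical stand-ins for the spike history table 70, which in the text is keyed by VOL and carries a cause resource ID 75.

```python
def spike_statistics(history, path_id, target_resource):
    """Return (a, b, c) for one path and one selected resource.

    a: spikes measured on the path
    b: spikes attributed to resources other than target_resource
    c: spikes attributed to target_resource, computed as a - b
    """
    rows = [row for row in history if row["path"] == path_id]
    a = len(rows)
    b = sum(1 for row in rows if row["cause"] != target_resource)
    c = a - b
    return a, b, c


# Illustrative rows only; they are not data from the publication.
history = [
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Processor2"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Pool1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path2", "cause": "Port1"},
]
stats = spike_statistics(history, "path1", "Port1")
print(stats)
```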
  • the spike history 211 is a graph showing the numerical values shown in the spike statistics.
  • the I/O response time measured on path 1 is recorded.
  • a total of 6 spikes were measured (spike 212 due to Port 1).
  • spikes indicated by dotted lines indicate spikes 213a and 213b caused by other resources. As a result, the administrator can easily read from the graph that these spikes are not caused by Port1.
  • the resource and the resource element have been described as one-to-one, but this is not necessarily required.
  • a plurality of resources may be handled as one resource element, or one resource may be divided into a plurality of resource elements. Below, such a modification is described.
  • FIG. 22 is a diagram showing a resource configuration table 30 in the modification.
  • FIG. 22 is different from the resource configuration table 30 shown in FIG. 4 in that information about attributes of resources (attribute 34 and attribute value 35) is added.
  • a row 36a in FIG. 22 is information on PT1 which is a Port resource.
  • the row 36a records that PT1 belongs to the trunk TR1, which is a group of Ports, as attribute information of PT1. For Ports belonging to the same trunk, traffic is automatically distributed according to the load. In this case, a trunk that is a set of a plurality of resources (Ports) can be handled as one resource element.
  • the row 36b is information on PR1, which is a processor resource, and indicates that PR1 belongs to a processor group called PRG1.
  • processing is automatically distributed among the processors belonging to the processor group in accordance with the processor load, similarly to the previous trunk. At this time, as in the case of the previous trunk, the processor group can be handled as one resource element.
  • the row 36c stores information on PO1, which is a Pool resource.
  • PO1 is composed of a plurality of media (SSD and SAS) having different processing speeds.
  • performance characteristics such as response time vary greatly depending on the storage medium of data requested by the I / O to be processed.
  • Such a Pool can be regarded as having a plurality of resource elements having different performance characteristics for each medium. Therefore, PO1 composed of SSD and SAS may be divided into SSD resource elements and SAS resource elements.
  • FIG. 23 is a flowchart of resource set / division processing.
  • FIG. 23 shows a process in which the probe management program 5 refers to the resource configuration table 30 to combine a plurality of resources into one resource element or divide one resource into a plurality of resource elements.
  • the resource belonging to the same group is set as one resource element.
  • the load-distributed group corresponds to a Port trunk or a processor group.
  • resource attribute information indicates that the resource is composed of several resources having different performance characteristics
  • the resource is divided into resource elements for each performance characteristic.
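The set/division rule described above can be sketched in Python as follows. The dictionary layout of a resource (`id`, `group`, `media` keys), the `PO1:SSD` naming of divided elements, and the extra port PT2 are assumptions for illustration; the resource configuration table 30 only states that attributes and attribute values are recorded.

```python
def to_resource_elements(resources):
    """Map resource elements to the resources they are built from.

    Resources in the same load-distributed group are merged into one element;
    a resource made of media with different performance characteristics is
    divided into one element per medium; anything else maps one-to-one.
    """
    elements = {}
    for res in resources:
        if "group" in res:
            elements.setdefault(res["group"], []).append(res["id"])
        elif "media" in res:
            for medium in res["media"]:
                elements.setdefault(f'{res["id"]}:{medium}', []).append(res["id"])
        else:
            elements.setdefault(res["id"], []).append(res["id"])
    return elements


resources = [
    {"id": "PT1", "group": "TR1"},
    {"id": "PT2", "group": "TR1"},   # PT2 is a hypothetical second trunk member
    {"id": "PR1", "group": "PRG1"},
    {"id": "PO1", "media": ["SSD", "SAS"]},
]
elements = to_resource_elements(resources)
print(elements)
```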
  • paths are selected until the number of paths passing through each resource element to be monitored (monitoring resource element) reaches a predetermined minimum number of paths. At each step, a path is selected from among the paths that can be monitored by the probe that monitors the path passing through the largest number of monitoring resource elements whose number of passing paths has not yet reached the minimum (uncovered monitoring resource elements). As many monitoring resource elements as possible are thus monitored with each probe, while monitoring paths are selected until the minimum number of paths needed to isolate the monitoring resource element that caused a performance degradation is secured, so that the system performance can be measured at low cost.
  • a path is selected according to a predetermined rule from among the paths that can be monitored by the probe that monitors the path passing through the most uncovered monitoring resource elements. By choosing the rule for path selection appropriately, a more preferable monitoring path can be set.
  • a path used for processing with a large amount of processing is prioritized, such as adding a flow rate per path, for example, a path with a large I / O amount per second (IOPS), performance degradation such as spikes is reduced.
  • when paths are selected so that, for each monitoring resource element, the total number of selected paths passing through that monitoring resource element is equal to or greater than a predetermined value, more monitoring paths include each monitoring resource element, which makes it easier to isolate the monitoring resource elements.
  • when the paths are selected so that the total number of selected paths passing through each monitoring resource element is equal across the monitoring resource elements, variation in the ease of isolation among the monitoring resource elements is eliminated.
  • a path to be monitored in place of that path can be prepared easily.
  • when the monitoring resource element through which a monitoring path actually passes is changed, if there is a completely duplicated path, it is used as the monitoring path in place of the original monitoring path, so the monitoring path can be switched easily.
  • the monitoring resource element that the monitoring path passes is changed, if there is no complete duplicate path of the monitoring path, other monitoring resources are selected from the paths that pass through the monitoring resource element included in the monitoring path.
  • the paths that pass the same monitoring path as the group of monitoring paths that pass the path that passes the same monitoring path as the group with the smallest number of monitoring paths included in the group is used as the monitoring path. At the same time, it is possible to improve the distinguishability of other monitoring resource elements.
  • the management computer 1 in the first embodiment is intended to obtain the minimum set of probes necessary for identifying the cause when the access performance of the storage system deteriorates.
  • the management computer 1 in the second embodiment is obtained by replacing the target from a storage system with an application.
  • the application is also composed of several program processes. These program processes correspond to resources in the first embodiment.
  • the program process is, for example, a process in a program module or a database table (or access process to the database table).
  • Application provides some service to application users.
  • the service is a service that returns a Web page that matches a specific keyword to the user.
  • the user designates a service and sends a processing request (service request) to the application.
  • the application executes the request and returns the result to the user.
  • in the first embodiment, the probe 23 measures the response time of an I/O request from the server to the storage system, whereas the probe 23 in the second embodiment measures the response time until the application returns the result of a service request to the user.
  • the service in the second embodiment corresponds to the path in the first embodiment.
  • a path is composed of a series of resources that process I / O requests sent to the storage system through the path.
  • the service corresponding to the path in the second embodiment is configured by a series of application program processing for processing a service request from the user.
  • the second embodiment is similar to the first embodiment, and the monitoring target is changed from a storage system to an application.
  • a resource corresponds to a program process
  • a path corresponds to a service
  • the response time monitored by the probe 23 is a service response time.
  • FIG. 24 is a system configuration diagram according to the second embodiment. As can be seen at a glance, most of the parts in FIG. 24 overlap with those of the first embodiment shown in FIG. Therefore, here, a description will be given mainly of the difference.
  • the application to be monitored in this embodiment includes the IP switch 102, the Web server 103, and the database server 106. These are connected to the management computer 1 via the LAN 11. These are connected by a business LAN 101 of a different system from the LAN 11 and can communicate with each other.
  • the Web server 103 and the database server 106 are ordinary computers provided with a CPU, a memory, and a permanent storage device such as an HDD.
  • a Web program 104 that is a part of an application operates on the Web server 103.
  • the Web server 103 includes a large number of program modules 105 that constitute the Web program 104.
  • the database server 106 operates a database program 107 that is also a part of the application.
  • the database server 106 also has a database table 108 in which application data is stored.
  • the service request sent from the application user first enters the IP switch 102 through the business LAN 101.
  • the IP switch 102 sends it to the Web server 103.
  • the Web program 104 receives the service request, reads the program module 105 related to the service request, and executes predetermined processing. If data held by the application is necessary for the predetermined processing, the Web program 104 further transmits a service request to the database server 106.
  • the database program 107 receives this, executes predetermined data processing on the database table 108 related to the service request, and returns the result to the requesting Web program 104.
  • the Web program 104 further executes predetermined processing and returns the result to the user via the IP switch 102.
  • the service monitoring server 100 is a computer that includes a CPU, a memory, and a storage device such as an HDD, and executes a program.
  • a probe 23 which is a kind of program operates.
  • the service monitoring server 100 is connected to the IP switch 102.
  • the IP switch 102 copies a service request packet passing through the business LAN 101 and an application response packet to the service request packet, and transmits the duplicated packet to the service monitoring server 100.
  • the probe 23 calculates and records the response time for each service from the time difference between these service request / response packets.
  • the probe 23 calculates the response time of the service and monitors the value. When a spike is detected in the response time, the service request in which the spike occurred, the detected time, and the response time are recorded. The recorded contents are periodically collected by the collection program 6 operating on the management computer 1 and stored in the spike history table 70.
  • the process of calculating the response time by collating the duplicated packet is a high-load process that consumes a large amount of CPU and memory. Therefore, the probe 23 has a function of limiting the service for which the response time is calculated. Thereby, the consumption of CPU and memory can be reduced.
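The response time calculation, restricted to the target services, can be sketched as follows. The packet fields (`kind`, `service`, `req_id`, `time`) and the idea of matching a response to its request by a request identifier are assumptions for illustration; the text only says that the response time is derived from the time difference between duplicated request and response packets.

```python
def response_times(packets, monitored_services):
    """Return (service, response_time) pairs for the monitored services only.

    packets: dicts in arrival order with hypothetical keys
             'kind' ('request' or 'response'), 'service', 'req_id', 'time' (ms).
    """
    pending = {}   # req_id -> request timestamp
    results = []
    for pkt in packets:
        if pkt["service"] not in monitored_services:
            continue  # skip services outside the target set to save CPU and memory
        if pkt["kind"] == "request":
            pending[pkt["req_id"]] = pkt["time"]
        elif pkt["kind"] == "response" and pkt["req_id"] in pending:
            start = pending.pop(pkt["req_id"])
            results.append((pkt["service"], pkt["time"] - start))
    return results


packets = [
    {"kind": "request", "service": "svcA", "req_id": 1, "time": 1000},
    {"kind": "request", "service": "svcB", "req_id": 2, "time": 1005},
    {"kind": "response", "service": "svcA", "req_id": 1, "time": 1040},
    {"kind": "response", "service": "svcB", "req_id": 2, "time": 1100},
]
measured = response_times(packets, {"svcA"})
print(measured)
```

Limiting the set passed as `monitored_services` is what keeps the collation work small in this sketch.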
  • the management computer 1 instructs the probe 23 to select a target service.
  • FIG. 25 is a diagram showing the contents of the resource configuration table 30 in the second embodiment.
  • the resource configuration table 30 in the first embodiment stores the resource configuration inside the storage.
  • program processing corresponding to resources operating on the servers is stored in the resource configuration table 30.
  • the resource stores a unique identifier (resource ID 33) for each resource type 32.
  • FIG. 25 shows that there are resources PM1, PM2, PM3... whose resource type is program module on a server called Web-Sv1 that constitutes the application.
  • FIG. 26 is a diagram showing a probe configuration table 40 in the second embodiment.
  • the probe configuration table 40 stores the configuration information of the probe 23, the service configuration information, and the monitoring information of the probe 23, as follows.
  • the configuration information of the probe 23 includes the identifier of the probe 23 (probe ID 41) and the service monitoring server 100 (server ID 43) on which the probe 23 is operating.
  • the service configuration information includes a service identifier 410, resources such as program modules used by the service (service 46, resource type 47, resource ID, path group ID 49), a service URL 411 (when the application is a Web application), and the like.
  • the monitoring information of the probe 23 includes the presence / absence of service monitoring by the probe 23 (monitoring flag 42).
  • the service configuration information may be input manually by the administrator based on the application design information, or the collection program 6 may collect and analyze traces and logs output by the application during execution and input the result.
  • the probe management program 5 selects the minimum set of services necessary to identify the performance degradation of the resources (program modules and database tables), and instructs the probe 23 to monitor the response time only for those services.
  • the method for selecting the minimum service is the same as the method for selecting the minimum probe in the first embodiment. Therefore, the processing flow described in the first embodiment can be applied as it is.
  • a term such as “path” may be replaced with the corresponding term in the second embodiment.
  • SYMBOLS 1 ... Management computer, 10 ... Display apparatus, 100 ... Service monitoring server, 101 ... Business LAN, 102 ... IP switch, 103 ... Web server, 104 ... Web program, 105 ... Program module, 106 ... Database server, 107 ... Database program, 108 ... Database table, 11 ... LAN, 12 ... Storage, 13 ... SW, 14 ... Port, 15 ... Processor, 16 ... Pool, 17 ... Cache, 18 ... SAN, 19 ... Server, 2 ... CPU, 20 ... CPU, 21 ... memory, 25 ... computer storage, 26 ... HBA, 30 ... resource configuration table, 4 ... computer storage

Abstract

 A management computer for managing an information processing system that executes information processing by a path composed of a row of multiple stages of resource elements has: a probe management means for selecting a path from among paths that can be supervised by a probe, among probes supervising a path passing through a supervisory resource element that is a resource element to be supervised so that the number of paths passing through the supervisory resource element is greater than or equal to a prescribed minimum number of paths, that supervises a path passing through the greatest number of uncovered supervisory resource elements that are a supervisory resource element in which the number of paths that pass through does not reach the minimum number of paths, making the selected path a supervisory path that is a path to be supervised, and setting a probe supervising the supervisory path as a supervisory probe that is a probe to be supervised; a collection means for collecting the results of supervision by the supervisory probe; and a statistical processing means for determining a supervisory resource element that caused a performance degradation by a cut and divide process according to a co-occurrence pattern based on the supervision result of the supervisory probe.

Description

Monitoring computer and monitoring method
 The present invention relates to a technique for measuring the performance of an IT system.
 Baseline analysis is often used to detect signs of a performance failure in an IT (Information Technology) system. In this method, a program that measures the performance of the IT system (a probe) is installed in the IT system, and the result measured by the probe is compared with the measurement result obtained under normal conditions (the baseline). Whether the measurement result has deviated from the baseline, or deviated from it greatly, can be determined by whether the difference between the measurement result and the baseline exceeds a predetermined threshold. A sign of a performance failure is judged to be present when the probe's measurement result deviates greatly from the baseline, or when the frequency of states in which the measurement result temporarily shows an outlier off the baseline (hereinafter referred to as a spike) increases.
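The baseline comparison can be illustrated with a minimal Python sketch. The baseline, the threshold, the sample values, and the rule that a larger spike count than in the previous window signals trouble are all hypothetical choices for illustration, not values or criteria given in the text.

```python
def find_spikes(samples, baseline, threshold):
    """Return indexes of samples whose deviation from the baseline exceeds the threshold."""
    return [i for i, value in enumerate(samples) if abs(value - baseline) > threshold]


BASELINE_MS = 5    # hypothetical normal response time
THRESHOLD_MS = 10  # hypothetical allowed deviation

last_week = [5, 6, 5, 30, 5, 6, 5, 5]
this_week = [5, 30, 5, 28, 6, 31, 5, 5]

spikes_before = find_spikes(last_week, BASELINE_MS, THRESHOLD_MS)
spikes_now = find_spikes(this_week, BASELINE_MS, THRESHOLD_MS)

# Assumed criterion: more spikes than in the previous window is a sign of failure.
sign_of_failure = len(spikes_now) > len(spikes_before)
print(spikes_before, spikes_now, sign_of_failure)
```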
 Regarding measurement locations, Patent Document 1 describes placing a DPI device that performs measurement at a port of a switch through which a path with a high probability of failure, or a path on which flows are concentrated, passes.
Japanese Unexamined Patent Application Publication No. 2011-205301 (JP2011-205301A)
 The method of detecting signs of a system performance failure by the baseline analysis described above presupposes that spikes, in which the measurement result deviates from the baseline, can be detected. The time interval of measurement by the probe therefore tends to become short so that spikes are not missed. This is because, for example, a spike that can be detected by measuring at intervals on the order of seconds becomes invisible when measured at intervals on the order of minutes, since the measurement results are statistically averaged.
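A small worked example shows how averaging hides a spike. The numbers are hypothetical: 59 one-second samples at 5 ms and a single 65 ms spike, with the same illustrative baseline (5 ms) and threshold (10 ms) applied at both granularities.

```python
BASELINE_MS = 5    # hypothetical baseline
THRESHOLD_MS = 10  # hypothetical threshold

# One minute of per-second response times with a single spike.
per_second_ms = [5] * 59 + [65]

# The same minute as seen by a probe that reports one averaged value per minute.
per_minute_ms = sum(per_second_ms) / len(per_second_ms)

seen_per_second = any(abs(v - BASELINE_MS) > THRESHOLD_MS for v in per_second_ms)
seen_per_minute = abs(per_minute_ms - BASELINE_MS) > THRESHOLD_MS

print(per_minute_ms, seen_per_second, seen_per_minute)
```

The 65 ms sample exceeds the threshold on its own, but the one-minute average is only 1 ms above the baseline, so the spike disappears at the coarser interval.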
 However, performing probe measurements at short time intervals increases the amount of resources, such as CPU (Central Processing Unit) and memory, consumed by the measurement, as well as the cost of storing the measurement results.
 Moreover, even if a spike can be detected, it may not be possible to identify (isolate) its cause. For example, even if the response time to I/O (Input/Output) requests to a storage is measured and a spike in which the response time temporarily increases is detected, it may not be possible to determine from that single spike alone which resource in the storage was under pressure.
 Although it is difficult to isolate the cause from a single spike alone, analyzing the occurrence pattern of multiple spikes may make isolation possible, and there is a method that isolates the cause in this way. For example, suppose that spikes occur almost simultaneously in multiple I/O requests and that those I/O requests were processed in common by a certain resource inside a storage. If it is known in advance that those I/O requests are processed in common within that storage, it can be determined that the common resource is highly likely to be the cause of the spikes. In other words, the cause can be identified from the co-occurrence pattern of spikes.
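The co-occurrence idea can be sketched in Python. The mapping from paths to resources is hypothetical, and excluding resources that also lie on paths without a spike is an added assumption that narrows the candidates further; the text itself only states that a resource shared by the simultaneously spiking I/O requests is the likely cause.

```python
def suspect_resources(path_resources, spiking_paths):
    """Return resources shared by all spiking paths and absent from quiet paths."""
    if not spiking_paths:
        return set()
    # Resources common to every path that spiked at about the same time.
    common = set.intersection(*(set(path_resources[p]) for p in spiking_paths))
    # Assumption: a resource that also serves a path without a spike is cleared.
    quiet = set()
    for path, resources in path_resources.items():
        if path not in spiking_paths:
            quiet |= set(resources)
    return common - quiet


path_resources = {
    "path1": ["Port1", "Processor1", "Pool1"],
    "path2": ["Port1", "Processor2", "Pool1"],
    "path3": ["Port2", "Processor1", "Pool1"],
}
suspects = suspect_resources(path_resources, {"path1", "path2"})
print(suspects)
```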
 This method can isolate the resource that caused a spike with high accuracy, but it presupposes that a larger number of spikes can be detected, which in turn requires placing a larger number of probes. In the storage example above, this means placing probes that measure at time intervals short enough to detect fine spikes at most of the servers connected to the storage, which number on the order of hundreds to a thousand.
 As described above, placing a large number of probes that measure at short time intervals requires a great deal of cost. On the other hand, identifying the cause of a spike from co-occurrence relationships requires a large number of probes that measure at time intervals at which spikes can be detected.
 An object of the present invention is to provide a technique for determining performance degradation of an IT system at low cost.
 A management computer according to one aspect of the present invention manages an information processing system that executes information processing through paths, each formed by linking resource elements in multiple stages. The management computer has: probe management means that, so that the number of paths passing through each monitoring resource element (a resource element to be monitored) becomes equal to or greater than a predetermined minimum number of paths, repeatedly selects a path from among the paths that can be monitored by the probe which, among the probes monitoring paths that pass through the monitoring resource elements, monitors the path passing through the largest number of uncovered monitoring resource elements (monitoring resource elements whose number of passing paths has not reached the minimum number of paths), sets each selected path as a monitoring path (a path to be monitored), and sets each probe that monitors a monitoring path as a monitoring probe (a probe to be used for monitoring); collection means that collects the monitoring results of the monitoring probes; and statistical processing means that determines the monitoring resource element that caused a performance degradation by isolation using co-occurrence patterns based on the monitoring results of the monitoring probes.
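The selection performed by the probe management means can be approximated by a greedy loop. The sketch below is a simplification under stated assumptions: it picks paths directly rather than first picking a probe and then a path among those the probe can monitor, ties are broken by insertion order, and the path, probe, and resource names are hypothetical.

```python
def select_monitoring_paths(paths, targets, min_paths):
    """Greedily pick paths until every target is passed by min_paths paths.

    paths: dict mapping a path name to (probe name, set of resource elements).
    targets: monitoring resource elements.
    Returns (list of (path, probe) in selection order, passing-path counts).
    """
    count = {t: 0 for t in targets}
    remaining = dict(paths)
    selected = []
    while remaining:
        uncovered = {t for t in targets if count[t] < min_paths}
        if not uncovered:
            break
        # Path passing through the most uncovered monitoring resource elements.
        best = max(remaining, key=lambda p: len(remaining[p][1] & uncovered))
        if not remaining[best][1] & uncovered:
            break  # no remaining path helps any uncovered element
        probe, elements = remaining.pop(best)
        selected.append((best, probe))
        for t in elements & set(targets):
            count[t] += 1
    return selected, count


paths = {
    "path1": ("probe1", {"Port1", "Proc1"}),
    "path2": ("probe1", {"Port1", "Proc2"}),
    "path3": ("probe2", {"Port2", "Proc1"}),
    "path4": ("probe2", {"Port2", "Proc2"}),
}
targets = ["Port1", "Port2", "Proc1", "Proc2"]
selected, count = select_monitoring_paths(paths, targets, min_paths=1)
print(selected, count)
```

With a minimum of one path per element, two of the four paths are enough in this example; raising `min_paths` makes the loop continue until each element is passed by that many monitoring paths or no useful path remains.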
 本発明により、可能な限り少ないコストで性能障害の原因となっているリソースを切り分けることができる。 The present invention makes it possible to isolate resources that cause performance problems at the lowest possible cost.
第1の実施形態による情報処理システムのシステム構成図である。1 is a system configuration diagram of an information processing system according to a first embodiment. パス構成について説明するための図である。It is a figure for demonstrating a path | pass structure. 第1の実施形態による情報処理システムの概略動作を説明するための図である。It is a figure for demonstrating schematic operation | movement of the information processing system by 1st Embodiment. リソース構成テーブル30の一例を示す図である。3 is a diagram illustrating an example of a resource configuration table 30. FIG. プローブ構成管理テーブル40の一例を示す図である。4 is a diagram showing an example of a probe configuration management table 40. FIG. 監視要求テーブル50の一例を示す図である。It is a figure which shows an example of the monitoring request | requirement table. 共起条件テーブル60の一例を示す図である。It is a figure which shows an example of the co-occurrence condition table. スパイク履歴テーブル70の一例を示す図である。It is a figure which shows an example of the spike log | history table. リソース性能履歴テーブルの一例を示す図である。It is a figure which shows an example of a resource performance log | history table. プローブ選択処理のフローチャートである。It is a flowchart of a probe selection process. 図10のフローチャートの補足説明図である。It is a supplementary explanatory drawing of the flowchart of FIG. プローブ選択処理の全体フローチャートである。It is a whole flowchart of a probe selection process. ステップS13の詳細フローチャートである。It is a detailed flowchart of step S13. ステップS23の詳細フローチャートである。It is a detailed flowchart of step S23. 図12~14のフローチャートの補足説明図である。FIG. 15 is a supplementary explanatory diagram of the flowcharts of FIGS. パス構成の変更に対するプローブ再選択処理のフローチャートである。It is a flowchart of the probe reselection process with respect to the change of the path configuration. 図16のフローチャートの補足説明図である。FIG. 17 is a supplementary explanatory diagram of the flowchart of FIG. 16. 監視対象追加時のプローブ選択処理のフローチャートである。It is a flowchart of the probe selection process at the time of monitoring object addition. リソース選択画面を示す図である。It is a figure which shows a resource selection screen. プローブ選択案の提示画面を示す図である。It is a figure which shows the presentation screen of a probe selection plan. 
監視結果画面を示す図である。It is a figure which shows the monitoring result screen. 変形例におけるリソース構成テーブル30を示す図である。It is a figure which shows the resource structure table 30 in a modification. リソース集合/分割処理のフローチャートである。It is a flowchart of a resource set / division process. 第2の実施形態におけるシステム構成図である。It is a system configuration figure in a 2nd embodiment. 第2の実施形態におけるリソース構成テーブル30の内容を示す図である。It is a figure which shows the content of the resource structure table 30 in 2nd Embodiment. 第2の実施形態におけるプローブ構成テーブル40を示す図である。It is a figure which shows the probe structure table 40 in 2nd Embodiment.
Embodiments will be described with reference to the drawings.
<<First Embodiment>>
<System configuration>
FIG. 1 is a system configuration diagram of an information processing system according to the first embodiment. A management computer 1, a server 19, and a storage 12 are connected via a LAN 11. The management computer 1 collects information from the server 19 and the storage 12 via the LAN 11, and sends operation instructions to the server 19 and the storage 12 via the LAN 11. The server 19 and the storage 12 are also connected by a SAN (Storage Area Network) 18. The server 19 sends I/O processing requests to the storage 12 via the SAN 18, and the storage 12 processes each I/O request and returns a response to the server 19.
The management computer 1 includes a CPU 2, a memory 3, an in-computer storage 4, a display I/F 8, and an NW I/F 9. The display I/F 8 is connected to a display device 10, and the NW I/F 9 is connected to the LAN 11.
The in-computer storage 4 stores a probe management program 5, a collection program 6, and a statistical processing program 7. These programs are read into the memory 3 at startup and executed by the CPU 2.
The memory 3 stores a resource configuration table 30, a probe configuration table 40, a monitoring request table 50, a co-occurrence condition table 60, a spike history table 70, and a resource performance history table 80. The resource configuration table 30 stores configuration information on the infrastructure resources managed by the management computer 1.
The server 19 includes a CPU 20, a memory 21, a display I/F 24, an in-computer storage 25, an HBA (Host Bus Adapter) 26, and an NW I/F 27. The NW I/F 27 is connected to the LAN 11, and the HBA 26 is connected to the SAN 18.
The memory 21 stores an application 22 and a probe 23. Both are programs that are executed by the CPU 20. A VOL 28 is a logical device (not shown) created by the storage 12. The server 19 recognizes this logical device as a disk area.
The application 22 issues requests to read data from and write data to the VOL 28. At this time, an I/O request is issued to the storage 12, which holds the actual data of the VOL 28. An I/O request issued by the application 22 is delivered to the storage 12 along a route that runs from the HBA 26 through the SAN 18. The storage 12 processes the I/O request and returns the result to the application 22 by following the same route in reverse.
The probe 23 measures the processing of these I/O requests. For example, the probe 23 measures the response time from when an I/O request is issued to the storage 12 until the result is returned. The probe 23 also measures the number of I/O requests processed per second (IOPS). The probe 23 analyzes the data measured in this way using the baseline monitoring technique described above and detects spikes (measured values that deviate significantly from the average measured value). The detected spikes are collected by the collection program 6 and stored in the spike history table 70.
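As a rough illustration of the spike detection described above, the following Python sketch flags response times that deviate from a baseline average by more than a multiple of the baseline's standard deviation. The function name `detect_spikes`, the factor `k`, and the standard-deviation rule itself are assumptions made for illustration; the text only states that a spike is a measured value that deviates significantly from the average.

```python
from statistics import mean, pstdev

def detect_spikes(baseline, samples, k=3.0):
    """Return (index, value) pairs in `samples` that deviate from the
    baseline mean by more than k standard deviations (assumed rule)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return [(i, v) for i, v in enumerate(samples) if abs(v - mu) > k * sigma]

# Hypothetical response times in milliseconds.
baseline_ms = [5.0, 5.2, 4.9, 5.1, 4.8]
measured_ms = [5.0, 5.3, 42.0, 4.9]
spikes = detect_spikes(baseline_ms, measured_ms)
```

In this sketch only the 42.0 ms sample is far enough from the baseline to be reported; a real probe would choose its own threshold and baseline window.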
The probe 23 has a function for changing the measurement interval. That is, the probe 23 can perform the measurement described above at relatively short intervals on the order of seconds, or at relatively long intervals on the order of minutes. It can also perform second-order and minute-order measurement simultaneously, or stop second-order measurement and perform only minute-order measurement. The measurement interval can be changed at the discretion of the probe 23 itself, or at the discretion of an external program such as the probe management program 5 of the management computer 1.
The probe 23 can be implemented, for example, as a driver of an OS (not shown) that directly measures individual I/O requests as described above, or as a program that periodically collects the statistical information that the OS produces by measuring I/O requests.
The storage 12 includes a SW (switch) 13, a Port 14, a Processor 15, a Pool 16, and a Cache 17. The Port 14, Processor 15, Pool 16, and Cache 17 are connected to one another via the SW 13. The Port 14 is connected to the SAN 18. The Pool 16 is a storage area composed of a plurality of disks, in which data is stored. One Pool 16 may be composed of multiple types of media (SSD, SAS HDD). An I/O request issued by the application 22 to the VOL 28 passes through the Port 14 and is processed by the Processor 15. For example, when the I/O request is a data read, the Processor 15 checks whether the requested data is in the Cache 17; if it is not, the Processor 15 obtains the data from the Pool 16 that stores it and returns the data to the application 22.
<Terminology for storage path configuration>
Before proceeding, the terms used in the following description are explained with reference to FIG. 2.
FIG. 2 is a diagram for explaining the path configuration.
FIG. 2(a) is an explanatory diagram of a path. A path is a notional line connecting the resources inside the storage 12 (Port 14, Processor 15, Pool 16, and Cache 17) that a VOL 28 uses. Each of these resources is referred to as a resource element. I/O requests to the VOL 28 are processed using the resource elements on this path. The probe 23 monitors I/O requests; this is referred to as monitoring the path. The response time measurement described above is one example of path monitoring. A path monitored by a probe is referred to as a monitoring path.
FIG. 2(b) is an explanatory diagram of a path group. A path group is a group of paths that share some of their resource elements. FIG. 2(b) shows the path from VOL1 (path 1) and the path from VOL2 (path 2). Path 1 and path 2 share every resource element except the Port, and therefore belong to one path group. The shared portion of the paths is referred to as the common part or the common path. The portion that is not shared is referred to as the single part (in FIG. 2(b), the Port corresponds to the single part).
FIG. 2(c) is an explanatory diagram of a single intersection. As in FIG. 2(b), there are two VOLs 28, from which path 1 and path 2 extend respectively. These paths intersect only at the Processor. In this case, the Processor is referred to as a single intersection, and path 1 and path 2 are said to have a single intersection. A path that passes through a resource element is referred to as a passing path of that element, and the number of such paths (counting monitoring paths) is referred to as the number of passing paths. In the figure, the number of passing paths is 2 for the Processor and 1 for every other resource element. When the number of passing paths of a resource exceeds a specified value, the resource is said to be covered.
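The terms above can be made concrete with a small Python sketch. It represents each path as a tuple of resource element IDs, counts the passing paths of each element, and checks whether two paths meet at a single intersection. The path contents and the helper names `passing_path_counts` and `shared_elements` are hypothetical, chosen to resemble the situation of FIG. 2(c).

```python
from collections import Counter

def passing_path_counts(paths):
    """Count, for each resource element, how many paths pass through it."""
    counts = Counter()
    for elements in paths.values():
        counts.update(elements)
    return counts

def shared_elements(path_a, path_b):
    """Return the resource elements that two paths have in common."""
    return set(path_a) & set(path_b)

# Hypothetical paths (Port, Processor, Cache, Pool); only the Processor is shared.
paths = {
    "path1": ("PT1", "PR1", "CA1", "PO1"),
    "path2": ("PT2", "PR1", "CA2", "PO2"),
}
counts = passing_path_counts(paths)
common = shared_elements(paths["path1"], paths["path2"])
is_single_intersection = len(common) == 1
```

Here the Processor `PR1` has two passing paths and every other element has one, so the two paths have a single intersection at `PR1`.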
<General operation>
This section describes the operation of the programs running on the management computer 1. FIG. 3 is a diagram for explaining the schematic operation of the information processing system according to the present embodiment.
(1) First, the collection program 6 collects the configuration information of the resources inside the storage 12 from the storage 12 and stores it in the resource configuration table 30. The collection program 6 also collects, from the probe 23 of the server 19, the configuration information of the server 19, the probe 23, and the VOL 28, and stores it in the probe configuration table 40. Next, the probe management program 5 displays a screen for accepting monitoring requests from the administrator on the display device 10 via the display I/F 8, and stores the administrator's input in the monitoring request table 50.
(2) to (3) Next, the probe management program 5 refers to the contents of the resource configuration table 30 and the probe configuration table 40, decides which probes 23 are to perform fine-grained measurement on the order of seconds, and instructs those probes 23 to measure. At the same time, the probe management program 5 creates rules (co-occurrence conditions) for isolating, from the combination of spikes measured by the probes 23, which resource in the storage 12 is the cause of a spike, and stores them in the co-occurrence condition table 60.
(4) A probe 23 that has been instructed to perform fine-grained monitoring measures I/O requests and detects spikes. The collection program 6 collects the records of detected spikes from the probe 23 and stores them in the spike history table 70. The collection program 6 also collects the performance information of each resource (measured on the order of minutes) from the storage 12 and stores it in the resource performance history table 80.
(5) The statistical processing program 7 analyzes the spike information stored in the spike history table 70 according to the co-occurrence conditions stored in the co-occurrence condition table 60, identifies the resource that caused each spike, and records the result in the spike history table 70. The statistical processing program 7 also displays the result on the display device 10 to present it to the administrator.
<Tables>
Next, the configuration of each table stored in the memory 3 of the management computer 1 will be described.
FIG. 4 is a diagram illustrating an example of the resource configuration table 30. The resource configuration table 30 stores the configuration information of the resources inside the storage 12. That is, for each storage 12 (uniquely identified by a storage ID 31), the internal resources (uniquely identified by a resource ID 33) are grouped by resource type 32 and stored. The resource types 32 include, for example, Port, Processor, Cache, and Pool. In FIG. 4, the resource IDs 33 of the Ports 14 are written as PT1, PT2, ..., those of the Processors 15 as PR1, PR2, ..., those of the Caches 17 as CA1, CA2, ..., and those of the Pools 16 as PO1, PO2, .... This notation is also used in the description below.
FIG. 5 is a diagram showing an example of the probe configuration table 40. The probe configuration table 40 stores what each probe 23 monitors: which probe 23 (identified by a probe ID 41) runs on which server 19 (identified by a server ID 43) and which VOL 28 (identified by a VOL ID) it monitors. When a VOL is monitored on the order of seconds, the monitoring flag 42 is "Y".
The probe configuration table 40 also stores information on each path 46 (identified by a path ID 44), that is, the resources inside the storage 12 that each VOL 28 uses. In FIG. 5, the resource ID 48 of each resource constituting a path 46 (the same identifier as the resource ID in the resource configuration table 30) is stored together with its resource type 47. In addition, a path group ID 49 is stored as auxiliary information for each path 46. The path group ID 49 distinguishes the single part of the path 46 from its common part, and gives the ID of the path group to which the common part belongs. When one path 46 contains both rows whose path group ID is null and rows whose path group ID is not null (0 in the figure), as in rows 401a and 401b of FIG. 5, the resources whose path group ID is null form the single part, the resources whose path group ID is not null belong to the common part, and the path belongs to path group 0.
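The way the path group ID separates the single part from the common part can be sketched as follows. The `PathRow` type, its field names, and the sample rows are hypothetical stand-ins for rows of the probe configuration table 40; `None` plays the role of null.

```python
from typing import NamedTuple, Optional

class PathRow(NamedTuple):
    path_id: str
    resource_type: str
    resource_id: str
    path_group_id: Optional[int]  # None stands in for "null"

def split_path(rows):
    """Split one path's rows into its single part, its common part,
    and the set of path group IDs that the common part belongs to."""
    single = [r.resource_id for r in rows if r.path_group_id is None]
    common = [r.resource_id for r in rows if r.path_group_id is not None]
    groups = {r.path_group_id for r in rows if r.path_group_id is not None}
    return single, common, groups

# Hypothetical rows for one path whose Port is the single part.
rows = [
    PathRow("P1", "Port", "PT1", None),
    PathRow("P1", "Processor", "PR1", 0),
    PathRow("P1", "Cache", "CA1", 0),
    PathRow("P1", "Pool", "PO1", 0),
]
single, common, groups = split_path(rows)
```

For these rows, `PT1` is the single part, the remaining three elements form the common part, and the path belongs to path group 0.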
FIG. 6 is a diagram illustrating an example of the monitoring request table 50. The monitoring request table 50 records the resources that the administrator has requested to be monitored and the required certainty of monitoring. Specifically, it stores the administrator's request (identified by a request ID 51), the device designated for monitoring (identified by a device ID 52, which is either a server ID 43 or a storage ID 31), the resource designated for monitoring (identified by a resource ID 53), and a minimum number of paths 54. To identify the resource causing a spike from the way spikes co-occur, a plurality of paths passing through the same resource are monitored; the minimum number of paths 54 is the minimum number of monitoring paths that pass through that same resource. This value determines the certainty of monitoring: a larger value increases the certainty with which the causal resource can be identified, and a smaller value decreases it. In other words, the more paths are used to monitor one resource, the more reliably the causal resource can be isolated.
FIG. 7 is a diagram illustrating an example of the co-occurrence condition table 60. The co-occurrence condition table 60 stores the path combination conditions used to identify the resource that caused a measured spike. That is, for each condition (identified by a condition ID), it stores a path co-occurrence condition 63, the resource presumed to be the cause of the spike when the co-occurrence condition holds (identified by a resource ID 62), and the creation time 64 of the condition. For example, the co-occurrence condition 63 of row 65a contains "P1 NOT(P2 & P3 & P4 & P5)". Here, this means that a spike occurred on path P1 but no spike occurred on paths P2 to P5. When the pattern of spike occurrence satisfies this condition, the cause is presumed to be contention for resource PT1, which is stored in the resource ID 62.
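A minimal sketch of how such a co-occurrence condition might be evaluated is shown below. Rather than parsing the condition string, it assumes each condition is held as two sets, the paths that must show a spike and the paths that must not, which follows the reading given in the text for "P1 NOT(P2 & P3 & P4 & P5)". The class `CooccurrenceCondition`, the function `infer_cause`, and the second sample condition (PR2 with "P2 & P3") are illustrative assumptions, not the format used by the described system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CooccurrenceCondition:
    resource_id: str      # resource presumed to be the cause
    spiking: frozenset    # paths that must show a spike
    quiet: frozenset      # paths that must not show a spike

    def matches(self, spiking_paths):
        observed = set(spiking_paths)
        return self.spiking <= observed and not (self.quiet & observed)

def infer_cause(conditions, spiking_paths):
    """Return the resource of the first matching condition, or None."""
    for cond in conditions:
        if cond.matches(spiking_paths):
            return cond.resource_id
    return None

conditions = [
    CooccurrenceCondition("PT1", frozenset({"P1"}),
                          frozenset({"P2", "P3", "P4", "P5"})),
    CooccurrenceCondition("PR2", frozenset({"P2", "P3"}), frozenset()),
]
cause_a = infer_cause(conditions, {"P1"})        # spike on P1 only
cause_b = infer_cause(conditions, {"P2", "P3"})  # P2 and P3 spike together
cause_c = infer_cause(conditions, {"P1", "P2"})  # no condition matches
```

When no condition matches, the sketch returns `None`, which corresponds to a spike whose cause cannot be determined from the stored conditions.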
FIG. 8 is a diagram illustrating an example of the spike history table 70. The spike history table 70 stores the spikes measured by the probes 23 and the resource presumed to have caused each spike, that is, the analysis result. For each spike (identified by a spike ID 71), it stores the occurrence time 72, the VOL on which the spike occurred (identified by a VOL ID), and the response time 74, which indicates the magnitude of the spike. The table also stores the result of the analysis of these spikes by the statistical processing program 7: the resource presumed to be the cause of the spike (identified by a resource ID 75) and the condition in the co-occurrence condition table 60 that was used for the estimate (identified by a condition ID 76). For example, row 77a shows that the spike with spike ID 0 matched the condition whose condition ID 76 is "1", and that the causal resource of the spike was therefore identified as PT1. A resource ID 75 and condition ID 76 of "null" (row 77b) indicate that the spike has not yet been analyzed. "Cause unknown" may also be recorded in the resource ID 75 and the condition ID 76.
FIG. 9 is a diagram showing an example of the resource performance history table. The resource performance history table 80 records the performance of each resource inside the storage 12, the access performance for the VOL 28 measured by the probe 23 (records of minute-order measurement), and so on. Each row of the table records the resource whose performance was measured (identified by a resource ID 81), the measurement time 82, the measured metric 83, and its value 84.
<Processing flow>
FIG. 10 is a flowchart of the probe selection process. FIG. 11 is a supplementary explanatory diagram for the flowchart of FIG. 10.
The following describes the probe selection process that covers several resource elements designated by the administrator (for example, the monitoring request shown in row 55b of FIG. 6). In FIG. 11, Processors 1 and 2 (the hatched resource elements; hereinafter PR1 and PR2) are designated as monitoring targets. All of the following processing is performed by the probe management program 5.
(S1) Identify the resource elements designated for monitoring. When resource elements are designated directly, as in row 55b of the monitoring request table 50 (FIG. 6), nothing needs to be done. When a VOL 28 of a specific server 19 is designated, as in row 55a, refer to the probe configuration table 40 and identify the resource elements used by the designated VOL 28.
(S2) Refer to the probe configuration table 40, identify the probes that have a path containing a designated resource element, and form a group of probes for each resource element. For example, in FIG. 11, the probe group for PR1 (the probes with a path passing through PR1) is PG1 = {probe 1, probe 2}, and the probe group for PR2 is PG2 = {probe 2, probe 3}.
(S3) Determine whether the number of passing paths of each resource (PR1 and PR2) is equal to or greater than the minimum number of paths 54. If it is, the process ends; if it is not, proceed to S4.
(S4) Select one resource element whose number of passing paths is insufficient (for example, PR2).
(S5) From the probe group of the resource element selected in S4, select the probe with the highest overlap count. The overlap count is the number of probe groups that contain the probe. For example, in FIG. 11, probe 2 (overlap count 2), which is contained in both PG1 and PG2, is selected.
(S6) Refer to the probe configuration table 40 and obtain all paths of the probe selected in S5. In FIG. 11, path 1 and path 2 of probe 2 are identified.
(S7) Of the paths obtained in S6, keep only those that form a single intersection with the already selected monitoring paths at a designated resource element (PR1 or PR2). Note that a user or the like normally selects in advance the paths that particularly need to be monitored, so some monitoring paths have usually already been selected when the process starts.
(S8) Set the paths remaining after S7 as monitoring paths. Change the monitoring flag 42 of the rows for those paths in the probe configuration table 40 to "Y", and update the number of passing paths of the designated resource elements. The probe management program 5 instructs the probes selected to monitor the paths to monitor at fine time intervals (on the order of seconds). The probe management program 5 also updates the co-occurrence condition table 60 with the selected paths. For example, in FIG. 11, when the monitoring paths of PR2 are path 2 and path 3, a row is added to the co-occurrence condition table 60 of FIG. 7 with resource ID PR2 and co-occurrence condition 63 "P2 & P3". The processing from S3 to S8 is then repeated until the condition of S3 is satisfied.
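The loop from S2 to S8 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation described in the flowchart: the S7 single-intersection filter is reduced to "the path touches a designated resource element", each probe is chosen at most once, and the table updates and probe instructions of S8 are replaced by returning the chosen probes and paths. The function name `select_monitoring_paths` and the sample topology, including probe 4, are hypothetical.

```python
def select_monitoring_paths(probe_paths, targets, min_paths):
    """probe_paths: {probe: {path_id: set of resource element IDs}}."""
    target_set = set(targets)
    # S2: one probe group per designated resource element.
    groups = {
        t: {probe for probe, paths in probe_paths.items()
            if any(t in elems for elems in paths.values())}
        for t in targets
    }
    monitored = {}  # path_id -> resource elements of that monitoring path
    chosen = []     # probes to be instructed to measure at second-order intervals

    def passing(t):
        return sum(1 for elems in monitored.values() if t in elems)

    while True:
        # S3/S4: pick a designated element that still lacks monitoring paths
        # and for which an unused probe remains.
        lacking = [t for t in targets if passing(t) < min_paths]
        target = next((t for t in lacking if groups[t] - set(chosen)), None)
        if target is None:
            break
        # S5: prefer the probe contained in the most probe groups.
        candidates = sorted(groups[target] - set(chosen))
        probe = max(candidates,
                    key=lambda p: sum(p in g for g in groups.values()))
        # S6/S7 (simplified): keep the probe's paths that touch a designated element.
        for path_id, elems in probe_paths[probe].items():
            if elems & target_set:
                monitored[path_id] = elems  # S8: becomes a monitoring path
        chosen.append(probe)
    return chosen, monitored

# Hypothetical topology loosely following FIG. 11.
probe_paths = {
    "probe1": {"path0": {"PT1", "PR1", "CA1", "PO1"}},
    "probe2": {"path1": {"PT2", "PR1", "CA1", "PO1"},
               "path2": {"PT3", "PR2", "CA2", "PO2"}},
    "probe3": {"path3": {"PT4", "PR2", "CA2", "PO2"}},
    "probe4": {"path4": {"PT5", "PR3", "CA3", "PO3"}},
}
chosen, monitored = select_monitoring_paths(probe_paths, ["PR1", "PR2"], 2)
```

Probe 2 is picked first because it belongs to both probe groups, which reflects the overlap-count preference of S5; probe 4 is never chosen because none of its paths touches PR1 or PR2.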
Next, the probe selection process performed when the administrator designates monitoring of all resources inside the storage 12 will be described.
FIGS. 12 to 15 are used for the description. FIG. 12 is an overall flowchart of the probe selection process. FIG. 13 is a detailed flowchart of step S13. FIG. 14 is a detailed flowchart of step S23. FIG. 15 is a supplementary explanatory diagram for the flowcharts of FIGS. 12 to 14.
In the overall flow shown in FIG. 12, the resources inside the storage 12 are counted for each resource type (Port, Processor, and so on), the monitoring paths and their probes 23 are determined in order from the resource type with the most elements to the resource type with the fewest, and the probe configuration table 40 and the co-occurrence condition table 60 are updated.
FIG. 12 is described below.
(S10) Refer to the resource configuration table 30 and count the resources inside the target storage 12 for each resource type. Covered resource elements, that is, resources for which the monitoring paths and their probes 23 were determined in a previous cycle of the loop, are excluded from the count.
(S11) If the probe selection process has been performed for all resource types, proceed to S16; if any resource type remains unprocessed, proceed to S12.
(S12) Select one resource type, in descending order of the number of resource elements obtained in S10.
(S13) Select monitoring paths and their probes so that the resource elements of the resource type selected in S12 form the single part and the resource elements of the other resource types form the common part. This process is described in detail later with reference to FIGS. 13 and 14.
(S14) Record the monitoring paths selected in S13 in the probe configuration table 40. That is, update the monitoring flag 42 of each entry selected as a monitoring path to "Y".
(S15) Create rows in the co-occurrence condition table 60 based on the monitoring paths selected in S13. This process is the same as that of FIG. 9.
(S16) Refer to the probe configuration table 40, identify the probes 23 that have the monitoring paths selected in S13, and instruct those probes to monitor at fine monitoring intervals.
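The outer loop of S10 to S16 can be sketched as follows. The sketch only shows the ordering logic: counting uncovered resource elements per type, processing types in descending order of that count, and skipping elements covered in an earlier cycle. The callback `select_for_type` is a hypothetical stand-in for S13, and the table updates of S14 and S15 and the probe instructions of S16 are left as comments. Breaking ties by type name is an assumption added to make the order deterministic.

```python
from collections import Counter

def order_resource_types(resources, covered):
    """S10/S12: count uncovered elements per type, largest count first."""
    counts = Counter(rtype for rid, rtype in resources.items()
                     if rid not in covered)
    return sorted(counts, key=lambda rtype: (-counts[rtype], rtype))

def plan_monitoring(resources, select_for_type):
    """Loop over S10-S15. `select_for_type` stands in for S13 and returns
    (monitoring paths, resource IDs newly covered by them)."""
    covered, done, plan = set(), set(), {}
    while True:
        pending = [t for t in order_resource_types(resources, covered)
                   if t not in done]
        if not pending:            # S11: every resource type has been handled
            return plan            # S16 would instruct the probes here
        rtype = pending[0]         # S12
        paths, newly_covered = select_for_type(rtype)  # S13
        plan[rtype] = paths        # S14/S15 would update tables 40 and 60
        covered |= set(newly_covered)
        done.add(rtype)

# Hypothetical resource inventory: resource ID -> resource type.
resources = {
    "PT1": "Port", "PT2": "Port", "PT3": "Port", "PT4": "Port",
    "PR1": "Processor", "PR2": "Processor",
    "CA1": "Cache", "CA2": "Cache",
    "PO1": "Pool",
}

def fake_select(rtype):
    # Assumption for the demo: handling Ports also covers both Caches.
    newly_covered = {"CA1", "CA2"} if rtype == "Port" else set()
    return [f"paths-for-{rtype}"], newly_covered

plan = plan_monitoring(resources, fake_select)
```

In this demo the Cache type is never processed on its own, because both caches are already covered once the Port type has been handled.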
FIG. 13 is described below with reference to the supplementary explanatory diagram of FIG. 15.
(S20) Refer to the resource configuration table 30 and obtain the resources inside the storage 12 designated by the administrator. Next, refer to the probe configuration table 40 and obtain the paths that pass through those resources. Then identify the common paths of those paths and create path groups. The path groups are created so that the resource type selected in S12 forms the single part and the other resource types form the common part.
This is explained using FIG. 15(a) as an example. Here, Port is assumed to be the resource type selected in S12. Path 1 consists of the resource elements numbered 1, 1, 1, 1 in the order Port, Processor, Cache, Pool. Path 2 similarly consists of the resource elements numbered 2, 1, 1, 1. Path 1 and path 2 pass through the same resource elements for every resource type other than Port (common path 1). Path 1 and path 2 therefore belong to the same path group. Likewise, path 3 and path 4 belong to the same path group, which is different from the path group of path 1 and path 2.
(S21) Of the common paths created in S20, exclude those whose number of passing paths is below a threshold. Here, the threshold is the value specified by the minimum number of paths 54 in the monitoring request table 50. For example, in the example shown in FIG. 15(a), common path 1 and common path 2 are created, and each has 2 passing paths. This value is compared with the value specified by the minimum number of paths 54. This step excludes common paths with few passing paths. In general, the more passing paths there are, the more data is available for evaluating the co-occurrence condition, so the accuracy of the co-occurrence determination improves. A larger number of passing paths also makes the selection more robust against a decrease in the number of passing paths when paths change because of a configuration change (this is described later). For these reasons, common paths whose number of passing paths does not meet the criterion are excluded at this point. The value specified by the minimum number of paths 54 need not be used as the threshold as it is. For example, the minimum number of paths 54 multiplied by a constant coefficient, such as 2, may be used as the threshold. To prevent excessive exclusion of common paths, the processing from S21 to S23 may also be repeated while the coefficient is lowered little by little.
(S22) For the common paths remaining after S20 and S21, the common paths that form a single intersection are determined. This will be described with reference to FIG. 15(b). In this figure, path 1 and path 2 constitute common path 1, and path n and path m constitute common path n. These common paths are compared with one another to find combinations of common paths that form a single intersection. In FIG. 15(b), common path 1 and common path n form a single intersection at Cache1. Cache1 is then said to be "covered" by common path 1 and common path n. That is, a spike caused by congestion of Cache1 occurs almost simultaneously on the paths that constitute common path 1 and common path n (paths 1, 2, n, and m), so it can be distinguished from spikes caused by other resources. Because Cache1 is covered at this point, the cache becomes subject to exclusion as a resource element in S12.
(S23) Finally, the paths to be included in each common path selected in S22 are determined. In the example of FIG. 15(b), this corresponds to selecting, from path 1 and path 2 included in common path 1, the paths that are actually designated as monitoring paths. In this example, however, the number of passing paths is only 2, so both paths are selected as monitoring paths. The processing of this step is described in detail with reference to FIG. 14.
FIG. 14 shows the processing for selecting monitoring paths from the paths belonging to a path group. In this processing, points are awarded to the paths belonging to a path group according to several conditions, and the paths with the most points are selected so that the number of paths constituting the path group is equal to or greater than the threshold. In this way, the monitoring paths can be made up of paths with favorable conditions.
(S30) Points are awarded to completely overlapping paths, that is, to paths for which there are multiple paths passing through exactly the same resource elements. For example, in FIG. 15(c), path 1 of probe 1 passes through the resource elements numbered 1, 1, 1, 1 in the order Port, Processor, Cache, Pool. Path 2 of probe 2 passes through exactly the same resource elements as path 1. In this case, path 1 and path 2 are completely overlapping paths, and the overlap count is 2. Points are awarded to such paths. The number of points awarded may be a fixed value or may be proportional to the overlap count. Points are awarded to completely overlapping paths so that an alternative path can easily be provided when the path configuration changes.
(S31) Probes that have many paths belonging to any of the common paths are identified, and points are awarded to those paths. For example, in FIG. 15(c), probe 3 has path 3, which belongs to common path 1, and path 4, which belongs to common path 2. In this case, points are awarded to path 3 and path 4. The number of points awarded may be proportional to the number of paths that one probe has. In the example of probe 3, the points may be proportional to its two paths, path 3 and path 4. This makes it possible to select paths so that as few probes as possible are needed.
(S32) Using the resource performance history table 80 shown in FIG. 9, points are awarded to paths with a large flow rate per path, for example, a large number of I/O operations per second (IOPS). Specifically, the resource ID 81 of the resource performance history table 80 is referred to, the path is obtained from the VOL listed there by referring to the probe management table 40, and points are awarded to the paths of VOLs with a large IOPS value 84 recorded in the resource performance history table 80. The number of points awarded may be a fixed value or may be proportional to the flow rate. Points are awarded because a path that requests more I/O is more advantageous for detecting spikes.
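The three scoring rules S30 to S32 can be combined in a short sketch. The weights are assumptions (the text leaves them open: fixed or proportional), as are the function name, the IOPS values, and the element numbers of paths 3 and 4. The sketch also assumes that every listed path belongs to some common path.

```python
from collections import Counter

IOPS_PER_POINT = 100  # assumed scale: one point per 100 IOPS


def score_paths(paths, probe_of, iops):
    """Return {path name: points} following the rules of S30 to S32."""
    points = dict.fromkeys(paths, 0.0)
    duplicates = Counter(paths.values())    # how many paths share each route
    per_probe = Counter(probe_of.values())  # how many listed paths each probe has
    for name, elements in paths.items():
        # S30: completely overlapping paths, proportional to the overlap count.
        if duplicates[elements] > 1:
            points[name] += duplicates[elements]
        # S31: paths of a probe that has several paths, proportional to that count.
        if per_probe[probe_of[name]] > 1:
            points[name] += per_probe[probe_of[name]]
        # S32: busier paths get more points, proportional to the flow rate.
        points[name] += iops.get(name, 0) / IOPS_PER_POINT
    return points


paths = {
    "path1": (1, 1, 1, 1),
    "path2": (1, 1, 1, 1),  # completely overlaps path1, as in FIG. 15(c)
    "path3": (2, 1, 1, 1),  # assumed element numbers
    "path4": (3, 2, 2, 2),  # assumed element numbers
}
probe_of = {"path1": "probe1", "path2": "probe2",
            "path3": "probe3", "path4": "probe3"}
iops = {"path1": 100, "path2": 300, "path3": 200, "path4": 0}  # hypothetical
points = score_paths(paths, probe_of, iops)
print(points)
```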
(S33) The path group for which a monitoring path is to be determined is selected. Here, the path group with the smallest value of (number of path candidates − number of passing paths) is selected. The number of path candidates is the number of paths constituting the path group that have not been selected as monitoring paths of other path groups. The number of passing paths is the number of paths constituting the path group that have been selected as monitoring paths. For example, in FIG. 15(c), common path 2 consists of path 4, path 5, and path 6, which share the Processor, Cache, and Pool as their common part. Paths 4 to 6 pass through Ports 3, 4, and 5, respectively. Path 7, which belongs to common path 3, passes through Port 5, as does path 6. If path 7 has been selected as a monitoring path of common path 3, the number of path candidates for common path 2 is 2, obtained by subtracting 1 (for path 7) from the 3 paths 4 to 6. If, in addition, path 4 has already been selected as a monitoring path of common path 2, the value of (number of path candidates − number of passing paths) for common path 2 is 2 − 1 = 1.
(S34) From the path candidates, the path with the most points is selected as a monitoring path.
(S35) It is determined whether the number of passing paths of every path group is equal to or greater than the threshold. Here, the number of passing paths is the number of paths constituting the path group that have been selected as monitoring paths. The threshold is the minimum number of paths 54 of the monitoring request table 50. If all path groups satisfy this condition, the processing proceeds to S36; otherwise, it returns to S33.
(S36) This step is performed when the condition of S35 is satisfied but not all resource elements belonging to the single part are covered yet. For example, assume that in FIG. 15(c), of paths 4, 5, and 6 belonging to common path 2, path 4 and path 6 have been selected as monitoring paths and path 5 has not. In this case, Port 4 remains uncovered. In this step, such uncovered resource elements belonging to the single part are first identified. Next, of the paths passing through each such resource element, the path with the most points is selected and made a monitoring path of the path group to which it belongs.
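The loop of S33 through S35 can be read as a greedy selection, sketched below for the FIG. 15(c) example. Several points are interpretations rather than statements of the text: a path is treated as no longer a candidate when a monitoring path of another group already uses the same Port (this reproduces the "3 − 1 = 2" count in S33), groups that have reached the threshold are skipped, and path 8, its Port, and all scores are hypothetical. S36 is not included.

```python
def select_monitoring_paths(groups, port_of, score, threshold):
    """Greedy sketch of S33 to S35.

    groups:  {group name: [path names]}
    port_of: {path name: single-part element (here a Port) used by the path}
    score:   {path name: points accumulated in S30 to S32}
    """
    selected = {g: [] for g in groups}

    def candidates(g):
        # Ports already used by monitoring paths of *other* groups.
        taken = {port_of[p] for h, ps in selected.items() if h != g for p in ps}
        return [p for p in groups[g] if port_of[p] not in taken]

    while True:
        # S35: groups still below the threshold that can still receive a path.
        open_groups = [
            g for g in groups
            if len(selected[g]) < threshold
            and any(p not in selected[g] for p in candidates(g))
        ]
        if not open_groups:
            return selected
        # S33: the smallest (candidate count - selected count) goes first.
        g = min(open_groups, key=lambda h: len(candidates(h)) - len(selected[h]))
        # S34: take the highest-scoring candidate that is not selected yet.
        free = [p for p in candidates(g) if p not in selected[g]]
        selected[g].append(max(free, key=score.__getitem__))


groups = {"common2": ["path4", "path5", "path6"],
          "common3": ["path7", "path8"]}
port_of = {"path4": 3, "path5": 4, "path6": 5, "path7": 5, "path8": 6}
score = {"path4": 3, "path5": 1, "path6": 2, "path7": 2, "path8": 1}
result = select_monitoring_paths(groups, port_of, score, threshold=2)
print(result)
```

Serving the group with the fewest remaining options first reduces the chance that a later group is left without enough candidates.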
Instead of deciding the allocation of paths to path groups as shown here, by selecting monitoring paths so that the number of monitoring paths passing through each monitored resource element is equal to or greater than the threshold, monitoring paths may also be allocated to path groups so that the number of monitoring paths passing through each monitored resource element is as even as possible. By raising the smallest number of passing paths among the common parts, this reduces the possibility that the number of passing paths falls below the threshold when the path configuration changes.
In the course of operating the information processing system, the path configuration may be changed. For example, to reduce the load on a Pool whose performance is under pressure, an administrator may decide to move a VOL whose data resides on that Pool to another Pool. The Pool through which the path of this VOL passes then changes. In other words, the path configuration changes.
If a path whose resource elements have changed in this way was a monitoring path, the probes must be reselected to match the new path configuration. However, changes to the selection of monitoring paths and probes caused by reselection should be kept to a minimum. This is because changing the selection of monitoring paths and probes too much breaks the continuity of the performance data and spike records measured so far.
FIG. 16 is a flowchart of the probe reselection processing performed in response to a path configuration change. This flow is executed by the probe management program 5. FIG. 17 is a supplementary explanatory diagram for the flowchart of FIG. 16.
(S40) A configuration change of a monitoring path is received. The collection program 6 periodically collects configuration information and extracts the differences. The probe management program 5 receives these differences, refers to the monitoring flag 42 of the probe configuration table 40, and determines whether the changed path is a monitoring path. Here, it is assumed that path 1, which passed through resource element r1 before the configuration change, has been changed to path 2, which passes through resource element r2 belonging to the same resource type as resource element r1.
(S41) The probe configuration table 40 is referred to, and it is checked whether path 1 has a completely overlapping path. If it does, the processing proceeds to S42; if not, it proceeds to S43.
(S42) In place of path 1, the completely overlapping path of path 1 obtained in S41 (referred to as path 3) is made a new monitoring path. The monitoring flag 42 of path 3 is changed to Y, and every occurrence of path 1 recorded in the co-occurrence condition 63 of the co-occurrence condition table 60 is changed to path 3.
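Steps S41 and S42 amount to a lookup followed by a substitution, as in the sketch below. The in-memory dictionaries stand in for the probe configuration table 40 and the co-occurrence condition table 60; their layout, the function name, and the sample data are assumptions.

```python
def substitute_duplicate(paths, monitored, conditions, changed):
    """Swap the changed path for a completely overlapping path, if one exists.

    paths:      {path name: resource elements before the change}
    monitored:  {path name: monitoring flag}
    conditions: {resource element: [paths in its co-occurrence condition]}
    Returns the substitute's name, or None when S43 must handle the change.
    """
    original = paths[changed]
    for name, elements in paths.items():
        if name != changed and elements == original:  # S41: duplicate found
            monitored[name] = True                    # S42: flag set to Y
            for resource, condition in conditions.items():
                conditions[resource] = [name if p == changed else p
                                        for p in condition]
            return name
    return None


paths = {"path1": (1, 1, 1, 1), "path3": (1, 1, 1, 1), "path5": (2, 1, 1, 1)}
monitored = {"path1": True, "path3": False, "path5": True}
conditions = {"Port1": ["path1", "path5"]}
substitute = substitute_duplicate(paths, monitored, conditions, "path1")
print(substitute, conditions)
```

Because the substitute traverses exactly the same resource elements, the co-occurrence conditions keep their meaning and the recorded history stays comparable.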
(S43) The path group ID 49 of the probe configuration table 40 is referred to, and it is checked whether the resource element r1 that the path passed through before the change was part of the single part. If it was, the processing proceeds to S44; if not, that is, if it was part of the common part, the processing proceeds to S48. FIG. 17(a) shows a configuration change in the single part, and FIG. 17(b) shows a configuration change in the common part.
(S44) Steps S44 to S47 describe the alternative path selection processing performed when the changed location was in the single part. The supplementary explanatory diagram of FIG. 17(c) is used as appropriate. First, in this step, a search is made for a path that passes through the resource element r1 used before the change and that passes through another path group that is already being monitored. For example, assume that in FIG. 17(c), path 1 belonging to common path 1 has been changed to path 2. Port 1, through which path 1 passed, is replaced by Port 2 in path 2, so no path covers Port 1 any longer. An alternative path that covers Port 1 must therefore be selected. Accordingly, a search is made for a path that passes through Port 1 and through a common path other than common path 1 (limited to those already being monitored). In FIG. 17(c), path n is assumed to be one such path.
(S45) If a path was found in S44, the processing proceeds to S46. If not, the processing ends on the grounds that there is no alternative path. Alternatively, a monitoring path may be selected for resource element r1 according to the flow shown in FIG. 10.
(S46) If multiple paths were found in S44, a path belonging to the path group with the smallest number of passing paths is selected. As described in S36, this is done to increase the smallest number of passing paths.
(S47) The co-occurrence condition table 60 is updated according to the selection made in S46.
(S48) Steps S48 to S410 describe the alternative path selection processing performed when the changed location was in the common part. The supplementary explanatory diagram of FIG. 17(d) is used as appropriate. First, in this step, it is checked whether the number of passing paths of the path group to which path 1 (the path before the change) belongs, minus 1, exceeds the threshold (the one described in S3 and S21). If it exceeds the threshold, no alternative path needs to be selected and the processing proceeds to S49; otherwise, it proceeds to S410.
(S49) The co-occurrence condition 63 of the co-occurrence condition table 60 is updated. Specifically, path 1 is removed from the conditions based on the path group to which path 1 belonged. For example, assume that the path group consisted of five paths, path 1 to path 5. If the co-occurrence condition 63 contains a row with the condition based on this path group, (path 1 & path 2 & path 3 & path 4 & path 5), path 1 is removed from it and the condition is updated to (path 2 & path 3 & path 4 & path 5).
(S410) A path is transferred from a path group that has passing paths to spare, so that the number of passing paths is restored to the threshold or more. This will be described with reference to FIG. 17(d). Assume that path 1, which passed through Processor 1 in the common part, has been changed to path 2, and that the number of passing paths of common path 1 has consequently decreased by 1 and fallen below the threshold. In this case, the single part (Port-n) through which a path belonging to common path 1 (path m) passes is identified, and the path group (common path n) to which a path covering that single part (path n) belongs is identified. Of such path groups, a path (path n) belonging to the path group with the most passing paths to spare, that is, with the largest difference from the threshold, is made a new member of common path 1.
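The decision in S48 and S49 and the donor choice in S410 can be sketched as follows. This is a simplified illustration under stated assumptions: a co-occurrence condition is modeled as a plain list of path names, the comparison in S48 is taken literally as "exceeds", and only the choice of the donor path group in S410 is shown, not the search for candidate groups through the single part. All names and counts are hypothetical.

```python
def handle_common_part_change(condition, changed_path, threshold):
    """S48/S49: drop the changed path and report whether a transfer is needed.

    Returns (updated condition, needs_transfer).
    """
    remaining = [p for p in condition if p != changed_path]
    # S48 says "exceeds"; use >= if the threshold is meant inclusively.
    if len(remaining) > threshold:
        return remaining, False  # S49: the updated condition is sufficient
    return remaining, True       # S410: a path has to be transferred in


def pick_donor_group(monitored_counts, threshold):
    """S410: choose the path group with the largest margin over the threshold."""
    return max(monitored_counts, key=lambda g: monitored_counts[g] - threshold)


condition = ["path1", "path2", "path3", "path4", "path5"]
updated, needs_transfer = handle_common_part_change(condition, "path1", threshold=3)
print(" & ".join(updated), needs_transfer)

donor = pick_donor_group({"common_n": 5, "common_k": 4}, threshold=3)
print(donor)
```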
In the course of operating the information processing system, an administrator may add resources to be monitored. Next, the processing for additionally selecting monitoring paths and probes 23 when a resource to be monitored is newly added will be described.
FIG. 18 is a flowchart of the probe selection processing performed when a monitoring target is added. This processing is executed by the probe management program 5. The probe management program 5 receives information on the newly designated resource element and checks whether a co-occurrence condition corresponding to that resource element exists. If one exists, it instructs the probes 23 to monitor the paths constituting that co-occurrence condition; if not, it newly selects paths and probes for monitoring that resource element.
(S50) Information on the resource element newly designated for monitoring is received.
(S51) The co-occurrence condition table 60 is searched for a row corresponding to the resource element designated for monitoring. If there is no such row, the processing proceeds to S52; if there is, it proceeds to S53.
(S52) Monitoring paths and probes 23 are selected for the resource element designated for monitoring. The selection method may be the method that starts from individual resource elements (shown in FIG. 9), or the method that selects monitoring paths and probes 23 for all resource elements inside the storage 12 that contains the designated resource element (shown in FIG. 12).
(S53) The co-occurrence condition 63 of the row found in S51 is obtained, and the monitoring flag 42 of each path constituting the co-occurrence condition is updated to Y. At the same time, the probe that has each such path is instructed to monitor it.
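The flow of S50 to S53 reduces to a lookup with a fallback, as sketched below. The dictionaries stand in for the co-occurrence condition table 60 and the monitoring flag 42, and `select_new` is a hypothetical callback representing the selection processing of S52; none of these names come from the text.

```python
def add_monitoring_target(element, cooccurrence, monitor_flag, select_new):
    """Return the paths that probes should be instructed to monitor."""
    condition = cooccurrence.get(element)  # S51: look up an existing row
    if condition is None:
        return select_new(element)         # S52: select paths and probes anew
    for path in condition:                 # S53: set the monitoring flag to Y
        monitor_flag[path] = True
    return list(condition)


cooccurrence = {"Port1": ["path1", "path2"]}
monitor_flag = {"path1": False, "path2": False, "path9": False}


def select_new(element):
    # Stand-in for the full selection flow; returns a placeholder result.
    return ["newly-selected-path-for-" + element]


to_monitor = add_monitoring_target("Port1", cooccurrence, monitor_flag, select_new)
fallback = add_monitoring_target("Cache7", cooccurrence, monitor_flag, select_new)
print(to_monitor, fallback)
```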
<Screen>
The screen displays of this embodiment will be described below with reference to FIGS. 19 to 21. Through these screens, the management computer 1 exchanges information with the administrator.
FIG. 19 shows the resource selection screen. On this screen, the administrator designates the resources for which monitoring at fine time intervals is requested. In FIG. 19, it is assumed that storage 1 has already been selected from the multiple storages 12 managed by the management computer 1.
The resource selection screen consists of a server list 190 and a resource list 191. Each row of the server list 190 displays server information 192 for a server related to storage 1. The administrator uses the check box at the left end to designate the servers for which fine-grained monitoring is requested. The administrator can also select all servers by checking the select-all check box 193. In addition, the administrator changes the value of the minimum number of paths 194 as appropriate.
Each row of the resource list 191 displays a resource inside storage 1. As with the server list 190, the administrator can use this list to select the resources for which fine-grained monitoring is requested. A minimum number of paths 197 can be set for each resource element, and the administrator can set and change its value for the resources for which fine-grained monitoring is requested.
When the administrator selects the OK button 198, the input is sent to the probe management program 5. The probe management program 5 stores the input in the monitoring request table 50 and then starts the probe selection calculation.
FIG. 20 shows the screen that presents a probe selection proposal. This screen presents the result of the probe selection that the probe management program 5 calculated so as to satisfy the administrator's monitoring request. The screen consists of a probe summary, a covered resource summary 202, and a per-resource monitoring path configuration 203.
The probe summary displays the number of probes 200 and the number of monitoring paths 201 required for monitoring. The number of monitoring paths 201 is obtained by referring to the probe management table 40 and counting the monitoring paths that pass through the resources of storage 1. The number of probes 200 is obtained by counting the probes that have those monitoring paths.
The covered resource summary 202 presents, for each resource type of storage 1, the number of resources and the number of resources covered by the current probe selection (the number of covered resources).
Each row of the per-resource monitoring path configuration 203 displays the monitoring path configuration for one monitored resource. Each row shows the co-occurrence condition corresponding to the monitored resource, the monitoring paths or monitoring path groups constituting that co-occurrence condition, and the number of paths (number of passing paths). This information can be obtained from the co-occurrence condition table 60. As supplementary information, the measured IOPS data representing the flow rate of those paths is also displayed.
If the presented probe selection proposal is acceptable, the administrator presses the OK button 204 to approve it. If there is a problem, the administrator presses the Cancel button 205 to return to the resource selection screen of FIG. 19.
FIG. 21 shows the monitoring result screen. This screen displays the result obtained when the management computer 1 aggregates and statistically processes the spikes measured by the probes 23 and extracts the spikes attributable to the resource designated for monitoring. From the displayed spike history, the administrator reads an increasing trend in the spikes of a resource and judges that the resource is showing signs of a performance failure. On the screen of FIG. 21, Port 1 of storage 1 has already been selected.
The monitoring result screen consists of spike statistics 210 and a spike history 211. The spike statistics 210 display one week of spike statistics related to the currently selected resource (Port 1 in FIG. 21). They show (a) the number of spikes measured, (b) the number of spikes attributable to other resources, (c) the number of spikes attributable to Port 1, and the week-over-week ratio of (c).
The calculation of (a) to (c) is described below, assuming that the co-occurrence condition 63 corresponding to Port 1 stored in the co-occurrence condition table 60 is "path 1 NOT (path 2)". The value of (a) is obtained as follows. The statistical processing program 7 counts, in the spike history table 70, the rows corresponding to the VOL associated with path 1, that is, the number of spikes measured on path 1. This value is displayed as (a). The value of (b) is obtained by counting, among the rows found in the calculation of (a), those whose cause resource ID 75 is something other than Port 1, that is, another resource. The value of (c) is obtained by the formula (a) − (b). From these values, in particular from (c), the number of spikes attributable to Port 1, and its week-over-week ratio, the administrator can see that the spikes attributable to Port 1 are increasing and can judge that the performance of Port 1 is under pressure.
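The arithmetic for (a), (b), and (c) can be written out as a short sketch. The row layout with `path` and `cause` fields is an assumption standing in for the spike history table 70 and its cause resource ID 75, and the rows and the previous week's count are hypothetical.

```python
def spike_statistics(rows, path, resource):
    """Return (a, b, c) for one monitored resource."""
    on_path = [r for r in rows if r["path"] == path]
    a = len(on_path)                                        # spikes measured on the path
    b = sum(1 for r in on_path if r["cause"] != resource)   # caused by other resources
    c = a - b                                               # caused by the resource itself
    return a, b, c


# Hypothetical one-week spike history.
rows = [
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Processor1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path1", "cause": "Pool1"},
    {"path": "path1", "cause": "Port1"},
    {"path": "path2", "cause": "Processor1"},
]
a, b, c = spike_statistics(rows, "path1", "Port1")

previous_week_c = 2  # hypothetical value for the week-over-week ratio
ratio = c / previous_week_c if previous_week_c else None
print(a, b, c, ratio)
```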
The spike history 211 shows the numerical values of the spike statistics as a graph. The graph records the I/O response time measured on path 1. It shows that a total of six spikes were measured (spikes 212 attributable to Port 1). Of these spikes, those drawn with dotted lines, 213a and 213b, are spikes attributable to other resources. The administrator can therefore easily read from the graph that these spikes are not attributable to Port 1.
The description of this embodiment so far has assumed a one-to-one correspondence between resources and resource elements, but this need not be the case. Multiple resources may be handled as one resource element, and one resource may be divided into and handled as multiple resource elements. Such modifications are described below.
FIG. 22 shows the resource configuration table 30 in a modification. It differs from the resource configuration table 30 shown in FIG. 4 in that information on resource attributes (attribute 34 and attribute value 35) has been added. Row 36a of FIG. 22 holds information on PT1, a Port resource. As attribute information of PT1, row 36a records that PT1 belongs to trunk TR1, which is a group of Ports. Traffic is automatically distributed according to load among Ports belonging to the same trunk. In this case, the trunk, which is a set of multiple resources (Ports), can be handled as one resource element.
Similarly, row 36b holds information on PR1, a Processor resource, and indicates that PR1 belongs to a processor group called PRG1. As with the trunk, processing is assumed to be automatically distributed among the processors belonging to the processor group according to processor load. In this case, as with the trunk, the processor group can be handled as one resource element.
Row 36c stores information on PO1, a Pool resource. The attribute information (attribute 34 and attribute value 35) of row 36c shows that PO1 consists of multiple media with different processing speeds (SSD and SAS). In such a Pool, performance characteristics such as response time differ greatly depending on the medium that stores the data requested by the I/O being processed. Such a Pool can be regarded as having, for each medium, a resource element with different performance characteristics. Therefore, PO1, which consists of SSD and SAS, is preferably divided into an SSD resource element and a SAS resource element.
FIG. 23 is a flowchart of the resource aggregation and division processing. It shows the processing in which the probe management program 5 refers to the resource configuration table 30 and either combines multiple resources into one resource element or divides one resource into multiple resource elements.
(S60) The attribute 34 and attribute value 35 are referred to, and if multiple resources belong to one load-balanced group, the resources belonging to that group are treated as one resource element. In the preceding examples, a load-balanced group corresponds to a trunk of Ports or a processor group.
 (S61) リソースの属性情報(属性34と属性値35)が、そのリソースを性能特性が異なるいくつかのリソースから構成されることを示しているなら、そのリソースを性能特性ごとのリソース要素に分ける。 (S61) If the resource attribute information (attribute 34 and attribute value 35) indicates that the resource is composed of several resources having different performance characteristics, the resource is divided into resource elements for each performance characteristic. .
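 Steps S60 and S61 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the dictionary layout and the keys `group` and `media` are hypothetical stand-ins for attribute 34 and attribute value 35 of the resource configuration table 30, and the resources PR2 and CA1 are invented for the example.

```python
from collections import defaultdict


def build_resource_elements(resources):
    """Map each resource element name to the resource IDs it covers.

    `resources` is a list of dicts with hypothetical keys:
      id    - resource ID (e.g. "PR1")
      group - load-balanced group the resource belongs to, if any (S60)
      media - media with differing performance characteristics, if any (S61)
    """
    elements = defaultdict(list)
    for res in resources:
        if res.get("group"):
            # S60: resources in one load-balanced group form one element.
            elements[res["group"]].append(res["id"])
        elif res.get("media") and len(res["media"]) > 1:
            # S61: a resource with several performance classes is split.
            for medium in res["media"]:
                elements[f'{res["id"]}/{medium}'].append(res["id"])
        else:
            # Otherwise the resource itself is the resource element.
            elements[res["id"]].append(res["id"])
    return dict(elements)


example = [
    {"id": "PR1", "group": "PRG1"},
    {"id": "PR2", "group": "PRG1"},          # assumed second group member
    {"id": "PO1", "media": ["SSD", "SAS"]},
    {"id": "CA1"},                           # assumed ungrouped resource
]
elements = build_resource_elements(example)
```

 In this sketch the processor group PRG1 becomes one element covering PR1 and PR2, while PO1 yields two elements, one per medium.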
 As described above, according to this embodiment, paths are selected until the number of paths passing through each resource element to be monitored (monitoring resource element) reaches at least a predetermined minimum number of paths. Each path is chosen from among the paths that can be monitored by the probe monitoring the path that passes through the largest number of monitoring resource elements whose path count has not yet reached the minimum (uncovered monitoring resource elements). A single probe can thus monitor as many monitoring resource elements as possible, while monitoring paths continue to be selected until the minimum number of paths needed to isolate the monitoring resource element responsible for a performance degradation is secured. System performance can therefore be measured at low cost.
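 The selection loop summarized above can be sketched as a greedy procedure. The sketch below is one possible reading of that summary, not the flow of the embodiment: the function name, the data layout, and the tie-breaking by sorted path ID are assumptions made for illustration.

```python
def select_monitoring_paths(paths, probe_of, monitored, min_paths):
    """Greedily pick monitoring paths until each monitored element is covered.

    paths     : dict path ID -> set of resource elements the path passes through
    probe_of  : dict path ID -> ID of the probe able to monitor that path
    monitored : set of monitoring resource elements
    min_paths : minimum number of selected paths required per element
    """
    count = {element: 0 for element in monitored}
    remaining = set(paths)
    selected = []

    def gain(path_id):
        # Number of uncovered monitoring resource elements on this path.
        return sum(1 for e in paths[path_id]
                   if e in count and count[e] < min_paths)

    while remaining and any(c < min_paths for c in count.values()):
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining path can improve coverage
        probe = probe_of[best]
        # Keep choosing among the paths this probe can monitor.
        while True:
            own = [p for p in sorted(remaining) if probe_of[p] == probe]
            if not own:
                break
            pick = max(own, key=gain)
            if gain(pick) == 0:
                break
            selected.append(pick)
            remaining.discard(pick)
            for e in paths[pick]:
                if e in count:
                    count[e] += 1
    probes = sorted({probe_of[p] for p in selected})
    return selected, probes


paths = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "D"},
    "P3": {"B", "D"},
    "P4": {"C", "D"},
    "P5": {"A"},
}
probe_of = {"P1": "probe1", "P2": "probe1", "P3": "probe2",
            "P4": "probe2", "P5": "probe3"}
monitoring_paths, monitoring_probes = select_monitoring_paths(
    paths, probe_of, {"A", "B", "C", "D"}, min_paths=2)
```

 In this example every element ends up on at least two monitoring paths using only two of the three probes; path P5 and its probe are never needed.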
 In addition, a predetermined rule is introduced into path selection: a path is selected according to the rule from among the paths that can be monitored by the probe monitoring the path that passes through the most uncovered monitoring resource elements. This makes it possible to set more suitable monitoring paths.
 In addition, priority is given to a path whose selection yields a set of paths for which a monitoring resource element is the single intersection. This makes it easier to separate that managed resource from other managed resources and to identify the managed resource element responsible for a performance degradation.
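 The single-intersection idea can be illustrated with a short sketch. It assumes that "single intersection" means the only resource element shared by two paths; that reading, the function name, and the example resource IDs are assumptions for illustration.

```python
from itertools import combinations


def single_intersections(paths):
    """Return, per resource element, the path pairs that share only it.

    paths : dict path ID -> set of resource elements the path passes through
    """
    result = {}
    for (a, elems_a), (b, elems_b) in combinations(sorted(paths.items()), 2):
        common = elems_a & elems_b
        if len(common) == 1:
            (element,) = common
            result.setdefault(element, []).append((a, b))
    return result


example_paths = {
    "P1": {"Port1", "PR1", "PO1"},
    "P2": {"Port2", "PR1", "PO2"},
    "P3": {"Port1", "PR2", "PO1"},
}
intersections = single_intersections(example_paths)
```

 Here P1 and P2 share only PR1, so a degradation observed on both paths at the same time points to PR1, whereas P1 and P3 share two elements and cannot separate them.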
 In addition, priority is given to paths used for high-volume processing, for example by adding points to paths with a high per-path flow rate such as I/O operations per second (IOPS). Paths likely to cause performance degradation such as spikes are thus monitored preferentially, such degradation becomes easier to detect, and the managed resource responsible for it becomes easier to isolate.
 In addition, paths are selected so that, for each monitoring resource element, the total number of monitoring resource elements through which the selected paths pass is at least a predetermined value. Selecting paths so that the monitored paths contain many monitoring resource elements makes the managed resource elements easier to isolate.
 Alternatively, when paths are selected so that the total number of monitoring resource elements through which the selected paths pass is equalized across the monitoring resource elements, the ease of isolation no longer varies from one managed resource element to another.
 In addition, because paths that have a completely overlapping path are selected preferentially, a substitute path to monitor can easily be prepared when the path configuration of a selected path (the resources the path passes through) is changed. When the monitoring resource elements through which a monitoring path passes are actually changed and a completely overlapping path exists, that path replaces the monitoring path, so monitoring can easily be switched to a path that passes through exactly the same monitoring resource elements.
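 A minimal sketch of the replacement by a completely overlapping path follows. It assumes that two paths overlap completely when their sets of monitoring resource elements are equal; the function name and data layout are hypothetical.

```python
def find_complete_overlap(previous_elements, candidate_paths):
    """Return the ID of a candidate path whose monitoring resource elements
    equal those the monitoring path contained before the change, or None.

    previous_elements : set of elements the monitoring path used to contain
    candidate_paths   : dict path ID -> set of monitoring resource elements
    """
    for path_id in sorted(candidate_paths):
        if candidate_paths[path_id] == previous_elements:
            return path_id
    return None


before_change = {"Port1", "PR1", "PO1"}
candidates = {
    "P7": {"Port1", "PR1"},
    "P8": {"Port1", "PR1", "PO1"},
}
replacement = find_complete_overlap(before_change, candidates)
```

 When no candidate matches, the function returns None, and a different fallback is needed, such as the group-based choice described next.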
 In addition, when the monitoring resource elements through which a monitoring path passes are changed and no completely overlapping path exists for that monitoring path, a new monitoring path is chosen as follows. Among the paths that pass through a monitoring resource element the monitoring path had contained, and that pass through the same monitoring path as a group of monitoring paths passing through another monitoring resource, the path that passes through the same monitoring path as the group containing the fewest monitoring paths becomes the monitoring path. Setting a new monitoring path therefore also improves the ability to isolate other monitoring resource elements at the same time.
<<Second Embodiment>>
 The management computer 1 in the first embodiment is intended to determine the minimum set of probes needed to identify the cause when the access performance of a storage system degrades. The management computer 1 in the second embodiment replaces the storage system with an application as the target.
 Just as the storage system in the first embodiment is composed of several resources, an application is composed of several program processes. These program processes correspond to the resources of the first embodiment. A program process is, for example, processing in a program module, or a database table (or access processing to the database table).
 An application provides some service to its users. For example, if the application is a Web search system, the service returns Web pages matching a specific keyword to the user. The user sends the application a processing request (service request) that specifies a service. The application executes the request and returns the result to the user.
 In the first embodiment, the probe 23 measures the response time of I/O requests from a server to the storage system, whereas the probe 23 in the second embodiment measures the time until the application returns its response to a service request to the user. A service in the second embodiment corresponds to a path in the first embodiment. In the first embodiment, a path consists of a series of resources that process the I/O requests sent to the storage system through that path. Likewise, a service, the counterpart of a path in the second embodiment, consists of a series of program processes inside the application that process a service request from the user.
 In summary, the second embodiment is analogous to the first embodiment, with the monitoring target changed from a storage system to an application. A resource in the first embodiment corresponds to a program process and a path to a service, and the response time monitored by the probe 23 is the service response time.
 FIG. 24 is a system configuration diagram of the second embodiment. As is apparent at a glance, most of FIG. 24 overlaps with the first embodiment shown in FIG. 1, so the description here focuses on the differences.
 The application monitored in this embodiment consists of an IP switch 102, a Web server 103, and a database server 106. These are connected to the management computer 1 via the LAN 11. They are also connected to one another by a business LAN 101, which is separate from the LAN 11, and can communicate with each other. The Web server 103 and the database server 106 are ordinary computers equipped with a CPU, memory, and persistent storage such as an HDD. A Web program 104, which is part of the application, runs on the Web server 103. The Web server 103 also holds a number of program modules 105 that make up the Web program 104.
 A database program 107, likewise part of the application, runs on the database server 106. The database server 106 also holds database tables 108 in which the application's data is stored.
 A service request sent by a user of the application passes through the business LAN 101 and first enters the IP switch 102, which forwards it to the Web server 103. On the Web server 103, the Web program 104 receives the service request, loads the program modules 105 related to the request, and executes predetermined processing. If that processing needs data held by the application, the Web program 104 further sends a service request to the database server 106. On the database server 106, the database program 107 receives it, executes predetermined data processing on the database tables 108 related to the service request, and returns the result to the requesting Web program 104. The Web program 104 then executes further predetermined processing and returns the result to the user via the IP switch 102.
 The service monitoring server 100 is a computer that has a CPU, memory, and a storage device such as an HDD, and executes programs. The probe 23, which is a program, runs on the service monitoring server 100. The service monitoring server 100 is connected to the IP switch 102. The IP switch 102 duplicates the service request packets passing through the business LAN 101 and the application's response packets to them, and sends the copies to the service monitoring server 100. The probe 23 calculates the response time of each service from the time difference between these service request and response packets, and records it.
 The probe 23 calculates the service response time and monitors its value. When it detects a spike in the response time, it records the service request in which the spike occurred, the detection time, and the response time. The recorded contents are collected periodically by the collection program 6 running on the management computer 1 and stored in the spike history table 70.
 Calculating response times by matching duplicated packets is a heavy process that consumes large amounts of CPU and memory. The probe 23 therefore has a function that limits the services for which response times are calculated, which reduces CPU and memory consumption. The management computer 1 instructs the probe 23 which services to target.
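 The behavior of the probe 23 described above can be sketched as follows. This is a simplified illustration, not the probe's actual implementation: the packet tuple format, the request ID used to pair packets, and the spike rule (response time above a multiple of a per-service baseline) are all assumptions introduced for the example.

```python
def service_response_times(packets, monitored_services):
    """Pair request and response packets and return response times.

    packets            : iterable of (timestamp_ms, kind, request_id, service),
                         where kind is "req" or "res" (hypothetical format)
    monitored_services : set of services the probe was told to watch
    """
    pending = {}
    samples = []
    for timestamp, kind, request_id, service in sorted(packets):
        if service not in monitored_services:
            continue  # skip unmonitored services to save CPU and memory
        if kind == "req":
            pending[request_id] = timestamp
        elif request_id in pending:
            start = pending.pop(request_id)
            samples.append((service, start, timestamp - start))
    return samples


def detect_spikes(samples, baseline_ms, factor=3):
    """Return samples whose response time exceeds factor x the baseline."""
    return [s for s in samples if s[2] > factor * baseline_ms[s[0]]]


packets = [
    (0, "req", 1, "search"), (20, "res", 1, "search"),
    (50, "req", 2, "login"), (60, "res", 2, "login"),
    (100, "req", 3, "search"), (400, "res", 3, "search"),
]
samples = service_response_times(packets, {"search"})
spikes = detect_spikes(samples, {"search": 25})
```

 Only the "search" service is processed here, which mirrors the way the management computer 1 restricts the services the probe has to track.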
 Next, the tables on the management computer 1 are described. Only the parts whose contents differ from the first embodiment are covered here.
 FIG. 25 shows the contents of the resource configuration table 30 in the second embodiment.
 In the first embodiment, the resource configuration table 30 stores the resource configuration inside the storage. In the second embodiment, for each server constituting the application (server ID 31), the program processes that correspond to resources and run on that server are stored in the resource configuration table 30. For each resource type 32, the unique identifier of each resource (resource ID 33) is stored. For example, FIG. 25 shows that the server Web-Sv1, which is part of the application, has resources PM1, PM2, PM3, ... whose resource type is program module.
 FIG. 26 shows the probe configuration table 40 in the second embodiment.
 The probe configuration table 40 stores the following information: configuration information of the probe 23, configuration information of services, and monitoring information of the probe 23. The configuration information of the probe 23 includes the identifier of the probe 23 (probe ID 41) and the service monitoring server 100 on which the probe 23 runs (server ID 43). The configuration information of a service includes the service identifier 410, the resources such as program modules used by the service (service 46, resource type 47, resource ID, path group ID 49), and the service URL 411 (when the application is a Web application). The monitoring information of the probe 23 includes whether the probe 23 monitors the service (monitoring flag 42).
 The service configuration information may be entered manually by an administrator based on the application's design information, or it may be entered by the collection program 6, which collects and analyzes the traces and logs that the application outputs at run time.
 Based on the information in these tables, the probe management program 5 selects the minimum set of services needed to identify performance degradation of a resource (program module or database table) and instructs the probe 23 to monitor response times for those services only. The method of selecting the minimum set of services is the same as the method of selecting the minimum set of probes in the first embodiment. The processing flows described in the first embodiment can therefore be applied as they are, with terms such as "path" replaced by the corresponding terms of the second embodiment.
 The embodiments described above are examples for explaining the present invention and are not intended to limit the scope of the present invention to those embodiments. Those skilled in the art can implement the present invention in various other forms without departing from its gist.
DESCRIPTION OF SYMBOLS: 1…management computer, 10…display device, 100…service monitoring server, 101…business LAN, 102…IP switch, 103…Web server, 104…Web program, 105…program module, 106…database server, 107…database program, 108…database table, 11…LAN, 12…storage, 13…SW, 14…Port, 15…Processor, 16…Pool, 17…Cache, 18…SAN, 19…server, 2…CPU, 20…CPU, 21…memory, 25…in-computer storage, 26…HBA, 30…resource configuration table, 4…in-computer storage

Claims (12)

  1.  A management computer that manages an information processing system that executes information processing over paths each formed by linking resource elements in multiple stages, the management computer comprising:
     probe management means that selects paths so that the number of paths passing through each monitoring resource element, which is a resource element to be monitored, is at least a predetermined minimum number of paths, each path being selected from among the paths that can be monitored by the probe that, among the probes monitoring paths passing through the monitoring resource element, monitors the path passing through the largest number of uncovered monitoring resource elements, which are monitoring resource elements whose number of passing paths has not reached the minimum number of paths, and that sets the selected paths as monitoring paths, which are paths to be monitored, and sets the probes monitoring the monitoring paths as monitoring probes, which are probes designated for monitoring;
     collection means that collects monitoring results from the monitoring probes; and
     statistical processing means that determines the monitoring resource element responsible for a performance degradation by isolation using co-occurrence patterns based on the monitoring results of the monitoring probes.
  2.  The management computer according to claim 1, wherein the probe management means selects a path according to a predetermined rule from among the paths that can be monitored by the probe monitoring the path that passes through the most uncovered monitoring resource elements.
  3.  The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path whose selection yields a set of paths for which the monitoring resource element is the single intersection.
  4.  The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path used for high-volume processing.
  5.  The management computer according to claim 2, wherein the probe management means selects the path according to the rule that, for each monitoring resource element, the total number of monitoring resource elements through which the selected paths pass is to be at least a predetermined value.
  6.  The management computer according to claim 2, wherein the probe management means selects the path according to the rule that, for each monitoring resource element, the total number of monitoring resource elements through which the selected paths pass is to be equalized.
  7.  The management computer according to claim 2, wherein the probe management means selects the path according to the rule that priority is given to a path for which there exists another path whose contained monitoring resource elements completely overlap its own.
  8.  The management computer according to claim 1, wherein, when the monitoring resource elements through which the monitoring path passes are changed, the probe management means sets as the monitoring path, in place of that monitoring path, a path containing monitoring resources that completely overlap the monitoring resources the monitoring path contained before the change.
  9.  The management computer according to claim 8, wherein, when the monitoring resource elements through which the monitoring path passes are changed and no path exists that contains monitoring resources completely overlapping the monitoring resources the monitoring path contained before the change, the probe management means sets as the monitoring path, in place of that monitoring path, the path that, among the paths passing through a monitoring resource element the monitoring path had contained and passing through the same monitoring path as a group of monitoring paths passing through another monitoring resource, passes through the same monitoring path as the group containing the fewest monitoring paths.
  10.  A management method for managing an information processing system that executes information processing over paths each formed by linking resource elements in multiple stages, wherein:
     probe management means selects paths so that the number of paths passing through each monitoring resource element, which is a resource element to be monitored, is at least a predetermined minimum number of paths, each path being selected from among the paths that can be monitored by the probe that, among the probes monitoring paths passing through the monitoring resource element, monitors the path passing through the largest number of uncovered monitoring resource elements, which are monitoring resource elements whose number of passing paths has not reached the minimum number of paths, sets the selected paths as monitoring paths, which are paths to be monitored, and sets the probes monitoring the monitoring paths as monitoring probes, which are probes designated for monitoring;
     collection means collects monitoring results from the monitoring probes; and
     statistical processing means determines the monitoring resource element responsible for a performance degradation by isolation using co-occurrence patterns based on the monitoring results of the monitoring probes.
  11.  The management method according to claim 10, wherein the probe management means selects a path according to a predetermined rule from among the paths that can be monitored by the probe monitoring the path that passes through the most uncovered monitoring resource elements.
  12.  The management method according to claim 11, wherein the probe management means selects the path according to the rule that priority is given to a path whose selection yields a set of paths for which the monitoring resource element is the single intersection.
PCT/JP2014/058918 2014-03-27 2014-03-27 Supervisor computer and supervising method WO2015145676A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/058918 WO2015145676A1 (en) 2014-03-27 2014-03-27 Supervisor computer and supervising method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/058918 WO2015145676A1 (en) 2014-03-27 2014-03-27 Supervisor computer and supervising method

Publications (1)

Publication Number Publication Date
WO2015145676A1 true WO2015145676A1 (en) 2015-10-01

Family

ID=54194267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/058918 WO2015145676A1 (en) 2014-03-27 2014-03-27 Supervisor computer and supervising method

Country Status (1)

Country Link
WO (1) WO2015145676A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844736A (en) * 2017-02-13 2017-06-13 北方工业大学 Time-space co-occurrence mode mining method based on time-space network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000088925A (en) * 1998-09-14 2000-03-31 Toshiba Corp Method and apparatus for specifying fault position of semiconductor device
WO2006137373A1 (en) * 2005-06-24 2006-12-28 Nec Corporation Quality degradation portion deducing system and quality degradation portion deducing method
JP2008113186A (en) * 2006-10-30 2008-05-15 Nec Corp QoS ROUTING METHOD AND DEVICE
JP2008158666A (en) * 2006-12-21 2008-07-10 Nec Corp Multipath system for storage device, its failure identification method, and program
WO2010122604A1 (en) * 2009-04-23 2010-10-28 株式会社日立製作所 Computer for specifying event generation origins in a computer system including a plurality of node devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14886968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14886968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP