US20160103727A1 - Management system for managing computer system comprising multiple monitoring-target devices - Google Patents
- Publication number
- US20160103727A1 (application US14/971,187)
- Authority
- US
- United States
- Prior art keywords
- plan
- expanded
- information
- event
- rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1479—Generic software techniques for error detection or fault masking
- G06F11/1482—Generic software techniques for error detection or fault masking by means of middleware or OS functionality
- G06F11/1484—Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2257—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using expert systems
Definitions
- The present invention relates, for example, to technology for managing a computer system comprising monitoring-target devices, such as a host computer, a network switch, and a storage apparatus.
- VM virtual machine, data
- first VM migration for transferring a VM operating environment from a certain physical host computer to another physical host computer
- second VM migration for migrating a VM stored in a certain storage area to another storage area
- the VM is a type of data stored in a storage area
- the VM migration (the first VM migration and the second VM migration) is a type of data migration between storage areas.
- Data migration technology (volume migration) between data storage areas (volumes) in a storage apparatus is also known (refer to Patent Literature 3).
- cases can exist in which a variety of plans can be implemented, such as a plan for recovering from the failure cause itself and a plan that places the highest priority on resuming a business operation that was affected by the failure cause.
- a plan for migrating the business operation to another server for example, a plan related to the VM migration
- a management system manages a computer system comprising multiple monitoring-target devices.
- a storage device of the management system stores: a general rule, which shows a corresponding relationship between one or more condition events, which are related to any of the multiple monitoring-target devices, and a conclusion event, which is related to any of the multiple monitoring-target devices and constitutes a cause in a case where the one or more condition events have occurred, this general rule moreover denoting a monitoring-target device related to the condition event and the conclusion event according to the relevant monitoring-target device type; general plan information, which shows a corresponding relationship between the general rule and one or more general plans, which are recovery plans available in a case where the conclusion event of the general rule is a cause; unresolved information, which shows, for each combination of the general rule and the general plan, a condition event of the relevant general rule that remains unresolved in a case where the relevant general plan has been implemented; and configuration information, which shows a coupling relationship between the multiple monitoring-target devices, wherein the control device: creates,
- FIG. 1 is a constitutional diagram of an example of a computer system related to Example 1.
- FIG. 2 is a constitutional diagram of an example of a management server related to Example 1.
- FIG. 3 is a constitutional diagram of an example of a storage apparatus related to Example 1.
- FIG. 4 is a constitutional diagram of an example of a physical server related to Example 1.
- FIG. 5 is a constitutional diagram of an example of a configuration information table related to Example 1.
- FIG. 6 is a constitutional diagram of an example of a performance information table related to Example 1.
- FIG. 7 is a constitutional diagram of an example of an event management table related to Example 1.
- FIG. 8 is a constitutional diagram of an example of a general rule table related to Example 1.
- FIG. 9 is a constitutional diagram of an example of a general plan table related to Example 1.
- FIG. 10 is a constitutional diagram of an example of a rule-plan association table related to Example 1.
- FIG. 11 is a constitutional diagram of an example of an expanded rule table related to Example 1.
- FIG. 12A is a first constitutional diagram of an expanded plan table related to Example 1.
- FIG. 12B is a second constitutional diagram of an expanded plan table related to Example 1.
- FIG. 13 is a constitutional diagram of an example of an analysis result management table related to Example 1.
- FIG. 14 is a flowchart of an event analysis process related to Example 1.
- FIG. 15 is a flowchart of a rule expansion process related to Example 1.
- FIG. 16 is a flowchart of a plan creation process related to Example 1.
- FIG. 17 is a flowchart of a post-plan execution risk selection process related to Example 1.
- FIG. 18 is a flowchart of a plan presentation process related to Example 1.
- FIG. 19 is a flowchart of an effect-risk presentation process related to Example 1.
- FIG. 20 is a constitutional diagram of an example of a plan presentation screen related to Example 1.
- FIG. 21 is a constitutional diagram of an example of a plan details screen related to Example 1.
- FIG. 22 is a constitutional diagram of an example of a management server related to Example 2.
- FIG. 23 is a constitutional diagram of an example of a physical server related to Example 2.
- FIG. 24 is a logical constitutional diagram of an example of a physical server related to Example 2.
- FIG. 25 is a constitutional diagram of an example of a switch related to Example 2.
- FIG. 26 is a constitutional diagram of an example of a configuration information table related to Example 2.
- FIG. 27 is a constitutional diagram of an example of a VM configuration information table related to Example 2.
- FIG. 28 is a constitutional diagram of an example of a performance information table related to Example 2.
- FIG. 29 is a constitutional diagram of an example of an event management table related to Example 2.
- FIG. 30A is a first constitutional diagram of a general rule table related to Example 2.
- FIG. 30B is a second constitutional diagram of a general rule table related to Example 2.
- FIG. 31 is a constitutional diagram of an example of a rule-plan association table related to Example 2.
- FIG. 32A is a first constitutional diagram of an expanded rule table related to Example 2.
- FIG. 32B is a second constitutional diagram of an expanded rule table related to Example 2.
- FIG. 33A is a first constitutional diagram of an expanded plan table related to Example 2.
- FIG. 33B is a second constitutional diagram of an expanded plan table related to Example 2.
- FIG. 33C is a third constitutional diagram of an expanded plan table related to Example 2.
- FIG. 34 is a constitutional diagram of an example of an analysis result management table related to Example 2.
- FIG. 35 is a flowchart of a plan presentation process related to Example 2.
- FIG. 36 is a constitutional diagram of an example of a plan presentation screen related to Example 2.
- FIG. 37 is a constitutional diagram of an example of a management server related to Example 3.
- FIG. 38 is a constitutional diagram of an example of a general plan table related to Example 3.
- FIG. 39 is a constitutional diagram of an example of a rule-plan association table related to Example 3.
- FIG. 40A is a first constitutional diagram of an expanded plan table related to Example 3.
- FIG. 40B is a second constitutional diagram of an expanded plan table related to Example 3.
- FIG. 41 is a constitutional diagram of an example of a maintenance information management table related to Example 3.
- FIG. 42 is a flowchart of an effect-risk presentation process related to Example 3.
- FIG. 43 is a constitutional diagram of an example of a plan details screen related to Example 3.
- a set of one or more computers which manage the computer system and display the display information of the present invention may be called a management system.
- the management server is the management system, and a combination of a management server and a display computer is also the management system.
- processing similar to that of the management server may be realized using multiple computers to increase the speed and reliability of management processing, and in this case, the relevant multiple computers (including a display computer in a case where a display computer carries out the display) are the management system.
- Example 1 First, a computer system related to Example 1 will be explained.
- FIG. 1 is a constitutional diagram of an example of a computer system related to Example 1.
- the computer system related to Example 1 comprises one or more management servers 10000 (management server A in FIG. 1 ), one or more storage apparatuses 20000 (storage apparatus A in FIG. 1 ), and one or more physical servers 30000 (physical server A in FIG. 1 ).
- the physical server 30000 and the storage apparatus 20000 are coupled together via a SAN (Storage Area Network) 40000 (specifically, a fibre channel).
- the management server 10000 , the storage apparatus 20000 , and the physical server 30000 are coupled together via a management network 50000 .
- the management server 10000 stores a plan creation program 11100 , a post-plan execution risk selection program 11200 , a plan presentation program 11300 , a configuration/performance information repository 11800 , and a rule-plan information repository 11900 in a memory 11000 (refer to FIG. 2 ).
- the management server 10000 is able to communicate with a program running on the storage apparatus 20000 and the physical server 30000 via the management network 50000 .
- One or more logical volumes 22100 are prepared in the storage apparatus 20000 .
- the logical volume 22100 for example, is provided to the physical server 30000 .
- the storage apparatus A provides the logical volume 22100 to the physical server A.
- the physical server 30000 executes various business operations using the logical volume 22100 provided from the storage apparatus 20000 .
- the physical server A and the storage apparatus A are coupled together via the SAN 40000 .
- the management server 10000 stores programs, such as the plan creation program 11100 , the post-plan execution risk selection program 11200 , and the plan presentation program 11300 , but the constitution of the computer system is not limited thereto.
- the storage apparatus 20000 or the physical server 30000 may store a variety of programs, and another apparatus, such as a switch (not shown in the drawing) installed between the respective apparatuses, may also store various programs.
- the connection between the storage apparatus 20000 and the physical server 30000 is not limited to a direct connection via a fibre channel, and the storage apparatus 20000 and the physical server 30000 may be coupled via network equipment, such as one or more fibre channel switches.
- the connection between the storage apparatus 20000 and the physical server 30000 is a data communication network, and, for example, may be an IP (Internet Protocol) network.
- FIG. 2 is a constitutional diagram of an example of the management server related to Example 1.
- the management server 10000 comprises a memory 11000, a storage device 12000, an input device 13000, an output device 14000, a processor 15000, and a communication device 16000, and these elements are coupled together via an internal bus or other such communication channel 17000.
- the memory 11000 stores the plan creation program 11100 , the post-plan execution risk selection program 11200 , the plan presentation program 11300 , an event analysis processing program 11400 , a rule expansion program 11500 , a configuration setting management program 11600 , a performance information collection program 11700 , the configuration/performance information repository 11800 , and the rule-plan information repository 11900 .
- a configuration information table 11810 and a performance information table 11820 are stored in the configuration/performance information repository 11800 .
- An event management table 11910 , one or more general rule tables 11920 , a general plan table 11930 , a rule-plan association table 11940 , one or more expanded rule tables 11950 , one or more expanded plan tables 11960 , and an analysis result management table 11970 are stored in the rule-plan information repository 11900 .
- the configuration information table 11810 manages information showing an apparatus and a device that exist on an I/O (input/output) route from the physical server 30000 to a physical disk, which configures the logical volume 22100 used by the physical server 30000 , that is, information (hereinafter referred to as “configuration information”) showing the coupling relationship of the apparatus and the device based on the I/O route.
- the performance information table 11820 manages performance information with respect to monitoring-target apparatuses coupled to the SAN 40000 and the devices in the monitoring-target apparatuses (monitoring-target devices).
- the event management table 11910 manages information showing when an event occurred in relation to what type of metrics for which device in the computer system.
- the general rule table 11920 manages a general rule showing the corresponding relationship between one or more condition events capable of occurring in the computer system and a conclusion event, which is regarded as the cause of a failure with respect to the one or more condition events.
- the general plan table 11930 manages information showing a recovery plan capable of being implemented in the computer system with respect to a failure.
- the rule-plan association table 11940 manages information (general plan information) denoting the corresponding relationship between the general rule and a general plan, which represents a recovery plan corresponding to the relevant general rule, that is, a recovery plan capable of being implemented in a case where the conclusion event of the relevant general rule is the cause of the failure, and information (unresolved information) denoting which of the failure events in the general rule remains unresolved after the respective general plans have been executed.
- the expanded rule table 11950 stores information (an expanded rule), which embodies the information of the general rule table 11920 based on the information of the configuration information table 11810 .
- the expanded plan table 11960 manages information (an expanded plan), which embodies the information of the general plan table 11930 based on the information of the configuration information table 11810 and the performance information table 11820 .
- the analysis result management table 11970 stores information related to an apparatus and a device in which an event judged to be the cause of a failure has occurred, and a failure event judged to be the cause thereof.
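The relationship described above, in which an expanded rule embodies a general rule based on the configuration information, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `GeneralEvent` and `GeneralRule` classes, the `OWNER_FIELD` mapping, the `expand_rule` function, the "Ex" identifier prefix, and the sample rule and configuration rows are hypothetical and are not the table layouts of FIGS. 8 and 11. The sketch also simplifies by producing one expanded rule per configuration row.

```python
from dataclasses import dataclass

# Which configuration field names the apparatus that owns each device type.
# These keys are illustrative assumptions, not the patent's table format.
OWNER_FIELD = {
    "drive": "physical_server",
    "logical_volume": "storage",
    "disk_pool": "storage",
}


@dataclass(frozen=True)
class GeneralEvent:
    device_type: str  # a device type, not a concrete device
    metric: str
    status: str


@dataclass(frozen=True)
class GeneralRule:
    rule_id: str
    conditions: tuple  # condition events (GeneralEvent instances)
    conclusion: GeneralEvent  # event regarded as the cause of the failure


def bind(event, row):
    """Replace the device type of a general event with a concrete device."""
    apparatus = row[OWNER_FIELD[event.device_type]]
    device = row[event.device_type]
    return (apparatus, device, event.metric, event.status)


def expand_rule(rule, config_rows):
    """Create one expanded rule per configuration row (one I/O route)."""
    return [
        {
            "expanded_rule_id": f"Ex{rule.rule_id}-{i}",
            "conditions": [bind(e, row) for e in rule.conditions],
            "conclusion": bind(rule.conclusion, row),
        }
        for i, row in enumerate(config_rows, start=1)
    ]


# Hypothetical configuration rows and rule, for illustration only.
CONFIG_ROWS = [
    {"physical_server": "ServerA", "drive": "/opt", "storage": "StorageA",
     "logical_volume": "LV1", "disk_pool": "POOL1"},
    {"physical_server": "ServerA", "drive": "/var", "storage": "StorageA",
     "logical_volume": "LV2", "disk_pool": "POOL1"},
]

RULE_1 = GeneralRule(
    rule_id="Rule1",
    conditions=(
        GeneralEvent("drive", "response_time", "abnormal"),
        GeneralEvent("logical_volume", "response_time", "abnormal"),
    ),
    conclusion=GeneralEvent("disk_pool", "response_time", "abnormal"),
)

for expanded in expand_rule(RULE_1, CONFIG_ROWS):
    print(expanded["expanded_rule_id"], "->", expanded["conclusion"])
```

Keeping the general rule expressed in device types and binding concrete devices only at expansion time means that the same rule can be reused whenever the configuration information changes.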
- the storage device 12000 is an HDD (Hard Disk Drive), SSD (Solid State Drive) or the like for storing information.
- the input device 13000 is a device for the administrator to input an indication to the management server 10000 , and, for example, is a keyboard.
- the output device 14000 is a device for outputting the result of processing executed by the management server 10000 , for example, the result of the execution of the plan presentation program 11300 , and, for example, is a display.
- the processor 15000 executes a program deployed on the memory 11000 .
- the communication device 16000 is a device for coupling to the management network 50000 .
- various programs and tables are stored in the memory 11000 , but these programs and tables may be stored in either the storage device 12000 or another storage medium (not shown in the drawing).
- the processor 15000 reads the target program into the memory 11000 and executes the program that has been read.
- the above-mentioned programs and tables may be stored in either the memory 21000 of the storage apparatus 20000 (refer to FIG. 3 ) or the memory 31000 of the physical server 30000 (refer to FIG. 4 ), and either the storage apparatus 20000 or the physical server 30000 may execute a stored program.
- Another apparatus such as another physical server 30000 or a switch (not shown in the drawing) may store the above-mentioned programs and tables, and may execute a stored program.
- FIG. 3 is a constitutional diagram of an example of the storage apparatus related to Example 1.
- the storage apparatus 20000 comprises a memory 21000 , a logical volume provision part 22000 , a disk I/F controller 23000 , a management I/F 24000 , a processor 25000 , and a data I/F 26000 , and these elements are coupled via a communication channel 27000 , such as an internal bus.
- the memory 21000 comprises a disk cache 21100 .
- the memory 21000 stores a configuration/performance information collection program 21200 .
- the disk cache 21100 is a storage area for temporarily storing information.
- the configuration/performance information collection program 21200 is for sending/receiving the management information and the performance information of the storage apparatus 20000 and the like to/from the management server 10000 .
- the logical volume provision part 22000 comprises a disk pool 22200 configured from a storage area of one or more physical disks (not shown in the drawing), logically partitions the storage area of the disk pool 22200 , and provides the relevant logically partitioned storage areas as logical volumes 22100 .
- This makes it possible to access a logical volume 22100 from an apparatus outside the relevant storage apparatus 20000 .
- a disk pool number is assigned to each disk pool 22200, and a logical volume number is assigned to each logical volume 22100. This makes it possible for the storage apparatus 20000 to uniquely identify each disk pool 22200 and logical volume 22100.
- each of two disk pools 22200 (POOL1 and POOL2) is logically partitioned, and four logical volumes 22100 (LV1, LV2, LV3 and LV4) are provided to an apparatus (for example, the physical server 30000 ) outside the storage apparatus 20000 .
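As a rough sketch of the logical partitioning described above, the following models two disk pools from which four uniquely numbered logical volumes are carved. The class and method names (`DiskPool`, `LogicalVolumeProvider`, `add_pool`, `create_volume`), the capacities, the volume sizes, and the assignment of LV3 and LV4 to POOL2 are assumptions made for illustration; the text states only that POOL1 and POOL2 are partitioned and that LV1 to LV4 are provided.

```python
from dataclasses import dataclass, field


@dataclass
class DiskPool:
    number: str
    capacity_gb: int  # hypothetical capacity; not stated in the text
    allocated_gb: int = 0


@dataclass
class LogicalVolumeProvider:
    pools: dict = field(default_factory=dict)    # pool number -> DiskPool
    volumes: dict = field(default_factory=dict)  # volume number -> (pool number, size)

    def add_pool(self, number, capacity_gb):
        # Pool numbers must be unique within the storage apparatus.
        if number in self.pools:
            raise ValueError(f"disk pool number {number} is already in use")
        self.pools[number] = DiskPool(number, capacity_gb)

    def create_volume(self, volume_number, pool_number, size_gb):
        """Logically partition part of a pool and register it as a volume."""
        if volume_number in self.volumes:
            raise ValueError(f"logical volume number {volume_number} is already in use")
        pool = self.pools[pool_number]
        if pool.allocated_gb + size_gb > pool.capacity_gb:
            raise ValueError(f"disk pool {pool_number} has insufficient free space")
        pool.allocated_gb += size_gb
        self.volumes[volume_number] = (pool_number, size_gb)


# Assumed layout: LV1 and LV2 from POOL1, LV3 and LV4 from POOL2.
provider = LogicalVolumeProvider()
provider.add_pool("POOL1", 100)
provider.add_pool("POOL2", 100)
provider.create_volume("LV1", "POOL1", 40)
provider.create_volume("LV2", "POOL1", 40)
provider.create_volume("LV3", "POOL2", 50)
provider.create_volume("LV4", "POOL2", 50)

print(sorted(provider.volumes))
```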
- the disk I/F controller 23000 is an interface device for coupling to the logical volume provision part 22000 .
- the management I/F 24000 is an interface device for coupling to the management network 50000 .
- the processor 25000 executes a program deployed on the memory 21000 .
- the data I/F 26000 is an interface device for coupling to the SAN 40000 .
- the storage apparatus 20000 comprises two data I/Fs 26000 , data I/F (P1) and data I/F (P2).
- the configuration/performance information collection program 21200 is stored in the memory 21000 , but this program 21200 may be stored in another storage device (not shown in the drawing) or another storage medium (not shown in the drawing).
- the processor 25000 reads the configuration/performance information collection program 21200 into the memory 21000 and executes the read configuration/performance information collection program 21200.
- the configuration/performance information collection program 21200 may be stored in the memory 11000 of the management server 10000, and the management server 10000 may execute the stored program 21200. Also, another storage apparatus 20000 may store the configuration/performance information collection program 21200 and may execute the stored program 21200.
- the logical volume provision part 22000 may make all the storage areas on a single physical disk 22200 into one logical volume 22100 .
- the logical volume provision part 22000 may also prepare a logical volume 22100 using the storage area of a storage medium other than the physical disk 22200 , for example, a flash memory.
- FIG. 4 is a constitutional diagram of an example of the physical server related to Example 1.
- the physical server 30000 comprises a memory 31000 , a data I/F 32000 , a processor 33000 , and a management I/F 34000 , and these elements are coupled together via an internal bus or other such communication channel 35000 .
- the memory 31000 stores a configuration performance information collection program 31100 , a business program 31200 , and a volume management program 31300 .
- the configuration performance information collection program 31100 is for sending/receiving physical server 30000 management information, performance information, and the like to/from the management server 10000 .
- the business program 31200 is for realizing a business operation executed by the physical server 30000 , and, for example, is a DBMS (Data Base Management System) or a file system.
- the volume management program 31300 is for allocating a logical volume 22100 provided by the storage apparatus 20000 to the physical server 30000 .
- the physical server 30000 executes a business operation using the logical volume 22100 allocated by the volume management program 31300 .
- the data I/F 32000 is an interface device for coupling to the SAN 40000 .
- the processor 33000 executes a program deployed on the memory 31000 .
- the management I/F 34000 is an interface device for coupling to the management network 50000 .
- a variety of programs are stored in the memory 31000, but these programs may be stored in another storage apparatus (not shown in the drawing).
- the processor 33000 reads the target program into the memory 31000 and executes the read program.
- FIG. 5 is a constitutional diagram of an example of the configuration information table related to Example 1.
- the configuration information table 11810 stores information related to an I/O route, which extends from the physical server 30000 to the physical disk configuring the logical volume 22100 provided to the relevant physical server 30000 , and which the physical server 30000 uses to access the logical volume 22100 .
- An entry is added to the configuration information table 11810 in accordance with the configuration setting management program 11600 being executed.
- the configuration information table 11810 comprises the fields of a physical server 11811 , a drive 11812 , a server data I/F 11813 , a storage 11814 , a storage data I/F 11815 , a logical volume 11816 , and a disk pool 11817 .
- An identifier for uniquely identifying a physical server 30000 is stored in the physical server 11811 .
- An identifier for uniquely identifying a mount point of a volume on the physical server 30000 is stored in the drive 11812 .
- An identifier for uniquely identifying a physical server 30000 data I/F 32000 (hereinafter may be referred to as “server data I/F”) used when the physical server 30000 accesses the logical volume 22100 shown by a logical volume 11816 identifier is stored in the server data I/F 11813 .
- An identifier for uniquely identifying a storage apparatus 20000 constituting the access destination of the physical server 30000 is stored in the storage 11814 .
- An identifier for uniquely identifying a storage apparatus 20000 data I/F 26000 (hereinafter may be referred to as “storage data I/F”) used when the physical server 30000 accesses the logical volume 22100 shown by the logical volume 11816 identifier is stored in the storage data I/F 11815.
- An identifier for uniquely identifying a logical volume 22100 is stored in the logical volume 11816 .
- An identifier for uniquely identifying a disk pool 22200 by which the logical volume 22100 shown by the logical volume 11816 identifier is prepared is stored in the disk pool 11817 .
- the first entry from the top of FIG. 5 shows that the logical volume “LV1” created from the disk pool “POOL1” of the storage A (storage apparatus A) is coupled to the physical server A via the storage data I/F “P1” and the server data I/F “S1”, and is recognized as the logical volume “/opt” on the physical server A.
- the configuration information table 11810 related to this example comprises information on the physical server 30000 , the server data I/F, the storage apparatus 20000 , the storage data I/F, the logical volume 22100 , and the disk pool 22200 as the apparatuses and devices that exist on the access route, but the constitution of this table is not limited thereto.
- the configuration information table 11810 may comprise information on a switch and a switch data I/F, and may associatively store business program (DBMS and so forth) information or VM information on the business server 30000 , a snapshot volume holding a VM snapshot, and a clone volume holding a clone.
- the configuration information table 11810 may hold configuration management operation historical information, and may be linked to a Syslog server and hold detailed log information denoting system operations.
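The seven fields of the configuration information table can be sketched as a simple record type, with the first entry of FIG. 5 as described above. The names `ConfigEntry`, `io_route`, and `drives_on_pool`, the identifier strings "ServerA" and "StorageA", and the second table row are assumptions made for illustration and do not come from the patent.

```python
from typing import NamedTuple


class ConfigEntry(NamedTuple):
    physical_server: str
    drive: str
    server_data_if: str
    storage: str
    storage_data_if: str
    logical_volume: str
    disk_pool: str


CONFIG_TABLE = [
    # First entry of FIG. 5 as described in the text.
    ConfigEntry("ServerA", "/opt", "S1", "StorageA", "P1", "LV1", "POOL1"),
    # Hypothetical second entry, added only to make the lookups non-trivial.
    ConfigEntry("ServerA", "/var", "S1", "StorageA", "P2", "LV2", "POOL1"),
]


def io_route(table, server, drive):
    """Return the (apparatus, device) pairs on the I/O route of a drive."""
    for e in table:
        if e.physical_server == server and e.drive == drive:
            return [
                (e.physical_server, e.drive),
                (e.physical_server, e.server_data_if),
                (e.storage, e.storage_data_if),
                (e.storage, e.logical_volume),
                (e.storage, e.disk_pool),
            ]
    return None  # no such drive in the configuration information


def drives_on_pool(table, storage, pool):
    """List the (server, drive) pairs whose volumes come from a disk pool."""
    return [
        (e.physical_server, e.drive)
        for e in table
        if e.storage == storage and e.disk_pool == pool
    ]


print(io_route(CONFIG_TABLE, "ServerA", "/opt"))
print(drives_on_pool(CONFIG_TABLE, "StorageA", "POOL1"))
```

A lookup in the reverse direction, such as `drives_on_pool`, is the kind of query that becomes useful when a failure in a disk pool has to be related to the server drives that depend on it.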
- FIG. 6 is a constitutional diagram of an example of the performance information table related to Example 1.
- the performance information table 11820 stores performance information related to either an apparatus or a device inside an apparatus comprising the computer system, for example, performance information related to the logical volume 22100 , the disk pool 22200 , and so forth in each storage apparatus 20000 . An entry is added to the performance information table 11820 in accordance with the performance information collection program 11700 being executed.
- the performance information table 11820 comprises the fields of an apparatus ID 11821 , a device ID 11822 , a metrics 11823 , an equipment OS 11824 , a performance value 11825 , an alert execution threshold 11826 , a threshold type 11827 , and a Status 11828 .
- An identifier (apparatus ID) for uniquely identifying an apparatus is stored in the apparatus ID 11821 .
- An identifier (device ID) for uniquely identifying a device, which is the performance information acquisition target, is stored in the device ID 11822 .
- Data showing a type of OS (Operating System) running on the apparatus corresponding to the apparatus ID in the apparatus ID 11821 is stored in the equipment OS 11824 .
- a value of the performance information for the type shown by the metrics 11823 for the device shown by the device ID 11822 is acquired from the apparatus comprising the device and stored in the performance value 11825 .
- Either an upper limit or a lower limit threshold for a normal range of management-target performance values (hereinafter referred to as “alert execution threshold”) is specified by the user and stored in the alert execution threshold 11826 .
- Data showing whether the alert execution threshold is the upper limit or the lower limit of the normal values is stored in the threshold type 11827 .
- Data showing whether the performance value 11825 is a normal value or an abnormal value is stored in the Status 11828 .
- In the performance information table 11820 shown in FIG. 6 , only one performance value is supported for one arbitrary metrics for one arbitrary device on one arbitrary apparatus, but the configuration may be such that each piece of information of the performance information table 11820 is stored together with a value showing the time at which the performance information collection program 11700 acquired the information held by each apparatus by communicating with the configuration/performance information collection programs 21200 and 31100 , and the performance values at multiple time points corresponding to the acquired times may be held as historical information.
- A CPU utilization rate, an IOPS, a response time with respect to a request, and so forth were given as examples of performance information, but other performance indicators, such as an I/O busy rate, a transfer rate, a throughput, a buffer hit rate and a number of records inserted/updated/deleted in database management software, a Web server response time, a free capacity or utilization rate for a file system or disk, an amount of I/O data, a utilization time, a number of network interface errors, a buffer overflow, and frame errors may also be used.
- As the alert execution threshold stored in the alert execution threshold 11826 , rather than a threshold specified by the user, a value capable of triggering an alert notification, for example, a value that differs from a baseline value obtained using an average value of the performance information historical information, may be employed.
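- The relationship between the performance value 11825 , the alert execution threshold 11826 , the threshold type 11827 , and the Status 11828 described above can be sketched as follows. This is a minimal illustration, not the implementation of the management server; the function name and the labels "upper" and "lower" for the threshold type are assumptions introduced here.

```python
def evaluate_status(performance_value, alert_threshold, threshold_type):
    """Classify one performance value against its alert execution threshold.

    threshold_type is assumed to be "upper" or "lower" (hypothetical labels
    for the upper or lower limit of the normal range).
    """
    if threshold_type == "upper":
        # Values above an upper limit fall outside the normal range.
        abnormal = performance_value > alert_threshold
    elif threshold_type == "lower":
        # Values below a lower limit fall outside the normal range.
        abnormal = performance_value < alert_threshold
    else:
        raise ValueError(f"unknown threshold type: {threshold_type!r}")
    return "threshold abnormality" if abnormal else "normal"


print(evaluate_status(95.0, 80.0, "upper"))  # threshold abnormality
print(evaluate_status(30.0, 10.0, "lower"))  # normal
```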
- FIG. 7 is a constitutional diagram of an example of the event management table related to Example 1.
- the event analysis processing program 11400 compares a performance value registered in the performance information table 11820 to an alert execution threshold registered in the performance information table 11820 , and in a case where the performance value exceeds the alert execution threshold, prepares an entry showing the corresponding event, and registers the prepared entry in the event management table 11910 .
- the event analysis processing program 11400 receives an event message showing the occurrence of an event from various apparatuses in the system, and may register an entry showing the event corresponding to the received event message in the event management table 11910 .
- the event management table 11910 is referred to as needed in a rule expansion process (refer to FIG. 15 ).
- the event management table 11910 comprises the fields of an event ID 11911 , an apparatus ID 11912 , an apparatus component ID 11913 , a metrics 11914 , an equipment OS 11915 , a status 11916 , an analyzed flag 11917 , and an occurrence date/time 11918 .
- An event ID, which is the identifier of the event itself, is stored in the event ID 11911 .
- An apparatus ID, which is the identifier of the apparatus in which the event occurred, is stored in the apparatus ID 11912 .
- An identifier of the device in which the event occurred is stored in the apparatus component ID 11913 .
- the name of a metrics for which a threshold abnormality was detected is stored in the metrics 11914 .
- Data showing the type of OS in the apparatus in which the threshold abnormality was detected is stored in the equipment OS 11915 .
- Data showing a state at event occurrence of a device in which the event occurred is stored in the status 11916 .
- Data showing whether or not the event has been analyzed by the rule expansion program 11500 is stored in the analyzed flag 11917 .
- Data showing the date and time at which the event occurred is stored in the occurrence date/time 11918 .
- the first entry from the top of FIG. 7 shows that the management server 10000 detected a threshold abnormality of the processor operating rate in the data I/F “P2” of the storage apparatus A at 15:00:00 hours on 30 Jun. 2012, that the event ID thereof is “EV1”, and that this event has not been analyzed by the rule expansion program 11500 .
- FIG. 8 is a constitutional diagram of an example of the general rule table related to Example 1.
- One or more general rule tables 11920 are stored in the rule-plan information repository 11900 .
- one general rule is defined by one general rule table 11920 .
- the general rule (the same holds true for an expanded rule, which will be explained further below) is data showing the relationship between a combination of one or more condition events capable of occurring in a node apparatus comprising the computer system, and a conclusion event, which is regarded as the failure cause with respect to the one or more condition events.
- an event propagation model for identifying the cause in a failure analysis uses an “IF-THEN” format to describe a combination of events, which are expected to occur as the result of a certain failure, and the cause thereof.
- the general rule is not limited to that given in FIG. 8 , and a larger number of rules are allowed.
- the general rule table 11920 comprises a condition part 11921 , a conclusion part 11922 , a general rule ID 11923 , and an applied topology 11924 .
- the condition part 11921 comprises the fields of an event ID 11925 , an apparatus type 11926 , an apparatus component type 11927 , a metrics 11928 , and a status 11929 .
- Data showing a causal event, that is, a conclusion event, which is equivalent to the THEN part of the general rule described using the “IF-THEN” format, is stored in the conclusion part 11922 .
- the conclusion part 11922 comprises the fields of the apparatus type 11926 , the apparatus component type 11927 , the metrics 11928 , and the status 11929 .
- a general rule ID, which is the identifier of the general rule, is stored in the general rule ID 11923 .
- Data showing the type of apparatus in which either the condition event or the conclusion event occurs is stored in the apparatus type 11926 .
- Data showing the type of device in which either the condition event or the conclusion event occurs is stored in the apparatus component type 11927 .
- Information showing a type of performance information such as a CPU utilization rate, a storage apparatus IOPS, and a response time with respect to a request, is stored in the metrics 11928 .
- the same performance information stored in the metrics 11823 of the performance information table 11820 , or other performance information may be used as the performance information stored in the metrics 11928 .
- Data showing the state at event occurrence of the device in the apparatus is stored in the status 11929 .
- the conclusion event described in the conclusion part 11922 is the cause of the failure.
- the problem in the condition part 11921 is also resolved, that is, the performance value related to each condition event can also be expected to return to a normal value.
- three events are described in the condition part 11921 , but the number of events is not limited.
- the general rule given as an example in FIG. 8 , that is, the general rule for which the general rule ID is shown as “RULE 1”, shows that when a threshold abnormality in the response time of a drive on a host computer, a threshold abnormality in the I/Os per unit of time in the logical volume 22100 of the storage apparatus 20000 , and a threshold abnormality in the I/Os per unit of time in the disk pool 22200 of the storage apparatus 20000 have been detected as observed events, the threshold abnormality in the I/Os per unit of time in the disk pool 22200 of the storage apparatus 20000 is concluded to be the cause.
- the fact that a certain condition is normal may also be defined as an event included in the observed events.
- Topology information is acquired from the configuration information table 11810 when an expanded rule is created based on this general rule.
- FIG. 9 is a constitutional diagram of an example of the general plan table related to Example 1.
- the general plan table 11930 shows a list of available plans in the computer system.
- the general plan table 11930 comprises the fields of a general plan ID 11931 and a plan 11932 .
- a general plan ID, which is the identifier of the general plan, is stored in the general plan ID 11931 .
- As plans, for example, there are reboot host, change switch setting, storage apparatus 20000 volume migration, and VM migration, among others. The plans are not limited to those given in FIG. 9 .
- FIG. 10 is a constitutional diagram of an example of the rule-plan association table related to Example 1.
- the rule-plan association table 11940 shows the corresponding relationship between a general rule, a list of plans available in a case where this general rule was applied and the cause of a failure was identified, and an event (hereinafter referred to as “unresolved event”), which remains in an unresolved state in a case where the plans have been executed.
- the rule-plan association table 11940 comprises a general rule ID 11941 , a general plan ID 11942 , and an unresolved event ID 11943 .
- a general rule ID, which is the identifier of a general rule, is stored in the general rule ID 11941 .
- the general rule ID stored in the general rule ID 11941 corresponds to the general rule ID stored in the general rule ID 11923 of the general rule table 11920 .
- a general plan ID, which is the identifier of a general plan, is stored in the general plan ID 11942 .
- the general plan ID stored in the general plan ID 11942 corresponds to the general plan ID stored in the general plan ID 11931 of the general plan table 11930 .
- An unresolved event ID, which is the identifier of an event (unresolved event) that remains in an unresolved state in a case where the plans have been executed, is stored in the unresolved event ID 11943 .
- the unresolved event ID corresponds to the identifier (event ID) of a condition event stored in the event ID 11925 of the general rule table 11920 .
- In the unresolved event ID 11943 , for example, “NONE” is stored in a case where an unresolved event does not exist, and “ALL” is stored in a case where all of the condition events remain as unresolved events.
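- How a value of the unresolved event ID 11943 might be interpreted against the condition events of a general rule can be sketched as follows. The function name, the condition event IDs "1" through "3", and the assumption that the field otherwise holds a single condition event ID are illustrative only.

```python
def resolve_unresolved_events(unresolved_event_id, condition_event_ids):
    """Return the condition event IDs that remain unresolved after a plan.

    "NONE" and "ALL" are the special values described in the text; any other
    value is assumed here to be a single condition event ID.
    """
    if unresolved_event_id == "NONE":
        return []
    if unresolved_event_id == "ALL":
        return list(condition_event_ids)
    if unresolved_event_id not in condition_event_ids:
        raise ValueError(f"unknown condition event ID: {unresolved_event_id!r}")
    return [unresolved_event_id]


condition_ids = ["1", "2", "3"]  # hypothetical event IDs of one general rule
print(resolve_unresolved_events("NONE", condition_ids))  # []
print(resolve_unresolved_events("ALL", condition_ids))   # ['1', '2', '3']
print(resolve_unresolved_events("3", condition_ids))     # ['3']
```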
- FIG. 11 is a constitutional diagram of an example of the expanded rule table related to Example 1.
- One or more expanded rule tables 11950 are stored in the rule-plan information repository 11900 .
- one expanded rule is defined by one expanded rule table 11950 .
- the expanded rule is data, which expands the general rule in a format that depends on the actual configuration of the computer system.
- the expanded rule shown in FIG. 11 is created in accordance with replacing the values of the apparatus type 11926 and the apparatus component type 11927 in the general rule shown in FIG. 8 with the identifier (apparatus ID) of the identified apparatus and the identifier (device ID) of the identified device defined in the configuration information table 11810 .
- the expanded rule table 11950 comprises the fields of a condition part 11951 , a conclusion part 11952 , an expanded rule ID 11953 , and a pre-expansion general rule ID 11954 .
- the condition part 11951 comprises the fields of an event ID 11955 , an apparatus ID 11956 , an apparatus component ID 11957 , a metrics 11958 , and a status 11959 .
- Data showing a causal event, that is, a conclusion event, which is equivalent to the THEN part of the expanded rule described using the “IF-THEN” format, is stored in a conclusion part 11952 .
- the conclusion part 11952 comprises the fields of the apparatus ID 11956 , the apparatus component ID 11957 , the metrics 11958 , and the status 11959 .
- An expanded rule ID, which is the identifier of the expanded rule, is stored in the expanded rule ID 11953 .
- the general rule ID of the general rule, which constitutes the basis of the expanded rule, is stored in the pre-expansion general rule ID 11954 .
- An ID (apparatus ID) of an apparatus in which either the condition event or the conclusion event occurs is stored in the apparatus ID 11956 .
- An ID (device ID) of the device in which either the condition event or the conclusion event occurs is stored in the apparatus component ID 11957 .
- Information showing a type of performance information such as a CPU utilization rate, a storage apparatus IOPS, and a response time with respect to a request, is stored in the metrics 11958 .
- the same performance information stored in the metrics 11823 of the performance information table 11820 , or other performance information may be used as the performance information stored in the metrics 11958 .
- Data showing the state at event occurrence of the device in the apparatus is stored in the status 11959 .
- the expanded rule is created in accordance with taking into account the actual configuration of the computer system (for example, the coupling relationship shown in the configuration information table 11810 ) and embodying the apparatus type and the device type related to the condition event and the conclusion event in the identified apparatus and the identified device in the actual configuration of the computer system.
- the expanded rule for which the expanded rule ID is shown as “ExRule 1-1”, given as an example in FIG. 11 , is created in accordance with replacing the values of the apparatus type 11926 and the apparatus component type 11927 in the general rule “Rule 1” shown in FIG. 8 with the identifiers of the identified apparatuses (physical server A and storage apparatus A) and the identifiers of the identified devices (drive “/var”, logical volume “LV1”, and disk pool “POOL1”) defined in the configuration information table 11810 .
- the expanded rule “ExRule 1-1” is expanded on the basis of the general rule “Rule 1”, and shows that when a threshold abnormality in the response time of the logical volume “/var” of the physical server A, a threshold abnormality in the response time of the logical volume “LV2” of the storage apparatus A, and a threshold abnormality in the response time of the disk pool “POOL1” of the storage apparatus A have been detected as observed events, the threshold abnormality in the response time of the disk pool “POOL1” of the storage apparatus A is concluded to be the cause.
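- The substitution of apparatus and device types with identifiers from the configuration information table 11810 can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary layout, the type labels, the function name, and the "Ex<rule ID>-<n>" numbering are hypothetical, the metrics and status of each event are carried over unchanged and omitted, and the single configuration row is the first entry of FIG. 5 as described above.

```python
GENERAL_RULE = {
    "general_rule_id": "Rule 1",
    # Each event is (apparatus type, apparatus component type).
    "condition": [
        ("SERVER", "DRIVE"),
        ("STORAGE", "LOGICAL_VOLUME"),
        ("STORAGE", "DISK_POOL"),
    ],
    "conclusion": ("STORAGE", "DISK_POOL"),
}

# One access route per row; keys are the hypothetical type labels used above.
CONFIG_ROWS = [
    {
        "SERVER": "Server A",
        "DRIVE": "/opt",
        "STORAGE": "Storage A",
        "LOGICAL_VOLUME": "LV1",
        "DISK_POOL": "POOL1",
    },
]


def expand_rule(general_rule, config_rows):
    """Create one expanded rule per access route in the configuration."""
    expanded = []
    for n, row in enumerate(config_rows, start=1):
        def resolve(event, row=row):
            # Replace (apparatus type, component type) with concrete IDs.
            apparatus_type, component_type = event
            return (row[apparatus_type], row[component_type])

        expanded.append({
            "expanded_rule_id": f"Ex{general_rule['general_rule_id']}-{n}",
            "pre_expansion_general_rule_id": general_rule["general_rule_id"],
            "condition": [resolve(e) for e in general_rule["condition"]],
            "conclusion": resolve(general_rule["conclusion"]),
        })
    return expanded


for rule in expand_rule(GENERAL_RULE, CONFIG_ROWS):
    print(rule["expanded_rule_id"], rule["condition"], "->", rule["conclusion"])
```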
- FIG. 12A is a first constitutional diagram of the expanded plan table related to Example 1.
- FIG. 12B is a second constitutional diagram of the expanded plan table related to Example 1.
- One or more expanded plan tables 11960 are stored in the rule-plan information repository 11900 .
- one or more expanded plans are defined by one expanded plan table 11960 .
- the expanded plan is information for expanding a general plan in a format, which relies on the actual configuration of the computer system.
- the expanded plan table 11960 is created by the plan creation program 11100 based on the general plan table 11930 , the expanded rule table 11950 , the configuration information table 11810 , and the performance information table 11820 .
- the expanded plan table 11960 comprises the fields of a plan details 11961 , a general plan ID 11962 , and an expanded rule ID 1196 A.
- the general plan ID of the general plan, which constitutes the basis of the expanded plan, is stored in the general plan ID 11962 .
- the expanded rule ID of the expanded rule corresponding to the expanded plan is stored in the expanded rule ID 1196 A as information for identifying the failure cause for which the expanded plan is implemented.
- Specific processing contents for each of one or more expanded plans, which have been expanded, and information on the post-execution state of the expanded plan are stored in the plan details 11961 .
- the plan details 11961 comprises the fields of an expanded plan ID 11963 , a plan target 11964 , and a risk site 11969 .
- An expanded plan ID, which is the identifier of the expanded plan, is stored in the expanded plan ID 11963 .
- Data showing a problem site (hereinafter referred to as “risk site”), which latently remains as-is even after the plan has been executed, is stored in the risk site 11969 .
- In the plan target 11964 , there is stored information showing the configuration element (device) related to the plan, and post-plan execution information.
- the post-plan execution information here comprises a predicted value for a post-plan execution performance value related to the device, which is related to the plan.
- the predicted value of the post-plan execution performance value for example, is calculated in accordance with the plan creation program 11100 referencing the performance information table 11820 to simulate the situation subsequent to plan execution.
- the fields included in the plan target 11964 will differ in accordance with the plan contents.
- the expanded plan defined by the expanded plan table 11960 of FIG. 12A is an expanded plan based on the general plan “Plan 1”, that is, an expanded plan related to a volume migration.
- the plan target 11964 , for example, comprises the fields of a migration-target volume 11965 , a migration-source pool 11966 , and a migration-destination pool 11967 .
- the migration-target volume 11965 comprises a volume ID 11965 A in which is stored an identifier of the logical volume 22100 (hereinafter referred to as “migration-target volume”), which constitutes the volume migration target, and an I/O Response Time prediction 11965 B in which is stored a predicted value for the I/O response time with respect to the migration-target volume subsequent to volume migration execution.
- the migration-source pool 11966 comprises a pool ID 11966 A in which is stored an identifier of the disk pool 22200 to which the migration-target volume belongs (that is, the migration-source disk pool 22200 , hereinafter referred to as “migration-source pool”), and an I/O Response Time prediction 11966 B in which is stored a predicted value for the I/O response time with respect to the migration-source pool subsequent to volume migration execution.
- the migration-destination pool 11967 comprises a pool ID 11967 A in which is stored an identifier of the disk pool 22200 , which constitutes the migration destination of the migration-target volume (hereinafter referred to as “migration-destination pool”), and an I/O Response Time prediction 11967 B in which is stored a predicted value for the I/O response time with respect to the migration-destination pool subsequent to volume migration execution.
- the plan creation program 11100 acquires information from the configuration information table 11810 , and stores the information in the volume ID 11965 A, the pool ID 11966 A, and the pool ID 11967 A. Further, an arbitrary method may be used as the method for calculating the values stored in the respective I/O Response Time predictions 11965 B, 11966 B, and 11967 B.
- the values of the respective I/O Response Time predictions 11965 B, 11966 B, and 11967 B may be obtained, for example, as follows. The plan creation program 11100 acquires the I/Os per unit of time of the migration-target volume, the migration-source pool, and the migration-destination pool from the performance information table 11820 . It predicts the I/Os per unit of time of the migration-source pool subsequent to volume migration execution by subtracting the value for the migration-target volume from the value for the migration-source pool, and predicts those of the migration-destination pool by adding the value for the migration-target volume to the value for the migration-destination pool. The reciprocal of each predicted value is then adopted as the predicted I/O response time.
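- The example calculation above for the migration-source pool and the migration-destination pool can be sketched as follows. The function and parameter names are assumptions introduced here, the input values are made up for illustration, and the guard against non-positive predicted I/Os is an addition not described in the text.

```python
def predict_migration_response_times(source_pool_iops, destination_pool_iops, volume_iops):
    """Predict pool I/O response times after a volume migration.

    Follows the example in the text: shift the volume's I/Os per unit of time
    from the source pool to the destination pool, then take the reciprocal.
    """
    source_after = source_pool_iops - volume_iops
    destination_after = destination_pool_iops + volume_iops
    if source_after <= 0 or destination_after <= 0:
        # Assumption: a reciprocal is only meaningful for positive values.
        raise ValueError("predicted I/Os per unit of time must be positive")
    return {
        "migration_source_pool": 1.0 / source_after,
        "migration_destination_pool": 1.0 / destination_after,
    }


prediction = predict_migration_response_times(
    source_pool_iops=500.0, destination_pool_iops=300.0, volume_iops=100.0
)
print(prediction)
```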
- the expanded plan defined in accordance with the expanded plan table 11960 of FIG. 12B is an expanded plan based on the general plan “Plan 5”, that is, the expanded plan related to the addition of a disk to the pool.
- the plan target 11964 , for example, comprises the fields of a pool ID 11968 A in which is stored the identifier of the disk pool 22200 , which constitutes the destination of the disk (hereinafter referred to as “add-target pool”), an additional disk 11968 B in which is stored the identifier of the disk to be added, and an I/O Response Time prediction 11968 C in which is stored a predicted value for the I/O response time with respect to the add-target pool.
- the plan creation program 11100 acquires information from the configuration information table 11810 and stores the information in the pool ID 11968 A and the additional disk 11968 B. Further, an arbitrary method may be used as the method for calculating the value (a predicted value of I/O response time with respect to add-target pool) of the I/O Response Time prediction 11968 C.
- the plan creation program 11100 may calculate the I/O response time of the add-target pool subsequent to disk addition by acquiring the value of the I/Os per unit of time of the add-target pool, the value of the capacity of the add-target pool, and the value of the capacity of the disk to be added from the performance information table 11820 , and, assuming that the I/Os per unit of time are distributed in proportion to the capacity values, multiplying the I/O response time of the add-target pool prior to disk addition by the ratio of the capacity of the add-target pool prior to disk addition to its capacity subsequent to disk addition.
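- The capacity-ratio calculation above can be sketched as follows. The function and parameter names are assumptions introduced here, and the capacity after disk addition is assumed to be the sum of the pool capacity and the capacity of the added disk.

```python
def predict_response_time_after_disk_addition(pool_capacity, added_disk_capacity,
                                              response_time_before):
    """Scale the pool's response time by (capacity before / capacity after)."""
    if pool_capacity <= 0 or added_disk_capacity < 0:
        raise ValueError("capacities must be positive (added disk may be zero)")
    # Assumption: capacity after addition is the simple sum of both capacities.
    capacity_after = pool_capacity + added_disk_capacity
    return response_time_before * pool_capacity / capacity_after


# Doubling the pool capacity halves the predicted response time.
print(predict_response_time_after_disk_addition(1000.0, 1000.0, 10.0))  # 5.0
```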
- the plan creation program 11100 computes a predicted value for a post-plan execution performance value related to a device, which is related to the plan, by referencing the configuration information table 11810 , listing post-processing execution configuration information candidates, referencing the performance information table 11820 , and simulating post-plan execution status information, such as performance information, capacity information, cost information, and downtime information.
- FIG. 13 is a constitutional diagram of an example of the analysis result management table related to Example 1.
- the analysis result management table 11970 comprises the fields of a cause apparatus ID 11971 , a cause component ID 11972 , a metrics 11973 , a certainty factor 11974 , an expanded rule ID 11975 , and a received event ID 11976 .
- the identifier (apparatus ID) of an apparatus related to an event, which has been judged as the cause of a failure in a failure cause analysis process, is stored in the cause apparatus ID 11971 .
- the identifier (device ID) of a device related to the event judged to be the cause of the failure is stored in the cause component ID 11972 .
- Information showing a type of performance information is stored in the metrics 11973 .
- data showing the type of performance information related to an event judged to be the cause of a failure is stored in the metrics 11973 .
- a value (certainty factor) showing the probability of the event judged to be the cause of the failure being the root cause is stored in the certainty factor 11974 .
- an incidence of the condition event, for example, serves as the certainty factor.
- the expanded rule ID of the expanded rule which constitutes the basis upon which the event was judged to be the cause of the failure, is stored in the expanded rule ID 11975 .
- the event ID of the event, which actually occurred from among the condition events is stored in the received event ID 11976 .
- the first entry from the top of FIG. 13 shows that, based on the expanded rule “ExRule 1-1”, the management server 10000 has judged that the response time threshold abnormality in the disk pool “POOL1” of the storage apparatus A is the failure cause, that the occurrence of the events shown with the event IDs “EV 2”, “EV 3”, and “EV 5” is the basis of the judgment, and that the certainty factor, that is, the incidence of the condition event, is 3/3.
- This entry shows a case in which, assuming that the performance of the disk comprising the disk pool “POOL1” has slowed, the performance of the disk pool “POOL1” has deteriorated, and the performance of the logical volume “LV2” has slowed, no event has occurred for the logical volume “LV1” or the drive “/opt”, since no I/Os are sent to or received from the drive “/opt” of the physical server A or the logical volume “LV1” of the storage apparatus A.
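- The certainty factor described above, that is, the incidence of the condition events, can be sketched as the fraction of an expanded rule's condition events that were actually received. The function name and the representation of an event as an (apparatus ID, device ID) pair are assumptions introduced here; the identifiers follow the description of “ExRule 1-1”.

```python
def certainty_factor(condition_events, received_events):
    """Return received condition events divided by all condition events."""
    if not condition_events:
        raise ValueError("an expanded rule needs at least one condition event")
    received = set(received_events)
    matched = [event for event in condition_events if event in received]
    return len(matched) / len(condition_events)


# Condition events of "ExRule 1-1" as (apparatus ID, device ID) pairs.
conditions = [("Server A", "/var"), ("Storage A", "LV2"), ("Storage A", "POOL1")]

print(certainty_factor(conditions, conditions))      # 1.0, i.e. 3/3
print(certainty_factor(conditions, conditions[:2]))  # two of three received
```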
- the configuration information acquisition process is executed in accordance with the management server 10000 processor 15000 executing the configuration setting management program 11600 deployed on the memory 11000 .
- the configuration setting management program 11600 communicates with a program, which collects information on the respective apparatuses coupled to the SAN 40000 (in this example, the configuration/performance information collection program 21200 of the storage apparatus 20000 and the configuration/performance information collection program 31100 of the physical server 30000 ), and acquires the configuration information held by the apparatuses.
- the management server 10000 may use an SCSI (Small Computer System Interface) Inquiry command to acquire the coupling relationship between the physical server 30000 , the drive on the physical server 30000 , the server data I/F, the storage apparatus 20000 , the storage data I/F, the logical volume 22100 accessed by the physical server 30000 , and the disk pool 22200 to which the logical volume 22100 belongs.
- the physical server 30000 may acquire configuration information from the storage apparatus 20000 in accordance with the physical server 30000 issuing an SCSI Inquiry command to the storage apparatus 20000 .
- the configuration information collection program 31100 of the physical server 30000 may acquire configuration information related to the physical server 30000 using any method, such as, for example, a request to the OS.
- the configuration information collection program 21200 on the storage apparatus 20000 may acquire configuration information related to the storage apparatus 20000 using any method, such as, for example, acquiring the information from the memory 21000 .
- the configuration setting management program 11600 stores the acquired configuration information in the configuration information table 11810 , and ends the configuration information acquisition process.
- the performance information collection process is executed in accordance with the management server 10000 processor 15000 executing the performance information collection program 11700 deployed on the memory 11000 .
- the performance information collection program 11700 communicates with a program, which collects information on the respective apparatuses coupled to the SAN 40000 (in this example, the configuration/performance information collection program 21200 of the storage apparatus 20000 and the configuration/performance information collection program 31100 of the physical server 30000 ), and acquires the performance information held by the apparatuses.
- the management server 10000 may use the SCSI Inquiry command to acquire the respective data registered in the performance information table 11820 , that is, the data of the apparatus ID 11821 , the device ID 11822 , the metrics 11823 , the equipment OS 11824 , the performance value 11825 , the alert execution threshold 11826 , the threshold type 11827 , and the Status 11828 .
- the physical server 30000 may acquire performance information from the storage apparatus 20000 in accordance with the physical server 30000 issuing an SCSI Inquiry command to the storage apparatus 20000 .
- the configuration/performance information collection program 31100 of the physical server 30000 may acquire performance information related to the physical server 30000 using any method, such as, for example, a request to the OS.
- the configuration/performance information collection program 21200 on the storage apparatus 20000 may acquire performance information related to the storage apparatus 20000 using any method, such as, for example, acquiring the information from the memory 21000 .
- the performance information collection program 11700 stores the acquired performance information in the performance information table 11820 , and ends the performance information collection process.
- FIG. 14 is a flowchart of an event analysis process related to Example 1.
- the event analysis process is executed in accordance with the management server 10000 processor 15000 executing the event analysis processing program 11400 deployed on the memory 11000 .
- the event analysis processing program 11400 references the performance information table 11820 , and acquires the performance values and alert execution threshold for each device (Step 1001 ).
- the event analysis processing program 11400 carries out the processing of the following Steps 1002 through 1004 for the acquired performance values.
- the event analysis processing program 11400 checks whether or not the processing-target performance value exceeds the alert execution threshold corresponding to the processing-target performance value, and, based on the result, updates the value of the Status 11828 of the performance information table 11820 (Step 1002 ). Specifically, in a case where the processing-target performance value exceeds the alert execution threshold, the event analysis processing program 11400 stores “threshold abnormality” in the Status 11828 , and in a case where the processing-target performance value does not exceed the alert execution threshold, stores “normal” in the Status 11828 .
- the event analysis processing program 11400 determines whether or not the value of the Status 11828 changed before or after the update of Step 1002 (Step 1003 ). In a case where a change occurred (Step 1003 : Yes), the event analysis processing program 11400 registers an entry related to the event showing that the processing-target performance value exceeds the alert execution threshold in the event management table 11910 (Step 1004 ), and advances to the next step. At this time, the date and time at which the performance information collection program 11700 collected the processing-target performance value is stored in the entry occurrence date/time 11918 . Alternatively, in a case where the value of the Status 11828 has not changed (Step 1003 : No), the event analysis processing program 11400 advances to the next step without executing Step 1004 .
- the event analysis processing program 11400 determines whether or not there exists an entry related to an event, which has been newly registered in the event management table 11910 (Step 1005 ), and in a case where an entry related to a newly registered event exists (Step 1005 : Yes), instructs the rule expansion program 11500 to execute a rule expansion process (refer to FIG. 15 ) (Step 1006 ), and ends the event analysis process.
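- Alternatively, in a case where an entry related to a newly registered event does not exist (Step 1005: No), the event analysis processing program 11400 ends the event analysis process.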
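- The threshold check and status-change detection of Steps 1002 through 1005 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PerformanceEntry` class, the `analyze` function, and the dictionary keys are hypothetical names, and an in-memory list stands in for the event management table 11910.

```python
from dataclasses import dataclass


@dataclass
class PerformanceEntry:
    # Hypothetical stand-in for one row of the performance information table 11820.
    device_id: str
    value: float             # latest collected performance value
    alert_threshold: float   # alert execution threshold
    status: str = "normal"   # corresponds to the Status 11828


def analyze(entries, collected_at, event_table):
    """Update each status (Step 1002) and register an event only when the
    status changed (Steps 1003 and 1004). Returns True when at least one
    event entry was newly registered (the check of Step 1005)."""
    newly_registered = False
    for entry in entries:
        if entry.value > entry.alert_threshold:
            new_status = "threshold abnormality"
        else:
            new_status = "normal"
        changed = new_status != entry.status
        entry.status = new_status
        if changed:
            event_table.append({
                "device_id": entry.device_id,
                "status": new_status,
                "occurred_at": collected_at,  # collection date and time
                "analyzed": False,
            })
            newly_registered = True
    return newly_registered
```

- Registering an event only on a status change means that a value which stays above its threshold across several collection cycles produces a single event entry rather than one entry per cycle.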
- the event analysis processing program 11400 registers an entry related to an event in the event management table 11910 based on performance information acquired by the performance information collection program 11700, but may instead update the information in the event management table 11910 using information notified from an apparatus, such as an SNMP (Simple Network Management Protocol) trap.
- FIG. 15 is a flowchart of the rule expansion process related to Example 1.
- the rule expansion process is executed in accordance with the management server 10000 processor 15000 executing the rule expansion program 11500 deployed on the memory 11000 .
- the rule expansion program 11500 acquires an entry (event entry) related to a newly registered event from the event management table 11910 , and updates the analyzed flag 11917 of the acquired event entry to “Yes” (Step 2001 ).
- the rule expansion program 11500 acquires one or more general rules shown by one or more general rule tables 11920 , which are being stored in the rule-plan information repository 11900 (Step 2002 ).
- the rule expansion program 11500 performs the processing of Steps 2004 through 2010 below for each acquired general rule (Step 2003 ).
- the rule expansion program 11500 also performs the processing of Steps 2005 through 2010 below for each event entry acquired in Step 2001 .
- the rule expansion program 11500 determines whether or not the event (processing-target event) shown in the processing-target event entry matches any of the condition events in the processing-target general rule (Step 2005 ). For example, the rule expansion program 11500 determines that the processing-target event and a condition event match in a case where the apparatus and device related to the processing-target event correspond to the types of the apparatus and device related to the condition event, and, in addition, the type of the processing-target event matches the type of the condition event.
- In a case where the processing-target event does not match any of the condition events (Step 2005: No), the rule expansion program 11500 executes the processing (Steps 2004 through 2010) by making the next event entry the processing target.
- In a case where the processing-target event matches a condition event (Step 2005: Yes), the rule expansion program 11500 acquires from the configuration information table 11810 the identifiers of the apparatus and device having the association shown in the processing-target general rule with the apparatus and device related to the processing-target event (Step 2006).
- the rule expansion program 11500 prepares an expanded rule based on the processing-target general rule for each combination having the association (Step 2007 ).
- the rule expansion program 11500 lists all of the logical volume 22100 and disk pool 22200 of the storage apparatus 20000 , which are the apparatus and device having the association described in the general rule “Rule 1” (a coupling relationship exists between the components) with the physical server A drive “/var”, which is the site where the processing-target event occurred. Then, the rule expansion program 11500 prepares an expanded rule, which corresponds to the combination of the physical server A drive “/var”, the logical volume 22100 of the storage apparatus 20000 , and the disk pool 22200 of the storage apparatus 20000 .
- the rule expansion program 11500 determines whether or not an expanded rule table 11950 , which shows the same expanded rule as the prepared expanded rule, already exists in the rule-plan information repository 11900 (Step 2008 ).
- In a case where the same expanded rule does not yet exist (Step 2008: No), the rule expansion program 11500 stores the expanded rule table 11950 showing the prepared expanded rule in the rule-plan information repository 11900, and registers a new entry in the analysis result management table 11970 (Step 2009).
- In a case where the same expanded rule already exists (Step 2008: Yes), the rule expansion program 11500 adds the event ID of the processing-target event to the received event ID 11976 and changes the value of the certainty factor 11974 of the analysis result management table 11970 (Step 2010).
- the rule expansion program 11500 executes the processing (Steps 2004 through 2010) by making the next general rule the processing target. Then, after the processing (Steps 2004 through 2010) for all the general rules has been completed, the rule expansion program 11500 determines whether or not a newly registered entry exists in the analysis result management table 11970 (Step 2011), and in a case where a newly registered entry exists (Step 2011: Yes), instructs the plan creation program 11100 to execute a plan creation process (refer to FIG. 16) (Step 2012), and ends the rule expansion process. Alternatively, in a case where a newly registered entry does not exist (Step 2011: No), the rule expansion program 11500 ends the rule expansion process.
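- The matching and expansion loop of Steps 2003 through 2010 can be sketched roughly as follows. This is an illustrative simplification under stated assumptions: the function name `expand_rules`, the dictionary layouts, and the `related_sites` lookup (a stand-in for the configuration information table 11810) are hypothetical; only one related site per device type is handled, whereas the description lists every combination; and the new-entry and existing-entry branches are merged for brevity. The certainty factor is computed as the ratio of condition events that have occurred to the total number of condition events, as the description of FIG. 20 states.

```python
def expand_rules(event, general_rules, related_sites, results):
    """Expand every general rule whose condition events match `event`.

    event: dict with "id", "device", "device_type", "event_type".
    general_rules: list of dicts with "id" and "conditions", where
        "conditions" is a list of (device_type, event_type) tuples.
    related_sites: maps a device name to {device_type: device_name} for the
        sites coupled to it, itself included (hypothetical lookup).
    results: dict keyed by (rule_id, concrete sites); stands in for the
        analysis result management table.
    Returns the number of newly registered analysis result entries.
    """
    new_entries = 0
    for rule in general_rules:
        conditions = rule["conditions"]
        # Match on device type and event type (Step 2005).
        if (event["device_type"], event["event_type"]) not in conditions:
            continue
        sites = related_sites[event["device"]]           # Step 2006
        try:
            concrete = tuple(sites[dev_type] for dev_type, _ in conditions)
        except KeyError:
            continue  # no coupled site of a required type
        key = (rule["id"], concrete)                     # Step 2007
        if key not in results:                           # Step 2008
            results[key] = {"received": set(), "certainty": 0.0}
            new_entries += 1
        entry = results[key]
        entry["received"].add(event["id"])
        entry["certainty"] = len(entry["received"]) / len(conditions)
    return new_entries
```

- A non-zero return value corresponds to the situation checked in Step 2011, in which the plan creation process would be started.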
- FIG. 16 is a flowchart of the plan creation process related to Example 1.
- the plan creation process is executed in accordance with the management server 10000 processor 15000 executing the plan creation program 11100 deployed on the memory 11000 .
- the plan creation program 11100 acquires a newly registered entry related to analysis result (analysis result entry) from the analysis result management table 11970 (Step 3001 ).
- the plan creation program 11100 implements the processing of Steps 3003 through 3008 below for each acquired analysis result entry (Step 3002 ).
- the plan creation program 11100 acquires the expanded rule ID stored in the expanded rule ID 11975 of the processing-target analysis result entry in the analysis result management table 11970 .
- the expanded rule having the acquired expanded rule ID will be called “processing-target expanded rule”.
- the plan creation program 11100 acquires the general rule ID of the general rule, which constitutes the basis of the processing-target expanded rule, from the pre-expansion general rule ID 11954 of the expanded rule table 11950 showing the processing-target expanded rule (Step 3003 ).
- the plan creation program 11100 acquires from the rule-plan association table 11940 one or more general plan IDs corresponding to the general rule ID acquired in Step 3003 .
- the plan creation program 11100 also acquires from the rule-plan association table 11940 an unresolved event ID corresponding to a combination of the acquired general plan ID and the general rule ID acquired in Step 3003 (Step 3004 ).
- the plan creation program 11100 references the configuration information table 11810 and the general plan table 11930 , and creates an expanded plan which is based on a general plan having the general plan ID acquired in Step 3004 and which corresponds to the processing-target expanded rule, and stores the expanded plan table 11960 showing the created expanded plan in the rule-plan information repository 11900 (Step 3005 ).
- the plan creation program 11100 references the configuration information table 11810 and identifies all the disk pools 22200 that could be the migration-destination pool.
- the plan creation program 11100, based on the configuration information table 11810, identifies a disk pool 22200 which differs from the migration-source pool and can be accessed by the physical server 30000 that had been accessing the migration-target volume, and regards the identified disk pool 22200 as the migration-destination pool.
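- The selection of migration-destination candidates can be sketched as a simple filter. The `accessible_pools` mapping and the function name are hypothetical; the description only says that accessibility is derived from the configuration information table 11810 and does not specify how.

```python
def migration_destination_pools(accessible_pools, server, source_pool):
    """Return every pool the server can access, except the migration-source pool.

    accessible_pools: maps a physical server name to the set of disk pools
        it can access (assumed to be derived from configuration information).
    """
    pools = accessible_pools.get(server, set())
    return sorted(p for p in pools if p != source_pool)
```

- One expanded plan would then be created per returned pool, so a volume migration plan with two candidate pools yields two expanded plans.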
- the plan creation program 11100 repeatedly executes the processing of Steps 3007 and 3008 with respect to each expanded plan created in Step 3005 (Step 3006 ).
- the plan creation program 11100 references the performance information table 11820 , calculates a predicted value for the post-plan execution performance value in accordance with simulating a post-plan execution state, and, based on the simulated result information, updates the value of the plan target 11964 in the expanded plan table 11960 showing the processing-target expanded plan (Step 3007 ).
- the plan creation program 11100 instructs the post-plan execution risk selection program 11200 to execute a post-plan execution risk selection process (refer to FIG. 17 ) for the processing-target expanded plan (Step 3008 ).
- the plan creation program 11100 inputs to the post-plan execution risk selection program 11200 an unresolved event ID of an unresolved event corresponding to the processing-target expanded plan, that is, an unresolved event, which corresponds to a combination of the general rule constituting the basis of the processing-target expanded rule and the general plan constituting the basis of the processing-target expanded plan.
- a risk site with respect to the processing-target expanded plan is identified in accordance with the post-plan execution risk selection process.
- the plan creation program 11100 after completing the processing (Steps 3003 through 3008 ) for all the acquired analysis result entries, instructs the plan presentation processing program 11300 to execute a plan presentation process (refer to FIG. 18 ) (Step 3009 ). Thereafter, the plan creation program 11100 ends the plan creation process.
- This example takes up performance information, in particular, a predicted value for an I/O response time, and gives an example of a simulation method, but a value other than a performance value may be used as the value to be stored in the expanded plan table 11960 as long as the value is capable of serving as an index which denotes the characteristics of the plan.
- the management server 10000 may perform the same simulation as that for a performance value by storing cost information pertaining to plan execution and time information pertaining to plan execution in the configuration information table 11810 or the performance information table 11820 .
- FIG. 17 is a flowchart of the post-plan execution risk selection process related to Example 1.
- the post-plan execution risk selection process is executed in accordance with the management server 10000 processor 15000 executing the post-plan execution risk selection program 11200 deployed on the memory 11000 .
- the post-plan execution risk selection program 11200 uses an unresolved event ID received from the plan creation program 11100 to select an unresolvable event from among the actually occurred condition events registered in the received event ID 11976 of the processing-target analysis result entry in the analysis result management table 11970 (Step 4001 ).
- unresolvable event refers to an event, which corresponds to the condition event shown by the unresolved event ID, from among the condition events that have actually occurred.
- the processing-target expanded rule that is, the expanded rule shown by the expanded rule ID 11975 of the processing-target analysis result entry is the expanded rule “ExRule 1-1”, and the general rule, which constitutes the basis of the processing-target expanded rule, is the general rule “Rule 1”.
- the unresolved event corresponding to the processing-target expanded plan “ExPlan 1-1” is the unresolved event corresponding to the combination of the general plan “Plan 1”, which constitutes the basis of the expanded plan “ExPlan 1-1”, and the general rule “Rule 1”, and the unresolved event “3” is acquired from the rule-plan association table 11940 of FIG. 10 .
- the unresolved event “3” shows the condition event 3 of the general rule “Rule 1”, that is, the response time threshold abnormality event in the disk pool 22200 of the storage apparatus 20000 .
- the event “EV3”, which corresponds to the response time threshold abnormality event in the disk pool 22200 of the storage apparatus 20000 , is selected as the unresolvable event from among the events (event “EV2”, event “EV3”, and event “EV5”) registered in the received event ID 11976 of the processing-target analysis result entry.
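- The selection in Step 4001 can be sketched as follows. The function name and data layouts are hypothetical, and the sites assigned to events "EV2" and "EV5" in the usage example are assumptions made only so that the example is complete; the description states only that "EV3" is the response time threshold abnormality event in the disk pool.

```python
def select_unresolvable_events(unresolved_ids, expanded_conditions, received_events):
    """Pick the received events that correspond to unresolved condition events.

    unresolved_ids: condition event numbers that the plan leaves unresolved.
    expanded_conditions: maps a condition event number of the expanded rule
        to its concrete (device, event_type) pair.
    received_events: maps an event ID to the (device, event_type) pair where
        the event actually occurred.
    """
    targets = {expanded_conditions[c] for c in unresolved_ids}
    return [eid for eid, site in received_events.items() if site in targets]


# Usage with the values of the example above (sites of EV2 and EV5 assumed).
conditions = {
    1: ("/var", "response_time"),
    2: ("LV2", "response_time"),
    3: ("POOL1", "response_time"),
}
received = {
    "EV2": ("/var", "response_time"),
    "EV3": ("POOL1", "response_time"),
    "EV5": ("LV2", "response_time"),
}
unresolvable = select_unresolvable_events([3], conditions, received)
```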
- the post-plan execution risk selection program 11200 references the event management table 11910 and the expanded rule table 11950 , and identifies the occurrence sites (the occurrence source apparatus and device) of the unresolvable event selected in Step 4001 (Step 4002 ).
- the post-plan execution risk selection program 11200 references the configuration information table 11810, and selects any one or more of the unresolvable event occurrence sites and the sites (apparatus and device) associated on the I/O path with the unresolvable event occurrence sites as risk sites (Step 4003).
- In a case where a risk site has been selected (Step 4004: Yes), the post-plan execution risk selection program 11200 stores data showing the selected risk sites in the risk site 11969 of the processing-target expanded plan in the expanded plan table 11960 (Step 4005), and ends the post-plan execution risk selection process.
- the post-plan execution risk selection program 11200 ends the post-plan execution risk selection process.
- the risk site for example, refer to FIG. 12
- other sites on the I/O path shown by the entry of the configuration information table 11810 for example, the server data I/F and the logical volume of the storage apparatus 20000 may also be selected as the risk site.
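- The I/O path lookup of Step 4003 can be sketched as follows. The row layout and the function name are hypothetical, and restricting the result to drives by default is an assumption chosen to mirror the drive "/opt" example; as noted above, other sites on the path may equally be selected.

```python
def risk_sites(config_rows, occurrence_site, kinds=("drive",)):
    """Return the sites of the given kinds that share an I/O path with the
    occurrence site of an unresolvable event.

    config_rows: list of dicts, one per I/O path, e.g.
        {"server": ..., "drive": ..., "volume": ..., "pool": ...}
        (a simplified stand-in for the configuration information table).
    """
    found = []
    for row in config_rows:
        if occurrence_site not in row.values():
            continue  # this I/O path does not pass through the site
        for kind in kinds:
            site = row.get(kind)
            if site is not None and site != occurrence_site and site not in found:
                found.append(site)
    return found
```

- With a post-migration configuration in which only the drive "/opt" still reaches "POOL1", this lookup would return that drive, which matches the risk site shown in the plan details screen example.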
- FIG. 18 is a flowchart of the plan presentation process related to Example 1.
- the plan presentation process is executed in accordance with the management server 10000 processor 15000 executing the plan presentation program 11300 deployed on the memory 11000 .
- the plan presentation program 11300 acquires from the analysis result management table 11970 information showing a failure cause, that is, the values of the cause apparatus ID 11971 , the cause component ID 11972 , the metrics 11973 , and the certainty factor 11974 (Step 5001 ).
- the plan presentation program 11300 implements the processing of Step 5002 below with respect to each analysis result entry of the analysis result management table 11970.
- the expanded rule having the expanded rule ID stored in the expanded rule ID 11975 of the processing-target analysis result entry will be referred to as “processing-target expanded rule”.
- the plan presentation program 11300 acquires from the rule-plan information repository 11900 one or more expanded plan tables 11960 , which show one or more expanded plans corresponding to the processing-target expanded rule (plans, which are candidates for failure recovery) (Step 5002 ).
- After completing the processing (Step 5002) for all the analysis result entries, the plan presentation program 11300 creates a plan presentation screen (refer to FIG. 20) based on information showing the failure cause and the certainty factor acquired in Step 5001 and the expanded plan table 11960 acquired in Step 5002, and causes the output device 14000 to display the created plan presentation screen (Step 5003). Thereafter, the plan presentation program 11300 ends the plan presentation process.
- FIG. 19 is a flowchart of an effect-risk presentation process related to Example 1.
- the plan presentation process is executed, and the plan presentation screen is displayed, in accordance with the management server 10000 processor 15000 executing the plan presentation program 11300 deployed on the memory 11000.
- the plan presentation program 11300 starts the effect-risk presentation process when a desired expanded plan is selected and an input requesting the display of the plan details screen with respect to the selected expanded plan is received in the plan presentation screen (Step 6001 ).
- the plan presentation program 11300 references the configuration information table 11810 and acquires risk site status information, performance information, and setting information (Step 6002 ).
- the plan presentation program 11300 references the expanded plan table 11960 and the analysis result management table 11970 , and acquires information showing which event of the condition events included in the expanded rule corresponding to the selected expanded plan occurred, and information showing which event will be resolved in a case where the selected expanded plan is implemented (Step 6003 ).
- the plan presentation program 11300 selects I/O path information associated with the selected expanded plan (Step 6004 ).
- the plan presentation program 11300 creates a plan details screen (refer to FIG. 21) based on the information acquired in Steps 6002 through 6004, and causes the output device 14000 to display the created plan details screen (Step 6005). Thereafter, the plan presentation program 11300 ends the effect-risk presentation process.
- FIG. 20 is a constitutional diagram of an example of the plan presentation screen related to Example 1.
- the plan presentation screen 9000 comprises a display area 9001 for displaying information, which is referenced when, in a case where a failure has occurred in the computer system, the administrator is investigating the cause of the failure and implementing countermeasures, specifically, information showing the corresponding relationship between a failure cause and a list of plans available with respect to the failure, a plan details button 9002 for displaying the details of the plans, and an execute plan button 9003 for executing a plan.
- for example, the ID of the apparatus related to the failure cause event, the ID of the device related to the failure cause event, the type of the failure cause event, and the certainty factor of the failure cause, that is, the percentage of the number of condition events that have actually occurred relative to the total number of condition events, are displayed in the display area 9001, which displays the association between the failure cause and a plan for countering the failure.
- the plan presentation program 11300 acquires these values from the analysis result management table 11970 shown in FIG. 13 in Step 5001 of FIG. 18 .
- the display area 9001 displays information showing the contents of candidate plans, the costs pertaining to plan execution, the time required for plan execution (that is, the time during which the failure will remain, hereinafter referred to as “downtime”), post-plan execution performance information, and information showing a risk site.
- the information showing a risk site comprises data (for example, the name of the risk site) showing the risk site stored in the risk site 11969 of the expanded plan table 11960 , and data (for example, the name of the apparatus) showing the apparatus, which comprises the device regarded as the risk site.
- the management server 10000 may prestore the approximate cost pertaining to each general plan in the general plan table 11930 shown in FIG. 9 , and may determine the cost based on this information. For example, with respect to the plan for replacing the storage port of general plan “Plan 8”, the management server 10000 may store a sum obtained by adding up the cost of purchasing a storage port and the labor costs for the maintenance personnel needed to replace the storage port, and may display this sum as the cost. With respect to the volume migration plan of general plan “Plan 1”, the management server 10000 may store the cost per bit in a case where data is migrated from a certain storage device to a different storage device, and may calculate the cost displayed in the “Cost ($)” field of FIG. 20 in accordance with the capacity of the volume to be migrated.
- the management server 10000 may prestore data showing approximately how much data can be migrated per unit of time (for example, one second) for each combination of the media type of both the migration-source and the migration-destination storage devices and a RAID level, and may calculate the downtime displayed in the “Downtime” field of FIG. 20 in accordance with the capacity of the volume to be migrated.
- the management server 10000 may take into account the fact that the migration time will fluctuate in accordance with the utilization state in the actual environment, and, using past migration history information, may calculate the capacity capable of being migrated per unit of time, and may seek to revise the prestored information by averaging the prestored information and the historical information.
- One example of a method for determining the cost information and downtime pertaining to plan execution has been given here, but another determination method may be employed.
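- The cost and downtime estimates described above reduce to simple arithmetic, sketched below. The function names and units are hypothetical, and averaging the prestored rate with the mean of the historical rates is only one possible reading of "averaging the prestored information and the historical information".

```python
def migration_cost(capacity_bits, cost_per_bit):
    """Cost of a volume migration: capacity to be migrated times cost per bit."""
    return capacity_bits * cost_per_bit


def revised_rate(prestored_rate, history_rates):
    """Revise the prestored migration rate (capacity per second) by averaging
    it with the mean of the rates observed in past migrations."""
    if not history_rates:
        return prestored_rate
    historical = sum(history_rates) / len(history_rates)
    return (prestored_rate + historical) / 2


def downtime_seconds(capacity, rate_per_second):
    """Estimated time required for the migration, shown as the downtime."""
    return capacity / rate_per_second
```

- For instance, a prestored rate of 100 units per second and observed rates of 60 and 40 give a revised rate of 75, so a volume of 1500 units would be estimated at 20 seconds of downtime.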
- as the post-plan execution performance information, for example, the predicted value for the post-plan execution performance value, which was simulated in Step 3007 of the plan creation process shown in FIG. 16 and stored in the plan target 11964 of the expanded plan table 11960 shown in FIG. 12, is displayed; for example, the values stored in the I/O Response Time predictions 11965B, 11966B, 11967B, and 11968C are displayed.
- a predicted value for the I/O response time with respect to the migration-target volume is displayed for the first plan from the top (the volume migration plan), and a predicted value for the I/O response time with respect to the add-target pool is displayed for the fifth plan from the top (the add disk to pool plan), but other values, for example, a predicted value for the I/O response time with respect to the migration-source pool and the migration-destination pool may be displayed for the first plan from the top (the volume migration plan), and other performance values may also be displayed.
- the predicted value for multiple performance values may also be displayed.
- the order in which the candidate plans are displayed here may be sorted on the basis of the characteristics of the plan, such as being arranged in order from the plan with the lowest plan execution cost, being arranged in order from the plan with the shortest time required for plan execution, or being arranged in order from the plan with no risk site.
- Any method may be used as the arrangement method, such as, for example, arranging the plans in order from the one having the lowest cost by clicking on the “Cost ($)” in the display area 9001 .
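- The sort orders mentioned above can be sketched as follows. The plan dictionaries, their keys, and the function name are hypothetical; ordering by the number of risk sites is an assumption that places plans with no risk site first.

```python
def sort_plans(plans, key="cost"):
    """Return the candidate plans ordered by the chosen characteristic.

    plans: list of dicts with "name", "cost", "downtime_sec", "risk_sites".
    key: "cost", "downtime_sec", or "risk" (fewest risk sites first).
    """
    if key == "risk":
        return sorted(plans, key=lambda p: len(p["risk_sites"]))
    return sorted(plans, key=lambda p: p[key])


# Illustrative values only; they are not taken from FIG. 20.
plans = [
    {"name": "volume migration", "cost": 400, "downtime_sec": 600, "risk_sites": ["/opt"]},
    {"name": "add disk to pool", "cost": 900, "downtime_sec": 300, "risk_sites": []},
]
by_cost = [p["name"] for p in sort_plans(plans, "cost")]
by_risk = [p["name"] for p in sort_plans(plans, "risk")]
```

- Because `sorted` is stable, plans that tie on the chosen characteristic keep their original display order.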
- the plan details button 9002 is for instructing the display of the plan details screen ( FIG. 21 ).
- when this button is pressed, the management server 10000 starts the execution of the effect-risk presentation process of FIG. 19, and displays the plan details screen (FIG. 21), which shows the detailed information of the selected plan, on the output device 14000.
- the execute plan button 9003 is for instructing the execution of the selected plan, and when this button is pressed, the management server 10000 issues a plan execution indication to the program which provides the function corresponding to the selected plan.
- the program which receives the plan execution indication, executes the selected plan.
- the program, which executes the plan here, for example, is a program in the memory 11000 of the management server 10000 , and, for example, is a volume migration program or a VM migration program.
- predicted values for a pre-plan execution performance value and a post-plan execution performance value may be displayed in the display area 9001 in graph form as trend information.
- FIG. 20 is an example of the plan presentation screen 9000, and plan characteristics-denoting information other than the plan execution cost and time required for plan execution, such as, for example, a list of business operations which are using plan-related resources and could be affected at plan execution time, may be displayed together in the display area 9001, and another display mode may be used.
- FIG. 21 is a constitutional diagram of an example of the plan details screen related to Example 1.
- the plan details screen 9010 comprises a display area 9011 for displaying information, which is referenced when, in a case where a failure has occurred in the computer system, the administrator is investigating the cause of the failure and implementing countermeasures, specifically, information showing the coupling relationship between the apparatus and the device related to the failure, and a display area 9017 for displaying detailed information regarding a risk site.
- the display area 9011 comprises a server area 9012 , which denotes the configuration of the physical server 30000 in the computer system, a switch area 9013 , which denotes the configuration of the switch, a storage area 9014 , which denotes the configuration of the storage apparatus 20000 , an area 9015 showing the coupling relationship and the setting relationship between each apparatus and device prior to executing the plan selected in the plan presentation screen 9000 , and an area 9016 showing the coupling relationship and the setting relationship between each apparatus and device after executing the selected plan.
- the display area 9017 for displaying detailed information regarding a risk site comprises a risk site 9018 , which denotes a site where a risk remains even after plan execution, and a risk occurrence timing 9019 , which denotes, when the risk site is left unattended, the timing at which the relevant risk will occur.
- a diagram 9020 denoting the physical server A, and a diagram 9021 denoting the storage apparatus A are displayed in the server area 9012 of the display area 9011 .
- diagrams denoting the drive “/opt” and the drive “/var”, which are mount points, are displayed inside the diagram 9020.
- diagrams denoting the logical volume “LV1” and the disk pool “POOL1” are displayed inside the diagram 9021 . That is, the server area 9012 is for expressing the coupling relationships and the setting relationships of the computer system.
- a mark 9022 showing a failure event which has occurred in the system and is being managed by the analysis result management table 11910 , is displayed in the server area 9012 on the site where the failure event occurred, for example, on the data I/F “S2” of the physical server A.
- a mark 9023 showing a risk site which is being managed by the expanded plan table 11960 , is displayed in the server area 9012 on the risk occurrence site, for example, on the drive “/opt” of the physical server A.
- the coupling relationship of an apparatus and a device, such as the physical server A and the storage apparatus A, in the area 9011 is expressed using a solid line connecting the diagrams denoting the apparatus and the device.
- the area 9015 showing the state before plan execution shows the fact that the drive “/opt” of the physical server A, the logical volume “LV1” and the disk pool “POOL1” of the storage apparatus A are associated, the drive “/var” of the physical server A, the logical volume “LV2” and the disk pool “POOL1” of the storage apparatus A are associated, and the logical volume “LV3” and the disk pool “POOL2” of the storage apparatus A are associated.
- the area 9015 showing the state before plan execution also shows the fact that failure events have occurred in the drive “/var” on the physical server A, the data I/F on the physical server A, the data I/F of the storage apparatus A, the logical volume “LV2” of the storage apparatus A, and the disk pool “POOL1” of the storage apparatus A, and that a risk exists in the drive “/opt” on the physical server A.
- the area 9016 showing the state after plan execution shows that the logical volume “LV2”, which existed in the disk pool “POOL1”, has come to exist in the POOL2, and that even after plan execution, a failure event remains in the disk pool “POOL1” of the storage apparatus A, and a risk remains in the drive “/opt” on the physical server A.
- the display area 9017 showing the risk details.
- the risk details information which is on the physical server A drive “/opt” in the display area 9011 , is displayed in the area 9017 , and in the example shown in FIG. 21 , the area 9017 denotes that the risk site is the physical server A drive “/opt”, and that the timing at which the risk could occur is when an I/O is generated to the drive “/opt”.
- the risk occurrence timing is determined based on the information acquired in Step 6002 of FIG. 19 .
- the management server 10000 may acquire the information of the performance value 11825 of the performance information table 11820 , may detect the fact that the performance value related to the physical server A drive “/opt” is 0 msec and that an I/O has not occurred, may judge that the occurrence of an I/O could trigger the occurrence of a risk, and may store the relevant information in the risk occurrence timing 9019 .
- the plan details screen 9010 here comprises the display area 9011 and the display area 9017 , but the display method is not limited thereto, and, for example, when the mark 9023 showing the risk site of the display area 9011 is clicked, the display area 9017 may be newly displayed as a separate screen. Or, when the risk site information in the display area 9001 of the plan presentation screen 9000 is clicked, the display area 9017 may be newly displayed as a separate screen. Also, when a diagram showing either an apparatus or a device, such as the physical server 30000 or the storage apparatus 20000 of the display area 9011 is clicked, the performance values of either the relevant apparatus or device before and after plan execution may be displayed.
- According to Example 1, it is possible to present a failure cause in association with a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains as-is even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time.
- Example 2 will be explained.
- the focus will be on explaining the differences with Example 1, and descriptions of similar components, programs having similar functions, and tables having similar items will be omitted.
- FIG. 22 is a constitutional diagram of an example of a management server related to Example 2.
- the management server 10000 comprises the same components as in Example 1, and, in addition, a VM configuration management table 11830 is stored in the configuration/performance information repository 11800 of the memory 11000 .
- the VM configuration management table 11830 manages the corresponding relationship between a VM and a hypervisor (hereinafter also called “HV”), which logically creates and runs the VM, and VM setting information, for example, power status information and the like.
- the various programs and tables are stored in the memory 11000 , but may be stored in either the storage device 12000 or another storage medium (not shown in the drawing).
- the processor 15000 reads a target program on the memory 11000 at program execution time, and executes the program that was read.
- the above-described programs and the above-described tables may be stored in either the memory 21000 of the storage apparatus 20000 or the memory 31000 of the physical server 30000 , and either the storage apparatus 20000 or the physical server 30000 may execute the stored programs.
- Another apparatus such as a switch, may store the above-described programs and tables, and may execute the stored programs.
- FIG. 23 is a constitutional diagram of an example of a physical server related to Example 2.
- the physical server 30000 comprises the same components as in Example 1, and, in addition, a VM management program 31400 is stored in the memory 31000 .
- the VM management program 31400 manages the configuration information and the performance information of the VM.
- the VM management program 31400 also carries out VM-related control, such as VM migration.
- various programs are stored in the memory 31000, but may be stored in another storage medium (not shown in the drawing).
- the processor 33000 reads a target program on the memory 31000 at process execution time, and executes the program that was read.
- FIG. 24 is a logical constitutional diagram of an example of the physical server related to Example 2.
- the physical server 30000 comprises an HV 80000 , which logically creates a VM 70000 , and runs the created VM 70000 .
- the HV 80000 can control multiple VMs 70000 at one time. Each of the multiple VMs 70000 can execute an application just like a standalone physical computer.
- FIG. 25 is a constitutional diagram of an example of a switch related to Example 2.
- the switch 60000 comprises a memory 61000, a management I/F 62000, a processor 63000, and a switch data I/F 64000, and these apparatuses are coupled via an internal bus 65000.
- the switch 60000 is an apparatus for selecting a communication route from the data I/F 32000 of the physical server 30000 to the data I/F 26000 of the storage apparatus 20000.
- a configuration/performance information collection program 61100 is stored in the memory 61000 .
- the configuration/performance information collection program 61100 is for sending/receiving the management information and the performance information of the switch 60000 and the like to/from the management server 10000 .
- FIG. 26 is a constitutional diagram of an example of a configuration information table related to Example 2.
- the configuration information table 11810 comprises the field of switch information 11818 in addition to the fields of the configuration information table 11810 related to Example 1.
- the switch information 11818 comprises a switch 11818 A in which is stored the switch 60000 identifier, a switch data I/F IN 11818 B in which is stored data showing the input data I/F of the switch 60000 , and a switch data I/F OUT 11818 C in which is stored data showing the output data I/F of the switch 60000 .
- the switch information 11818 is arranged between the fields of the physical server 11811 and the storage 11814, and the relative position of this field shows the relative positions of the apparatus and device on the communication route. For example, the second entry from the top of FIG. 26 shows that switch B and switch C exist between the server data I/F “S2” of the physical server A and the storage data I/F “P2” of the storage apparatus A. More specifically, this entry shows that the physical server A server data I/F “S2” is coupled to the switch B data I/F “R10”, the switch B data I/F “R11” and the switch C data I/F “R20” are coupled, and the switch C data I/F “R21” and the storage apparatus A storage data I/F “P2” are coupled.
- the VM 70000 configuration information is also stored in the configuration information table 11810, in the same way as the physical server 30000 configuration information. Therefore, an identifier for uniquely identifying the VM 70000 is stored in the physical server 11811.
- the fifth entry from the top of FIG. 26 shows that the logical volume “LV10”, which was created from the disk pool “POOL3” of storage B, is coupled to the VM 1 via the storage B storage data I/F “P3”, the switch A data I/F “R2”, “R1” and the server data I/F “S3”, and is recognized as logical volume “E:” on the VM 1 .
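- The relative-position convention of the switch information 11818 can be illustrated with a minimal sketch. The names `SwitchHop`, `PathEntry`, and `couplings` are hypothetical and are not part of the described system; the sample values mirror the second entry of FIG. 26 as described above, and the entry is simplified to the fields needed for the route.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SwitchHop:
    """One switch on the route (switch information 11818)."""
    switch: str        # switch 11818A
    data_if_in: str    # switch data I/F IN 11818B
    data_if_out: str   # switch data I/F OUT 11818C


@dataclass
class PathEntry:
    """Simplified entry of the configuration information table 11810."""
    physical_server: str
    server_data_if: str
    storage: str
    storage_data_if: str
    switches: List[SwitchHop] = field(default_factory=list)


def couplings(entry: PathEntry) -> List[Tuple[str, str]]:
    """Return the coupled interface pairs, from the server side to the storage side."""
    pairs = []
    upstream = f"{entry.physical_server}:{entry.server_data_if}"
    for hop in entry.switches:
        pairs.append((upstream, f"{hop.switch}:{hop.data_if_in}"))
        upstream = f"{hop.switch}:{hop.data_if_out}"
    pairs.append((upstream, f"{entry.storage}:{entry.storage_data_if}"))
    return pairs


entry = PathEntry(
    physical_server="Server A", server_data_if="S2",
    storage="Storage A", storage_data_if="P2",
    switches=[SwitchHop("Switch B", "R10", "R11"),
              SwitchHop("Switch C", "R20", "R21")],
)
for near, far in couplings(entry):
    print(f"{near} <-> {far}")
```

- Keeping the switches as an ordered list is one way to preserve the relative positions on the communication route; other encodings are equally possible.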
- FIG. 27 is a constitutional diagram of an example of the VM configuration management table related to Example 2.
- the VM configuration management table 11830 comprises the fields of a virtual server 11831, a power status 11832, a physical server 11833, and a server data I/F 11834.
- An identifier for uniquely identifying a VM 70000 is stored in the virtual server 11831 .
- Data showing the power status of the VM 70000, for example, “ON”, “OFF”, or “SUSPEND”, is stored in the power status 11832.
- An identifier for uniquely identifying the physical server 30000 on which the VM 70000 is running is stored in the physical server 11833 .
- An identifier for uniquely identifying the server data I/F of the physical server 30000 is stored in the server data I/F 11834 .
- FIG. 28 is a constitutional diagram of an example of the performance information table related to Example 2.
- the configuration of the performance information table 11820 related to Example 2 is substantially the same as the configuration of the performance information table 11820 related to Example 1.
- VM 70000 performance information, hypervisor 80000 performance information, and switch 60000 performance information are also stored in the performance information table 11820 related to Example 2 as performance information related to either an apparatus or a device comprising the computer system.
- the VM 70000 and the hypervisor 80000 are also treated as apparatuses.
- an identifier for uniquely identifying either the VM 70000 , the hypervisor 80000 , or the switch 60000 is stored in the apparatus ID 11821 .
- FIG. 29 is a constitutional diagram of an example of the event management table related to Example 2.
- the configuration of the event management table 11910 related to Example 2 is substantially the same as the configuration of the event management table 11910 related to Example 1.
- information related to an event that has occurred in the VM 70000 , information related to an event that has occurred in the hypervisor 80000 , and information related to an event that has occurred in the switch 60000 are also stored as information related to an event, which occurs in either an apparatus or a device comprising the computer system.
- FIG. 30A is a first constitutional diagram of a general rule table related to Example 2.
- FIG. 30B is a second constitutional diagram of a general rule table related to Example 2.
- the configuration of the general rule table 11920 related to Example 2 is substantially the same as the configuration of the general rule table 11920 related to Example 1.
- a VM 70000 -related event, a hypervisor 80000 -related event, and a switch 60000 -related event are also used as a condition event, which is defined by the condition part 11921 and the conclusion part 11922 of the general rule.
- FIG. 31 is a constitutional diagram of an example of a rule-plan association table related to Example 2.
- the configuration of the rule-plan association table 11940 related to Example 2 is substantially the same as the configuration of the rule-plan association table 11940 related to Example 1.
- general rules “Rule 3” and “Rule 4” are taken up and explained as general rules, and general plans “Plan 1” and “Plan 6” are taken up and explained as general plans.
- FIG. 32A is a first constitutional diagram of the expanded rule table related to Example 2.
- FIG. 32B is a second constitutional diagram of the expanded rule table related to Example 2.
- the configuration of the expanded rule table 11950 related to Example 2 is substantially the same as the configuration of the expanded rule table 11950 related to Example 1.
- a VM 70000 -related event, a hypervisor 80000 -related event, and a switch 60000 -related event are also used as a condition event, which is defined by the condition part 11951 and the conclusion part 11952 of the expanded rule.
- FIG. 33A is a first constitutional diagram of the expanded plan table related to Example 2.
- FIG. 33B is a second constitutional diagram of the expanded plan table related to Example 2.
- FIG. 33C is a third constitutional diagram of the expanded plan table related to Example 2.
- the configuration of the expanded plan table 11960 related to Example 2 is substantially the same as the configuration of the expanded plan table 11960 related to Example 1.
- the expanded plan shown in either FIG. 33B or FIG. 33C is an expanded plan based on the general plan “Plan 6”, that is, an expanded plan related to a VM migration, and the plan target 11964 , for example, comprises the fields of a target VM 1196 B, a migration source 1196 C, and a migration destination 1196 D.
- the target VM 1196 B comprises the fields of an ID 1196 BA in which is stored the identifier of the VM 70000 (hereinafter referred to as “target VM”), which is the target of the VM migration, and a performance 1196 BB in which is stored the target VM post-migration performance value.
- the migration source 1196 C comprises the fields of an ID 1196 CA in which is stored the identifier of the hypervisor 80000 , which is the migration source of the target VM (hereinafter referred to as “migration-source hypervisor”), and a performance 1196 CB in which is stored the performance value of the migration-source hypervisor after the target VM has been migrated.
- the migration destination 1196 D comprises the fields of an ID 1196 DA in which is stored the identifier of the hypervisor 80000 , which is the migration destination of the target VM (hereinafter referred to as “migration-destination hypervisor”), and a performance 1196 DB in which is stored the performance value of the migration-destination hypervisor after the target VM has been migrated.
- the plan creation program 11100 acquires the identifiers to be stored in the ID 1196 BA, the ID 1196 CA, and the ID 1196 DA from the configuration information table 11810 and so forth, and stores these identifiers in the respective fields. Furthermore, any method may be used to calculate the predicted performance values stored in the performance 1196 BB, the performance 1196 CB, and the performance 1196 DB, and, for example, the plan creation program 11100 may determine the predicted values by either adding or subtracting the IOPS as shown in Example 1. An example of performance information has been described here, but cost information, or time information, such as system downtime resulting from a failure, may be stored.
- a single hypervisor 80000 serves as the migration source and the migration destination here, but a set of multiple hypervisors 80000 , which share resources, or a data store inside the hypervisor 80000 may serve as the migration source and the migration destination.
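- The add/subtract approach to the predicted values mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function `predict_vm_migration`, the class `MigrationPrediction`, and all numeric values are hypothetical, and the sketch covers only the hypervisor-side values (the performance 1196 CB and the performance 1196 DB), not the post-migration performance of the target VM itself.

```python
from dataclasses import dataclass


@dataclass
class MigrationPrediction:
    target_vm: str                 # ID 1196BA
    source_hv: str                 # ID 1196CA
    destination_hv: str            # ID 1196DA
    source_iops_after: float       # performance 1196CB (predicted)
    destination_iops_after: float  # performance 1196DB (predicted)


def predict_vm_migration(target_vm: str, vm_iops: float,
                         source_hv: str, source_iops: float,
                         destination_hv: str, destination_iops: float) -> MigrationPrediction:
    """Predict hypervisor load after migration by moving the VM's IOPS.

    The VM's IOPS is subtracted from the migration source and added to the
    migration destination (an assumed, simplified model).
    """
    return MigrationPrediction(
        target_vm=target_vm,
        source_hv=source_hv,
        destination_hv=destination_hv,
        source_iops_after=source_iops - vm_iops,
        destination_iops_after=destination_iops + vm_iops,
    )


prediction = predict_vm_migration("VM 2", 200.0, "HV B", 900.0, "HV C", 300.0)
print(prediction)
```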
- FIG. 34 is a constitutional diagram of an example of the analysis result management table related to Example 2.
- the configuration of the analysis result management table 11970 related to Example 2 is substantially the same as the configuration of the analysis result management table 11970 related to Example 1.
- the VM 70000 identifier, the hypervisor 80000 identifier, the switch 60000 identifier, and the identifiers of the devices thereof can be stored in a cause apparatus ID 11971 and a cause component ID 11972 .
- the identifier of an event related to the VM 70000 , the identifier of an event related to the hypervisor 80000 , and the identifier of an event related to the switch 60000 can be stored in the received event ID 11976 .
- FIG. 35 is a flowchart of the plan presentation process related to Example 2.
- the plan presentation process is executed in accordance with the management server 10000 processor 15000 executing the plan presentation program 11300 deployed on the memory 11000 .
- the plan presentation program 11300 acquires information showing a failure cause from the analysis result management table 11970 , that is, the values of the cause apparatus ID 11971 , the cause component ID 11972 , the metrics 11973 , and the certainty factor 11974 (Step 7001 ).
- the plan presentation program 11300 executes the processing of Steps 7002 through 7005 below with respect to each of one or more expanded plans shown by one or more expanded plan tables 11960 stored in the rule-plan information repository 11900 .
- the plan presentation program 11300 determines whether or not there exists among the one or more expanded plans shown by the one or more expanded plan tables 11960 stored in the rule-plan information repository 11900 an expanded plan (hereinafter referred to as “first consolidation-target plan”), which has an expanded rule ID 1196 A value that differs from that of the processing-target expanded plan, that is, a corresponding expanded rule that is different, and, in addition, has the same processing content as that of the processing-target expanded plan (Step 7002 ).
- In a case where a first consolidation-target plan does not exist (Step 7002: No), the plan presentation program 11300 advances the processing to Step 7004.
- In a case where a first consolidation-target plan exists (Step 7002: Yes), the plan presentation program 11300 deletes the first consolidation-target plan from the expanded plan table 11960, updates the value of the expanded rule ID 1196 A of the expanded plan table 11960 comprising the processing-target expanded plan (Step 7003), and advances the processing to Step 7004.
- the expanded plans are such that the expanded plan “ExPlan 6-1” and the expanded plan “ExPlan 6-3”, and the expanded plan “ExPlan 6-2” and the expanded plan “ExPlan 6-4” have corresponding expanded rules that differ from one another, and, in addition, have the same processing content.
- the plan presentation program 11300 identifies the expanded plan “ExPlan 6-3” as the first consolidation-target plan in Step 7002 , deletes the expanded plan “ExPlan 6-3” from the expanded plan table 11960 in Step 7003 , and updates the value in the expanded rule ID 1196 A of the expanded plan table 11960 comprising the expanded plan “ExPlan 6-1” to data showing the expanded rule “ExRule 3-1” and the expanded rule “ExRule 4-1”, for example, to “ExRule 3-1, ExRule 4-1”.
- the plan presentation program 11300 identifies the expanded plan “ExPlan 6-4” as the first consolidation-target plan in Step 7002 , deletes the expanded plan “ExPlan 6-4” from the expanded plan table 11960 in Step 7003 , and updates the value in the expanded rule ID 1196 A of the expanded plan table 11960 comprising the expanded plan “ExPlan 6-2” to “ExRule 3-1, ExRule 4-1”.
- the value in the expanded rule ID 1196 A of an existing expanded plan table 11960 is updated here, but a new expanded plan table 11960 in which “ExRule 3, ExRule 4” is stored in the expanded rule ID 1196 A may be prepared.
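- Steps 7002 and 7003 can be sketched as a merge keyed on processing content. This is a minimal illustration rather than the described implementation: the class `ExpandedPlan`, the function `consolidate_same_content`, the tuple encoding of the processing content, and the VM and hypervisor names are all assumptions. Note that the sketch also drops a plan whose content and rule are both identical to an earlier plan, a case the description does not discuss.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExpandedPlan:
    plan_id: str              # expanded plan ID
    rule_ids: List[str]       # expanded rule ID 1196A (one or more rules)
    content: Tuple[str, ...]  # processing content (hypothetical encoding)


def consolidate_same_content(plans: List[ExpandedPlan]) -> List[ExpandedPlan]:
    """Keep the first plan per processing content and merge the rule IDs of the rest."""
    kept: Dict[Tuple[str, ...], ExpandedPlan] = {}
    for plan in plans:
        survivor = kept.get(plan.content)
        if survivor is None:
            # First plan seen with this content: keep a copy of it.
            kept[plan.content] = ExpandedPlan(plan.plan_id, list(plan.rule_ids), plan.content)
            continue
        # Same content under another rule: drop the plan, record its rule IDs.
        for rule_id in plan.rule_ids:
            if rule_id not in survivor.rule_ids:
                survivor.rule_ids.append(rule_id)
    return list(kept.values())


plans = [
    ExpandedPlan("ExPlan 6-1", ["ExRule 3-1"], ("VM migration", "VM 2", "HV B", "HV C")),
    ExpandedPlan("ExPlan 6-2", ["ExRule 3-1"], ("VM migration", "VM 2", "HV B", "HV D")),
    ExpandedPlan("ExPlan 6-3", ["ExRule 4-1"], ("VM migration", "VM 2", "HV B", "HV C")),
    ExpandedPlan("ExPlan 6-4", ["ExRule 4-1"], ("VM migration", "VM 2", "HV B", "HV D")),
]
merged = consolidate_same_content(plans)
for plan in merged:
    print(plan.plan_id, "<-", ", ".join(plan.rule_ids))
```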
- the plan presentation program 11300 determines whether or not there exists in the one or more expanded plans shown by the one or more expanded plan tables 11960 stored in the rule-plan information repository 11900 an expanded plan (hereinafter referred to as “second consolidation-target plan”), which has a general plan ID 11962 that is the same as the processing-target expanded plan, that is, which has the same general plan as a basis, and, in addition, has similar performance information and has the same risk (Step 7004).
- In a case where a second consolidation-target plan does not exist (Step 7004: No), the plan presentation program 11300 advances the processing to the next process without executing Step 7005.
- In a case where a second consolidation-target plan exists (Step 7004: Yes), the plan presentation program 11300 identifies from among the processing-target expanded plan and one or more second consolidation-target plans an expanded plan (hereinafter referred to as “best plan”) having the best predicted value for the post-plan execution performance value.
- the plan presentation program 11300 deletes from the expanded plan table 11960 the expanded plan, which is not the best plan of the processing-target expanded plan and the one or more second consolidation-target plans, and updates the value in the expanded rule ID 1196 A of the expanded plan table 11960 (Step 7005 ).
- For example, of the expanded plan “ExPlan 1-1”, the expanded plan “ExPlan 1-2”, and the expanded plan “ExPlan 1-3”, which are expanded plans having the same general plan as a basis, and, in addition, similar performance information and the same risk, the expanded plan “ExPlan 1-1” is the best plan in that the migration-target volume performance is the best. Therefore, only the best plan “ExPlan 1-1” is left as-is, and the other expanded plan “ExPlan 1-2” and expanded plan “ExPlan 1-3” are deleted.
- Likewise, of the expanded plan “ExPlan 6-1” and the expanded plan “ExPlan 6-2”, which are expanded plans having the same general plan as a basis, and, in addition, similar performance information and the same risk, the expanded plan “ExPlan 6-1” is the best plan in that the target VM performance is the best. Therefore, only the best plan “ExPlan 6-1” is left as-is, and the other expanded plan “ExPlan 6-2” is deleted.
- the range for judging that the performance information is similar may be fixedly configured beforehand, for example, as the I/O response time being within a range of ±1 msec, or may be configured by the administrator via the input device 13000.
- the plan presentation program 11300 only leaves the best plan, for example, the one expanded plan having the fastest I/O response time, and deletes the other expanded plan(s), but may leave as-is multiple expanded plans having good predicted values for the post-plan execution performance values.
- the number of expanded plans left after consolidation may be fixedly decided beforehand, or may be configured by the administrator via the input device 13000.
- the number of post-consolidation expanded plans may be decided so that all of the expanded plans are capable of being displayed in the output screen.
- the object of this processing is to prevent the plan selection operation performed by the administrator from becoming troublesome due to the display of a large number of similar expanded plans. For example, rather than deleting an expanded plan, a method, which displays only an expanded plan having a good post-plan execution performance value and does not display another expanded plan, and which switches between displaying and not displaying by clicking a prescribed button, may be used.
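- Steps 7004 and 7005 can be sketched as follows, assuming that “similar performance” means predicted I/O response times within a configurable tolerance and that the best plan is the one with the fastest response time. The class `PlanSummary`, the function `keep_best_of_similar`, the risk encoding, and all numeric values are hypothetical; the update of the expanded rule ID 1196 A is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanSummary:
    plan_id: str
    general_plan_id: str     # general plan ID 11962
    response_time_ms: float  # predicted post-execution I/O response time
    risk: str                # remaining risk site (hypothetical encoding)


def keep_best_of_similar(plans: List[PlanSummary],
                         tolerance_ms: float = 1.0) -> List[PlanSummary]:
    """For each processing-target plan, keep only the best of its similar plans."""
    remaining = list(plans)
    result = []
    while remaining:
        target = remaining.pop(0)  # processing-target expanded plan
        group, rest = [target], []
        for other in remaining:
            similar = (other.general_plan_id == target.general_plan_id
                       and other.risk == target.risk
                       and abs(other.response_time_ms - target.response_time_ms) <= tolerance_ms)
            (group if similar else rest).append(other)
        # Best plan: the fastest predicted I/O response time in the group.
        result.append(min(group, key=lambda p: p.response_time_ms))
        remaining = rest
    return result


candidates = [
    PlanSummary("ExPlan 1-1", "Plan 1", 1.0, "VM 1 drive E:"),
    PlanSummary("ExPlan 1-2", "Plan 1", 1.5, "VM 1 drive E:"),
    PlanSummary("ExPlan 1-3", "Plan 1", 1.8, "VM 1 drive E:"),
    PlanSummary("ExPlan 6-1", "Plan 6", 2.0, "VM 1 drive E:"),
    PlanSummary("ExPlan 6-2", "Plan 6", 2.6, "VM 1 drive E:"),
]
survivors = keep_best_of_similar(candidates)
print([p.plan_id for p in survivors])
```

- Leaving several good plans instead of one, or hiding rather than deleting the others, would only change what is done with each `group`.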
- After completing the processing (Steps 7002 through 7005) for all the expanded plans, the plan presentation program 11300, based on the information showing the failure cause and the certainty factor acquired in Step 7001 and the expanded plan table 11960 stored in the rule-plan information repository 11900, creates a plan presentation screen 9000 (refer to FIG. 36) and causes the output device 14000 to display the created plan presentation screen 9000 (Step 7006). Thereafter, the plan presentation program 11300 ends the plan presentation process.
- FIG. 36 is a constitutional diagram of an example of the plan presentation screen related to Example 2.
- the configuration of the plan presentation screen 9000 related to Example 2 is substantially the same as the configuration of the plan presentation screen 9000 related to Example 1.
- In Example 2, the expanded plans shown in FIGS. 33A, 33B, and 33C are created in accordance with the plan creation process. Specifically, the expanded plan “ExPlan 1-1”, the expanded plan “ExPlan 1-2”, and the expanded plan “ExPlan 1-3” are created as volume migration-related expanded plans, and the expanded plan “ExPlan 6-1”, the expanded plan “ExPlan 6-2”, the expanded plan “ExPlan 6-3”, and the expanded plan “ExPlan 6-4” are created as VM migration-related expanded plans. That is, a total of seven expanded plans are created. According to the plan presentation process related to Example 2 shown in FIG.
- According to Example 2, it is possible to present a failure cause in association with a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time.
- Example 3 will be explained.
- the focus of the explanation will be on the differences with Example 1 and Example 2, and descriptions of similar components, programs having similar functions, and tables having similar items will be omitted.
- FIG. 37 is a constitutional diagram of an example of a management server related to Example 3.
- the management server 10000 comprises the same components as in Example 2, and, in addition, a maintenance information management program 11110 is stored in the memory 11000 .
- a maintenance information management table 11980 is stored in the rule-plan information repository 11900 .
- the maintenance information management table 11980 manages information related to replacement and maintenance operations due to apparatus renewal.
- the various programs and tables are stored in the memory 11000 , but these programs and tables may be stored in either the storage device 12000 or another storage medium (not shown in the drawing).
- the processor 15000 reads a target program on the memory 11000 at program execution time, and executes the program that was read.
- the above-described programs and the above-described tables may be stored in either the storage apparatus 20000 memory or the physical server 30000 memory, and either the storage apparatus 20000 or the physical server 30000 may execute the stored programs.
- Another apparatus, such as a switch 60000, may store the above-described programs and tables, and may execute the stored programs.
- FIG. 38 is a constitutional diagram of an example of a general plan table related to Example 3.
- the general plan table 11930 manages a list of plans available in the computer system.
- the general plan table 11930 comprises the fields of a general plan ID 11931 , a plan 11932 , and a maintenance action 11933 .
- a general plan ID, which is the identifier of the general plan, is stored in the general plan ID 11931.
- Information showing the plans, which are available in the computer system, is stored in the plan 11932 .
- Information showing whether or not the relevant plan is a plan related to a maintenance schedule is stored in the maintenance action 11933 .
- a plan in which physical hardware is to be replaced, such as the storage port replacement of the general plan “Plan 8” and the switch replacement of the general plan “Plan 9”, is regarded as a plan related to a maintenance schedule.
- FIG. 39 is a constitutional diagram of an example of a rule-plan association table related to Example 3.
- the configuration of the rule-plan association table 11940 related to Example 3 is substantially the same as the configuration of the rule-plan association table 11940 related to Example 1.
- general rule “Rule 4” is taken up and explained as the general rule, and general plans “Plan 6” and “Plan 9” are taken up and explained as the general plans.
- FIG. 40A is a first constitutional diagram of the expanded plan table related to Example 3.
- FIG. 40B is a second constitutional diagram of the expanded plan table related to Example 3.
- the configuration of the expanded plan table 11960 related to Example 3 is substantially the same as the configuration of the expanded plan table 11960 related to Example 1.
- the plan target 11964 comprises the fields of a replacement-target switch 1196 E in which is stored the identifier of the switch 60000 , which is targeted for replacement (hereinafter referred to as “replacement-target switch”), and a Cost 1196 F in which is stored data denoting the replacement cost.
- the plan creation program 11100 acquires the identifier of the replacement-target switch from the configuration information table 11810 , and stores this identifier in replacement-target switch 1196 E.
- the plan creation program 11100 acquires the value to be stored in the Cost 1196 F from the maintenance information management table 11980, and stores this value in the Cost 1196 F.
- An example in which only the replacement target identification information and the cost information are stored has been described here, but other information, for example, information showing about how much time it will take to replace the switch 60000 may be stored.
- FIG. 41 is a constitutional diagram of an example of the maintenance information management table related to Example 3.
- the maintenance information management table 11980 manages schedule information in accordance with which the administrator performs a maintenance operation, such as hardware replacement. This table 11980 , for example, is created manually and inputted by the administrator.
- the maintenance information management table 11980 comprises the fields of an apparatus 11981 , an apparatus component 11982 , a replacement reason 11983 , a replacement date/time 11984 , an affected service 11985 , and a cost 11986 .
- the apparatus ID of an apparatus targeted for a maintenance operation is stored in the apparatus 11981 .
- the identifier of a device targeted for a maintenance operation is stored in the apparatus component 11982.
- Information showing the reason that a replacement was scheduled is stored in the replacement reason 11983 .
- Information showing the date and time of the replacement is stored in the replacement date/time 11984 .
- the identifier of a service, which will be affected by replacing a device targeted for the maintenance operation, is stored in the affected service 11985 .
- Information showing the cost of replacing the device targeted for the maintenance operation is stored in the cost 11986 .
- FIG. 42 is a flowchart of the effect-risk presentation process related to Example 3.
- the processing of Steps 8001 through 8004 is the same as the processing of Steps 6001 through 6004 in the effect-risk presentation process related to Example 1, and as such, explanations of these steps will be omitted.
- the plan presentation program 11300 references the information in the maintenance action 11933 of the general plan table 11930 and the maintenance information management table 11980, and acquires information related to a maintenance schedule (Step 8005).
- the plan presentation program 11300 creates a plan details screen 9010 (refer to FIG. 43) based on the information acquired in Steps 8002 through 8005, and causes the output device 14000 to display the created plan details screen 9010 (Step 8006). Thereafter, the plan presentation program 11300 ends the effect-risk presentation process.
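- The acquisition of maintenance schedule information can be sketched as a filter over the maintenance information management table 11980 that keeps only entries for devices in which a failure event has occurred. The class `MaintenanceEntry`, the function `schedule_for_failed_devices`, matching on (apparatus, component) pairs, and every sample value are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class MaintenanceEntry:
    apparatus: str             # apparatus 11981
    component: str             # apparatus component 11982
    reason: str                # replacement reason 11983
    replacement_datetime: str  # replacement date/time 11984
    affected_service: str      # affected service 11985
    cost: str                  # cost 11986


def schedule_for_failed_devices(table: List[MaintenanceEntry],
                                failed: Set[Tuple[str, str]]) -> List[MaintenanceEntry]:
    """Return only the entries whose (apparatus, component) has a failure event."""
    return [entry for entry in table if (entry.apparatus, entry.component) in failed]


# Hypothetical table contents and failure sites.
table = [
    MaintenanceEntry("Switch A", "R1", "aging", "2013-04-01 10:00", "Service 1", "low"),
    MaintenanceEntry("Storage C", "P7", "aging", "2013-06-01 10:00", "Service 2", "high"),
]
failed_sites = {("Switch A", "R1"), ("Storage B", "P3")}
shown = schedule_for_failed_devices(table, failed_sites)
for entry in shown:
    print(entry.apparatus, entry.component, entry.reason, entry.replacement_datetime)
```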
- FIG. 43 is a constitutional diagram of an example of the plan details screen related to Example 3.
- the configuration of the plan details screen 9010 related to Example 3 is substantially the same as the configuration of the plan details screen 9010 related to Example 1.
- the area 9015 showing the state before plan execution shows the fact that the VM 1 drive “E:”, the switch A, the data I/F “P3” of the storage B, the logical volume “LV10” of the storage B, and the disk pool “POOL3” are associated, the VM 2 drive “F:”, the switch A, the logical volume “LV11” of the storage B, and the disk pool “POOL3” are associated, and the VM 3 drive “D:”, the switch B, the logical volume “LV12” of the storage B, and the disk pool “POOL4” are associated.
- the area 9015 showing the state before plan execution also shows the fact that failure events have occurred in the VM 2 drive “F:”, the switch A data I/F, and the storage B data I/F, and that a risk exists in the VM 1 drive “E:”.
- the area 9016 showing the state after plan execution shows that the VM 2 , which had been running on the hypervisor 80000 of the physical server B, is running on the hypervisor 80000 of the physical server C, and that after plan execution, the VM 2 drive “F:”, the switch B, the storage B data I/F, the logical volume “LV11” of the storage B, and the disk pool “POOL3” are associated, a failure event remains in the switch A data I/F and the storage B data I/F, and a risk remains in the VM 1 .
- the area 9017 which shows the risk details, denotes that the risk site is the VM 1 drive “E:”, and that the timing at which the risk could occur is when the power to the drive “E:” is turned ON.
- the risk occurrence timing is determined based on the information acquired in Step 8002 of FIG. 42 .
- the management server 10000 may acquire the information in the power status 11832 of the VM configuration management table 11830 , may detect the fact that the VM 1 power status is OFF, may judge that the resumption of business operations accompanying the powering ON of the VM could trigger the occurrence of a risk, and may store the relevant information in the risk occurrence timing 9019 .
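- The judgment described above can be sketched as a lookup of the power status 11832. The function name `risk_occurrence_timing`, the dictionary representation of the VM configuration management table 11830, and the wording of the returned text are hypothetical.

```python
from typing import Dict, Optional


def risk_occurrence_timing(vm_id: str, power_status: Dict[str, str]) -> Optional[str]:
    """Return a risk timing description when the VM is currently powered OFF.

    A powered-OFF VM is assumed to trigger the risk when business operations
    resume, that is, when the VM is powered ON again.
    """
    if power_status.get(vm_id) == "OFF":
        return f"when {vm_id} is powered ON"
    return None


# Hypothetical contents of the power status 11832, keyed by virtual server 11831.
power_status = {"VM 1": "OFF", "VM 2": "ON"}
print(risk_occurrence_timing("VM 1", power_status))
```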
- a display area 9022 for showing the maintenance schedule, for example, comprises the fields of an area 9023 in which is displayed the identifier of either an apparatus or a device targeted for replacement in a maintenance operation, an area 9024 in which is displayed a replacement reason, and an area 9025 in which is displayed a replacement date/time.
- a replacement target, which is displayed in the display area 9022 showing the maintenance schedule, for example, is limited to either an apparatus or a device in which a failure event has occurred, and this information is acquired via the processing of Step 8005 in the effect-risk presentation process of FIG. 42.
- other information, for example, the cost information shown in the maintenance information management table 11980 of FIG. 41, may be collectively displayed in the display area 9022.
- the administrator who references the information related to the maintenance schedule, for example, is able to check the replacement date/time in accordance with the maintenance of either the apparatus or device where the failure occurred, and, along with taking actions with respect to the failure, to either replace the switch ahead of schedule, or, since switch replacement is near at hand, to select a VM migration-based plan as a temporary measure even though a certain amount of risk remains.
- In Example 3, the display area 9011, the display area 9017, and the display area 9022 are displayed in the same screen, but the display method is not limited thereto. For example, when a diagram in the display area 9011 showing either an apparatus or a device configured in the maintenance schedule, for example, switch A, is clicked, the display area 9022 may be newly displayed as a separate screen. Or, when a plan displayed in the display area 9001 of the plan presentation screen 9000 is clicked, the display area 9022 may be newly displayed as a separate screen. When a diagram showing either an apparatus or a device, such as the physical server 30000 or the storage apparatus 20000, in the display area 9011 is clicked, the performance values of the relevant apparatus or device before and after plan execution may be displayed.
- According to Example 3, it is possible to present a failure cause in association with a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time.
- a check can be performed in conjunction with a maintenance schedule in the plan details screen 9010, making it easier for the administrator to ascertain the importance of the effects of the maintenance, thereby enabling the administrator to select a plan that will reduce costs.
Abstract
A management system manages a computer system including multiple monitoring-target devices. A storage device of the management system stores a general rule, general plan information, unresolved information, and configuration information. A control device of the management system creates multiple expanded rules based on the general rule and the configuration information, and if an event related to any of the multiple monitoring-target devices has occurred, identifies, based on the multiple expanded rules, a first conclusion event constituting a candidate for the cause of the occurred event, creates, based on the general plan information, one or more expanded plans, which are recovery plans that can be implemented if the first conclusion event is a cause, identifies an unresolved event based on the unresolved information, identifies a risk site based on the identified unresolved event, and displays data showing the first conclusion event, expanded plan, and risk site.
Description
- The present invention, for example, relates to technology for managing a computer system comprising monitoring-target devices, such as a host computer, a network switch, and a storage apparatus.
- In the management of a computer system, the use of Event Correlation technology or another such technology for identifying the cause of an event-based failure makes it possible for the administrator of the computer system to detect the cause of a failure, which has occurred in the computer system (refer to Patent Literature 1).
- There exists technology in which an analysis engine for analyzing the causal relationship between multiple failure events, which have occurred in a machine that is under management, applies a general rule, which comprises a predetermined condition statement and conclusion statement, to an event related to the machine under management, for example, an event in which a performance value exceeds a prescribed threshold, to create an expanded rule comprising a conclusion event, which is the cause of a drop in performance, and a condition event group caused thereby, and identifies the failure based on the created expanded rule (refer to Patent Literature 2).
- There are numerous useful methods capable of being implemented in recent computer systems as recovery methods with respect to the cause of a failure identified in accordance with cause analysis, and, for example, there is a method for recovering from a failure by performing an appropriate data migration at system resource (virtual machine, data) allocation. As data migration technology in an environment in which multiple virtual host computers (that is, virtual machines, hereinafter called “VM”) are running on a physical host computer, for example, technology (first VM migration) for transferring a VM operating environment from a certain physical host computer to another physical host computer, and technology (second VM migration) for migrating a VM stored in a certain storage area to another storage area, in accordance with information denoting VM performance and resource utilization information are known. Here, the VM is a type of data stored in a storage area, and the VM migration (the first VM migration and the second VM migration) is a type of data migration between storage areas. Data migration technology (volume migration) between data storage areas (volumes) in a storage apparatus is also known (refer to Patent Literature 3).
-
- [PTL 1] U.S. Pat. No. 7,107,185 (Description)
- [PTL 2] Japanese Patent Application Laid-open No. 2010-86115
- [PTL 3] U.S. Pat. No. 6,108,748 (Description)
- When dealing with a failure identified using Event Correlation technology such as that of Patent Literature 1, the administrator does not specifically know which method should be implemented to recover from the failure, and recovering from the failure is costly.
- In order to deal with a failure identified using the Event Correlation technology, cases can exist in which a variety of plans are capable of being implemented, such as a plan for recovering from the failure cause itself, and a plan that places the highest priority on resuming a business operation, which was affected by the failure cause. For example, in the case of a drop in performance in business responses resulting from a performance failure in a storage port, there is a plan for migrating the business operation to another server (for example, a plan related to the VM migration) so as not to use the failed port, and a plan for replacing the port.
- When a plan that does not achieve recovery from the failure cause itself is implemented, the risk of a failure arising from the failure cause could continue to exist even after the plan has been executed, yet the administrator does not know what problems remain in which parts of the computer system. Thus, in a case where the risk of a failure is likely to remain, the administrator could unintentionally select a plan that does not achieve recovery from the failure cause itself (for example, a plan that places the highest priority on resuming business operations).
- A management system manages a computer system comprising multiple monitoring-target devices. A storage device of the management system stores: a general rule, which shows a corresponding relationship between one or more condition events, which are related to any of the multiple monitoring-target devices, and a conclusion event, which is related to any of the multiple monitoring-target devices and constitutes a cause in a case where the one or more condition events have occurred, this general rule moreover denoting a monitoring-target device related to the condition event and the conclusion event according to the relevant monitoring-target device type; general plan information, which shows a corresponding relationship between the general rule and one or more general plans, which are recovery plans available in a case where the conclusion event of the general rule is a cause; unresolved information, which shows, for each combination of the general rule and the general plan, a condition event of the relevant general rule that remains unresolved in a case where the relevant general plan has been implemented; and configuration information, which shows a coupling relationship between the multiple monitoring-target devices. A control device of the management system: creates, based on the general rule and the configuration information, multiple expanded rules, which denote, by means of data showing an identified monitoring-target device, the type of monitoring-target device related to the condition event and the conclusion event; performs, based on the created multiple expanded rules in a case where an event related to any of the multiple monitoring-target devices has occurred, a cause analysis by regarding the occurred event as the condition event, and identifies a first conclusion event, which constitutes a candidate for the cause of the occurred event; creates, based on the general plan information, one or more expanded plans, which are recovery plans that can be implemented in a case where the first conclusion event is a cause, and which expand, by taking into account the actual configuration of the computer system, a general plan corresponding to the general rule, which constitutes the basis of an expanded rule comprising the first conclusion event; identifies, based on the unresolved information for each of the created one or more expanded plans, an unresolved event, which remains unresolved in a case where the relevant expanded plan has been implemented, and identifies, based on the identified unresolved event, a risk site, which is a monitoring-target device for which a problem remains even after the relevant expanded plan has been implemented; and displays data showing the first conclusion event, the created one or more expanded plans, and the identified risk site.
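The cause analysis step in the summary above, in which occurred events are regarded as condition events of the expanded rules, can be illustrated with a minimal Python sketch. The class names, the sample events, and in particular the ordering of candidates by the fraction of matched condition events are assumptions made for this illustration and are not taken from the description.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    device: str  # an identified monitoring-target device
    metric: str  # the kind of value for which the event was raised


@dataclass(frozen=True)
class ExpandedRule:
    rule_id: str
    conditions: Tuple[Event, ...]  # condition events
    conclusion: Event              # conclusion event (cause candidate)


def analyze(occurred: Set[Event],
            rules: List[ExpandedRule]) -> List[Tuple[str, Event, float]]:
    """Regard occurred events as condition events and list cause candidates.

    Ordering by match ratio is an assumption of this sketch.
    """
    candidates = []
    for rule in rules:
        matched = sum(1 for c in rule.conditions if c in occurred)
        if matched:
            ratio = matched / len(rule.conditions)
            candidates.append((rule.rule_id, rule.conclusion, ratio))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical data: two expanded rules that share one condition event.
drive = Event("Physical server A:/opt", "response time")
port = Event("Storage A:P1", "I/O volume")
volume = Event("Storage A:LV1", "response time")
RULES = [
    ExpandedRule("ExRule-1", (drive, port), port),
    ExpandedRule("ExRule-2", (drive, volume), volume),
]
RESULT = analyze({drive, port}, RULES)
```

With these sample events, the conclusion event of the hypothetical rule "ExRule-1" is listed first because both of its condition events have occurred.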
- FIG. 1 is a constitutional diagram of an example of a computer system related to Example 1.
- FIG. 2 is a constitutional diagram of an example of a management server related to Example 1.
- FIG. 3 is a constitutional diagram of an example of a storage apparatus related to Example 1.
- FIG. 4 is a constitutional diagram of an example of a physical server related to Example 1.
- FIG. 5 is a constitutional diagram of an example of a configuration information table related to Example 1.
- FIG. 6 is a constitutional diagram of an example of a performance information table related to Example 1.
- FIG. 7 is a constitutional diagram of an example of an event management table related to Example 1.
- FIG. 8 is a constitutional diagram of an example of a general rule table related to Example 1.
- FIG. 9 is a constitutional diagram of an example of a general plan table related to Example 1.
- FIG. 10 is a constitutional diagram of an example of a rule-plan association table related to Example 1.
- FIG. 11 is a constitutional diagram of an example of an expanded rule table related to Example 1.
- FIG. 12A is a first constitutional diagram of an expanded plan table related to Example 1.
- FIG. 12B is a second constitutional diagram of an expanded plan table related to Example 1.
- FIG. 13 is a constitutional diagram of an example of an analysis result management table related to Example 1.
- FIG. 14 is a flowchart of an event analysis process related to Example 1.
- FIG. 15 is a flowchart of a rule expansion process related to Example 1.
- FIG. 16 is a flowchart of a plan creation process related to Example 1.
- FIG. 17 is a flowchart of a post-plan execution risk selection process related to Example 1.
- FIG. 18 is a flowchart of a plan presentation process related to Example 1.
- FIG. 19 is a flowchart of an effect-risk presentation process related to Example 1.
- FIG. 20 is a constitutional diagram of an example of a plan presentation screen related to Example 1.
- FIG. 21 is a constitutional diagram of an example of a plan details screen related to Example 1.
- FIG. 22 is a constitutional diagram of an example of a management server related to Example 2.
- FIG. 23 is a constitutional diagram of an example of a physical server related to Example 2.
- FIG. 24 is a logical constitutional diagram of an example of a physical server related to Example 2.
- FIG. 25 is a constitutional diagram of an example of a switch related to Example 2.
- FIG. 26 is a constitutional diagram of an example of a configuration information table related to Example 2.
- FIG. 27 is a constitutional diagram of an example of a VM configuration information table related to Example 2.
- FIG. 28 is a constitutional diagram of an example of a performance information table related to Example 2.
- FIG. 29 is a constitutional diagram of an example of an event management table related to Example 2.
- FIG. 30A is a first constitutional diagram of a general rule table related to Example 2.
- FIG. 30B is a second constitutional diagram of a general rule table related to Example 2.
- FIG. 31 is a constitutional diagram of an example of a rule-plan association table related to Example 2.
- FIG. 32A is a first constitutional diagram of an expanded rule table related to Example 2.
- FIG. 32B is a second constitutional diagram of an expanded rule table related to Example 2.
- FIG. 33A is a first constitutional diagram of an expanded plan table related to Example 2.
- FIG. 33B is a second constitutional diagram of an expanded plan table related to Example 2.
- FIG. 33C is a third constitutional diagram of an expanded plan table related to Example 2.
- FIG. 34 is a constitutional diagram of an example of an analysis result management table related to Example 2.
- FIG. 35 is a flowchart of a plan presentation process related to Example 2.
- FIG. 36 is a constitutional diagram of an example of a plan presentation screen related to Example 2.
- FIG. 37 is a constitutional diagram of an example of a management server related to Example 3.
- FIG. 38 is a constitutional diagram of an example of a general plan table related to Example 3.
- FIG. 39 is a constitutional diagram of an example of a rule-plan association table related to Example 3.
- FIG. 40A is a first constitutional diagram of an expanded plan table related to Example 3.
- FIG. 40B is a second constitutional diagram of an expanded plan table related to Example 3.
- FIG. 41 is a constitutional diagram of an example of a maintenance information management table related to Example 3.
- FIG. 42 is a flowchart of an effect-risk presentation process related to Example 3.
- FIG. 43 is a constitutional diagram of an example of a plan details screen related to Example 3.
- A number of examples will be explained by referring to the drawings. Furthermore, the examples explained below do not limit the invention related to the claims, and not all of the elements and combinations thereof explained in the examples are essential for the solution provided by the invention. In the drawings, the same reference signs denote the same components throughout the multiple drawings. In the following explanation, information on the present invention is explained using the expression “aaa table”, but this information may also be expressed using a data structure other than a table. Thus, to show that the information is not dependent on the data structure, “aaa table” may be called “aaa information”. In addition, when explaining the contents of the respective information, the expressions “identification information”, “identifier”, “name” and “ID” are used, but these expressions are interchangeable.
- In the following explanation, there may be cases where an explanation is given using a “program” as the doer of the action, but since the defined processing is performed in accordance with a program being executed by a processor while using a memory and a communication port (a communication device, a management I/F, and a data I/F), the explanation may also give the processor as the doer of the action. A process, which is disclosed as having a program as the doer of the action, may be regarded as a process performed by a management server or other such computer, and an information processing apparatus. Furthermore, either all or a portion of a program may be realized in accordance with dedicated hardware. Various types of programs may be installed in respective computers using a program delivery server or computer-readable storage media.
- Hereinafter, a set of one or more computers, which manage the computer system and display the display information of the present invention, may be called a management system. In a case where a management server displays the display information, the management server is the management system, and a combination of a management server and a display computer is also the management system. Furthermore, processing similar to that of the management server may be realized using multiple computers to increase the speed and reliability of management processing, and in this case, the relevant multiple computers (including a display computer in a case where a display computer carries out the display) are the management system.
- First, a computer system related to Example 1 will be explained.
- FIG. 1 is a constitutional diagram of an example of a computer system related to Example 1.
- The computer system related to Example 1 comprises one or more management servers 10000 (management server A in FIG. 1), one or more storage apparatuses 20000 (storage apparatus A in FIG. 1), and one or more physical servers 30000 (physical server A in FIG. 1). The physical server 30000 and the storage apparatus 20000 are coupled together via a SAN (Storage Area Network) 40000 (specifically, a fibre channel). The management server 10000, the storage apparatus 20000, and the physical server 30000 are coupled together via a management network 50000.
- The management server 10000 stores a plan creation program 11100, a post-plan execution risk selection program 11200, a plan presentation program 11300, a configuration/performance information repository 11800, and a rule-plan information repository 11900 in a memory 11000 (refer to FIG. 2). The management server 10000 is able to communicate with a program running on the storage apparatus 20000 and the physical server 30000 via the management network 50000.
- One or more logical volumes 22100 are prepared in the storage apparatus 20000. The logical volume 22100, for example, is provided to the physical server 30000. In the example shown in FIG. 1, the storage apparatus A provides the logical volume 22100 to the physical server A.
- The physical server 30000 executes various business operations using the logical volume 22100 provided from the storage apparatus 20000. In the example shown in FIG. 1, the physical server A and the storage apparatus A are coupled together via the SAN 40000.
- In the example shown in FIG. 1, the management server 10000 stores programs, such as the plan creation program 11100, the post-plan execution risk selection program 11200, and the plan presentation program 11300, but the constitution of the computer system is not limited thereto. For example, either the storage apparatus 20000 or the physical server 30000 may store a variety of programs, and another apparatus, such as a switch (not shown in the drawing) installed between the respective apparatuses, may also store various programs. The connection between the storage apparatus 20000 and the physical server 30000 is not limited to a direct connection via a fibre channel, and the storage apparatus 20000 and the physical server 30000 may be coupled via network equipment, such as one or more fibre channel switches. The connection between the storage apparatus 20000 and the physical server 30000 is a data communication network, and, for example, may be an IP (Internet Protocol) network.
-
FIG. 2 is a constitutional diagram of an example of the management server related to Example 1.
- The management server 10000 comprises a memory 11000, a storage device 12000, an input device 13000, an output device 14000, a processor 15000, and a communication device 16000, and these elements are coupled together via an internal path or other such communication channel 17000.
- The memory 11000 stores the plan creation program 11100, the post-plan execution risk selection program 11200, the plan presentation program 11300, an event analysis processing program 11400, a rule expansion program 11500, a configuration setting management program 11600, a performance information collection program 11700, the configuration/performance information repository 11800, and the rule-plan information repository 11900.
- A configuration information table 11810 and a performance information table 11820 are stored in the configuration/performance information repository 11800. An event management table 11910, one or more general rule tables 11920, a general plan table 11930, a rule-plan association table 11940, one or more expanded rule tables 11950, one or more expanded plan tables 11960, and an analysis result management table 11970 are stored in the rule-plan information repository 11900.
- The configuration information table 11810 manages information showing an apparatus and a device that exist on an I/O (input/output) route from the physical server 30000 to a physical disk, which configures the logical volume 22100 used by the physical server 30000, that is, information (hereinafter referred to as “configuration information”) showing the coupling relationship of the apparatus and the device based on the I/O route.
- The performance information table 11820 manages performance information with respect to monitoring-target apparatuses coupled to the SAN 40000 and the devices in the monitoring-target apparatuses (monitoring-target devices).
- The event management table 11910 manages information showing when an event occurred in relation to what type of metrics for which device in the computer system.
- The general rule table 11920 manages a general rule showing the corresponding relationship between one or more condition events capable of occurring in the computer system and a conclusion event, which is regarded as the cause of a failure with respect to the one or more condition events.
- The general plan table 11930 manages information showing a recovery plan capable of being implemented in the computer system with respect to a failure.
- The rule-plan association table 11940 manages information (general plan information) denoting the corresponding relationship between the general rule and a general plan, which represents a recovery plan corresponding to the relevant general rule, that is, a recovery plan capable of being implemented in a case where the conclusion event of the relevant general rule is the cause of the failure, and information (unresolved information) denoting which of the failure events in the general rule remains unresolved after the respective general plans have been executed.
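As a rough illustration of how the two kinds of information held by the rule-plan association table might be consulted, the following Python sketch keys a mapping by the combination of a general rule and a general plan and stores the condition events that remain unresolved. The rule and plan IDs, the event contents, and the function names are hypothetical and are not taken from the table definitions in the drawings.

```python
from typing import Dict, List, Tuple

GeneralEvent = Tuple[str, str]  # (device type, metric)

# (general rule ID, general plan ID) -> condition events of the general rule
# that remain unresolved after the general plan has been executed.
# All IDs and entries below are hypothetical.
RULE_PLAN_ASSOCIATION: Dict[Tuple[str, str], List[GeneralEvent]] = {
    # e.g. a plan that moves the business operation away from a failed port
    ("Rule1", "Plan1"): [("storage data I/F", "I/O volume")],
    # e.g. a plan that replaces the failed port
    ("Rule1", "Plan2"): [],
}


def plans_for(rule_id: str) -> List[str]:
    """General plan information: plans available for a general rule."""
    return sorted(plan for rule, plan in RULE_PLAN_ASSOCIATION if rule == rule_id)


def unresolved_events(rule_id: str, plan_id: str) -> List[GeneralEvent]:
    """Unresolved information: condition events left over after a plan."""
    return list(RULE_PLAN_ASSOCIATION.get((rule_id, plan_id), []))


def risk_device_types(rule_id: str, plan_id: str) -> List[str]:
    """Device types on which a problem would remain after the plan."""
    return sorted({dev for dev, _ in unresolved_events(rule_id, plan_id)})
```

Keeping both kinds of information under one key mirrors the statement that the unresolved information is held for each combination of a general rule and a general plan.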
- The expanded rule table 11950 stores information (an expanded rule), which embodies the information of the general rule table 11920 based on the information of the configuration information table 11810.
- The expanded plan table 11960 manages information (an expanded plan), which embodies the information of the general plan table 11930 based on the information of the configuration information table 11810 and the performance information table 11820.
- The analysis result management table 11970 stores information related to an apparatus and a device in which an event judged to be the cause of a failure has occurred, and a failure event judged to be the cause thereof.
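The relationship between the general rule table, the configuration information table, and the expanded rule table described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the row values, the rule contents, the `APPARATUS_OF` mapping, and the function names are all hypothetical, and the actual rule expansion process may differ.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Device = Tuple[str, str]    # (apparatus, device), e.g. ("Storage A", "P1")
Event = Tuple[Device, str]  # (identified device, metric)

# Which apparatus column qualifies each device column of a configuration row
# (an assumption of this sketch).
APPARATUS_OF = {
    "drive": "physical server",
    "server data I/F": "physical server",
    "storage data I/F": "storage",
    "logical volume": "storage",
    "disk pool": "storage",
}

# Hypothetical configuration information: one row per I/O route.
CONFIG_ROWS: List[Row] = [
    {"physical server": "Physical server A", "drive": "/opt",
     "server data I/F": "S1", "storage": "Storage A",
     "storage data I/F": "P1", "logical volume": "LV1", "disk pool": "POOL1"},
    {"physical server": "Physical server A", "drive": "/var",
     "server data I/F": "S2", "storage": "Storage A",
     "storage data I/F": "P2", "logical volume": "LV2", "disk pool": "POOL1"},
]

# Hypothetical general rule: events are written against device types.
GENERAL_RULE = {
    "id": "Rule1",
    "conditions": [("drive", "response time"), ("storage data I/F", "I/O volume")],
    "conclusion": ("storage data I/F", "I/O volume"),
}


def resolve(row: Row, device_type: str) -> Device:
    """Replace a device type with the identified device on this I/O route."""
    return (row[APPARATUS_OF[device_type]], row[device_type])


def expand_rule(rule: dict, rows: List[Row]) -> List[dict]:
    """Create one expanded rule per distinct set of identified devices."""
    expanded: List[dict] = []
    seen = set()
    for row in rows:
        conditions = tuple((resolve(row, t), m) for t, m in rule["conditions"])
        conclusion = (resolve(row, rule["conclusion"][0]), rule["conclusion"][1])
        if (conditions, conclusion) in seen:
            continue  # another row already produced the same expanded rule
        seen.add((conditions, conclusion))
        expanded.append({
            "id": "Ex{}-{}".format(rule["id"], len(expanded) + 1),
            "conditions": conditions,
            "conclusion": conclusion,
        })
    return expanded


EXPANDED = expand_rule(GENERAL_RULE, CONFIG_ROWS)
```

Each expanded rule in `EXPANDED` names identified devices in place of device types, which is what allows an occurred event to be compared against it directly.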
- The storage device 12000 is an HDD (Hard Disk Drive), SSD (Solid State Drive) or the like for storing information. The input device 13000 is a device for the administrator to input an instruction to the management server 10000, and, for example, is a keyboard. The output device 14000 is a device for outputting the result of processing executed by the management server 10000, for example, the result of the execution of the plan presentation program 11300, and, for example, is a display. The processor 15000 executes a program deployed on the memory 11000. The communication device 16000 is a device for coupling to the management network 50000.
- In the example shown in FIG. 2, various programs and tables are stored in the memory 11000, but these programs and tables may be stored in either the storage device 12000 or another storage medium (not shown in the drawing). In this case, at program execution time, the processor 15000 reads a target program into the memory 11000, and executes the program that has been read. Also, the above-mentioned programs and tables may be stored in either the memory 21000 of the storage apparatus 20000 (refer to FIG. 3) or the memory 31000 of the physical server 30000 (refer to FIG. 4), and either the storage apparatus 20000 or the physical server 30000 may execute a stored program. Another apparatus, such as another physical server 30000 or a switch (not shown in the drawing), may store the above-mentioned programs and tables, and may execute a stored program.
-
FIG. 3 is a constitutional diagram of an example of the storage apparatus related to Example 1.
- The storage apparatus 20000 comprises a memory 21000, a logical volume provision part 22000, a disk I/F controller 23000, a management I/F 24000, a processor 25000, and a data I/F 26000, and these elements are coupled via a communication channel 27000, such as an internal bus.
- The memory 21000 comprises a disk cache 21100. The memory 21000 stores a configuration/performance information collection program 21200. The disk cache 21100 is a storage area for temporarily storing information. The configuration/performance information collection program 21200 is for sending/receiving the management information and the performance information of the storage apparatus 20000 and the like to/from the management server 10000.
- The logical volume provision part 22000 comprises a disk pool 22200 configured from a storage area of one or more physical disks (not shown in the drawing), logically partitions the storage area of the disk pool 22200, and provides the relevant logically partitioned storage areas as logical volumes 22100. This makes it possible to access a logical volume 22100 from an apparatus outside the relevant storage apparatus 20000. A disk pool number is assigned to a disk pool 22200, and a logical volume number is assigned to a logical volume 22100. This makes it possible for the storage apparatus 20000 to uniquely identify each disk pool 22200 and logical volume 22100.
- In the example shown in FIG. 3, each of two disk pools 22200 (POOL1 and POOL2) is logically partitioned, and four logical volumes 22100 (LV1, LV2, LV3 and LV4) are provided to an apparatus (for example, the physical server 30000) outside the storage apparatus 20000. The disk I/F controller 23000 is an interface device for coupling to the logical volume provision part 22000. The management I/F 24000 is an interface device for coupling to the management network 50000. The processor 25000 executes a program deployed on the memory 21000. The data I/F 26000 is an interface device for coupling to the SAN 40000. Furthermore, there may be multiple disk I/F controllers 23000, management I/Fs 24000, and data I/Fs 26000.
- In the example shown in FIG. 3, the storage apparatus 20000 comprises two data I/Fs 26000, data I/F (P1) and data I/F (P2). In the example shown in FIG. 3, the configuration/performance information collection program 21200 is stored in the memory 21000, but this program 21200 may be stored in another storage device (not shown in the drawing) or another storage medium (not shown in the drawing). In this case, at process execution time, the processor 25000 reads the configuration/performance information collection program 21200 into the memory 21000, and executes the read configuration/performance information collection program 21200.
- The configuration/performance information collection program 21200 may be stored in the memory 11000 of the management server 10000, and the management server 10000 may execute the stored program 21200. Also, another storage apparatus 20000 may store the configuration/performance information collection program 21200 and may execute the stored program 21200. The logical volume provision part 22000 may make all the storage areas on a single physical disk 22200 into one logical volume 22100. The logical volume provision part 22000 may also prepare a logical volume 22100 using the storage area of a storage medium other than the physical disk 22200, for example, a flash memory.
-
FIG. 4 is a constitutional diagram of an example of the physical server related to Example 1.
- The physical server 30000 comprises a memory 31000, a data I/F 32000, a processor 33000, and a management I/F 34000, and these elements are coupled together via an internal bus or other such communication channel 35000.
- The memory 31000 stores a configuration performance information collection program 31100, a business program 31200, and a volume management program 31300.
- The configuration performance information collection program 31100 is for sending/receiving physical server 30000 management information, performance information, and the like to/from the management server 10000. The business program 31200 is for realizing a business operation executed by the physical server 30000, and, for example, is a DBMS (Data Base Management System) or a file system. The volume management program 31300 is for allocating a logical volume 22100 provided by the storage apparatus 20000 to the physical server 30000. The physical server 30000 executes a business operation using the logical volume 22100 allocated by the volume management program 31300.
- The data I/F 32000 is an interface device for coupling to the SAN 40000. The processor 33000 executes a program deployed on the memory 31000. The management I/F 34000 is an interface device for coupling to the management network 50000.
- There may be multiple data I/Fs 32000 and management I/Fs 34000. In the example shown in FIG. 4, a variety of programs are stored in the memory 31000, but these programs may be stored in another storage apparatus (not shown in the drawing). In this case, at process execution time, the processor 33000 reads a target program into the memory 31000, and executes the read program.
-
FIG. 5 is a constitutional diagram of an example of the configuration information table related to Example 1.
- The configuration information table 11810 stores information related to an I/O route, which extends from the physical server 30000 to the physical disk configuring the logical volume 22100 provided to the relevant physical server 30000, and which the physical server 30000 uses to access the logical volume 22100. An entry is added to the configuration information table 11810 in accordance with the configuration setting management program 11600 being executed.
- The configuration information table 11810 comprises the fields of a physical server 11811, a drive 11812, a server data I/F 11813, a storage 11814, a storage data I/F 11815, a logical volume 11816, and a disk pool 11817. An identifier for uniquely identifying a physical server 30000 is stored in the physical server 11811. An identifier for uniquely identifying a mount point of a volume on the physical server 30000 is stored in the drive 11812. An identifier for uniquely identifying a data I/F 32000 of the physical server 30000 (hereinafter may be referred to as “server data I/F”) used when the physical server 30000 accesses the logical volume 22100 shown by the logical volume 11816 identifier is stored in the server data I/F 11813. An identifier for uniquely identifying a storage apparatus 20000 constituting the access destination of the physical server 30000 is stored in the storage 11814. An identifier for uniquely identifying a data I/F 26000 of the storage apparatus 20000 (hereinafter may be referred to as “storage data I/F”) used when the physical server 30000 accesses the logical volume 22100 shown by the logical volume 11816 identifier is stored in the storage data I/F 11815. An identifier for uniquely identifying a logical volume 22100 is stored in the logical volume 11816. An identifier for uniquely identifying a disk pool 22200 from which the logical volume 22100 shown by the logical volume 11816 identifier is prepared is stored in the disk pool 11817.
- For example, the first entry from the top of FIG. 5 shows that the logical volume “LV1” created from the disk pool “POOL1” of the storage A (storage apparatus A) is coupled to the physical server A via the storage data I/F “P1” and the server data I/F “S1”, and is recognized as the logical volume “/opt” on the physical server A.
- Here, the configuration information table 11810 related to this example comprises information on the physical server 30000, the server data I/F, the storage apparatus 20000, the storage data I/F, the logical volume 22100, and the disk pool 22200 as the apparatuses and devices that exist on the access route, but the constitution of this table is not limited thereto. For example, the configuration information table 11810 may comprise information on a switch and a switch data I/F, and may associatively store business program (DBMS and so forth) information or VM information on the business server 30000, a snapshot volume holding a VM snapshot, and a clone volume holding a clone. The configuration information table 11810 may hold configuration management operation historical information, and may be linked to a Syslog server and hold detailed log information denoting system operations.
-
FIG. 6 is a constitutional diagram of an example of the performance information table related to Example 1. - The performance information table 11820 stores performance information related to either an apparatus or a device inside an apparatus comprising the computer system, for example, performance information related to the
logical volume 22100, the disk pool 22200, and so forth in each storage apparatus 20000. An entry is added to the performance information table 11820 in accordance with the performance information collection program 11700 being executed. - The performance information table 11820 comprises the fields of an
apparatus ID 11821, a device ID 11822, a metrics 11823, an equipment OS 11824, a performance value 11825, an alert execution threshold 11826, a threshold type 11827, and a Status 11828. - An identifier (apparatus ID) for uniquely identifying an apparatus is stored in the
apparatus ID 11821. An identifier (device ID) for uniquely identifying a device, which is the performance information acquisition target, is stored in the device ID 11822. Information showing a type of performance information, such as a CPU utilization rate, a number of IOs (IOPS) per unit of time (for example, one second) for a storage apparatus, and a response time with respect to a request, is stored in the metrics 11823. Data showing a type of OS (Operating System) running on the apparatus corresponding to the apparatus ID in the apparatus ID 11821 is stored in the equipment OS 11824. A value of the performance information for the type shown by the metrics 11823 for the device shown by the device ID 11822 is acquired from the apparatus comprising the device and stored in the performance value 11825. Either an upper limit or a lower limit threshold for a normal range of management-target performance values (hereinafter referred to as “alert execution threshold”) is specified by the user and stored in the alert execution threshold 11826. Data showing whether the alert execution threshold is the upper limit or the lower limit of the normal values is stored in the threshold type 11827. Data showing whether the performance value 11825 is a normal value or an abnormal value is stored in the Status 11828. - In the performance information table 11820 shown in
FIG. 6 here, only one performance value is supported for one arbitrary metrics for one arbitrary device on one arbitrary apparatus, but the configuration may be such that each piece of information of the performance information table 11820 is stored together with a value showing the time at which the performance information collection program 11700 acquired the information held by each apparatus by communicating with the configuration/performance information collection programs - The storage data I/F, the
logical volume 22100, the disk pool 22200, and the mount point recognized by the physical server 30000 were cited as the performance information acquisition-target devices shown by the device ID in the device ID 11822, but the constitution is not limited thereto, and the performance information acquisition-target device may also be a server data I/F, a physical disk, a switch, a switch port, and so forth. - Furthermore, as examples of metrics, a CPU utilization rate, an IOPS, a response time with respect to a request, and so forth were given, but other performance indicators, such as an I/O busy rate, a transfer rate, a throughput, a buffer hit rate and a number of records inserted/updated/deleted in database management software, a Web server response time, a free capacity or utilization rate for a file system or disk, an amount of I/O data, a utilization time, a number of network interface errors, a buffer overflow, and frame errors may also be used.
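By way of a non-limiting illustration, one entry of the performance information table 11820 and the derivation of its Status 11828 from the performance value 11825, the alert execution threshold 11826, and the threshold type 11827 might be sketched as follows. The class name, the function name, the string encodings "upper" and "lower", and the sample values are assumptions made for illustration only and do not appear in the figures.

```python
from dataclasses import dataclass


@dataclass
class PerformanceEntry:
    """One row of the performance information table (field names are illustrative)."""
    apparatus_id: str
    device_id: str
    metrics: str
    equipment_os: str
    performance_value: float
    alert_execution_threshold: float
    threshold_type: str  # assumed encoding: "upper" or "lower" limit of the normal range
    status: str = "normal"


def evaluate_status(entry: PerformanceEntry) -> str:
    """Return the status implied by the value, the threshold, and the threshold type."""
    if entry.threshold_type == "upper":
        abnormal = entry.performance_value > entry.alert_execution_threshold
    elif entry.threshold_type == "lower":
        abnormal = entry.performance_value < entry.alert_execution_threshold
    else:
        raise ValueError(f"unknown threshold type: {entry.threshold_type}")
    return "threshold abnormality" if abnormal else "normal"


# Hypothetical sample row: a response time above an upper-limit threshold.
row = PerformanceEntry("StorageA", "POOL1", "response time", "n/a", 25.0, 20.0, "upper")
row.status = evaluate_status(row)
print(row.status)
```

Keeping the threshold type alongside the threshold lets the same comparison routine serve metrics for which a high value is abnormal (for example, a response time) and metrics for which a low value is abnormal (for example, a free capacity).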
- As the alert execution threshold stored in the
alert execution threshold 11826, rather than a threshold specified by the user, for example, a value capable of triggering an alert notification, such as a value that differs from a baseline value obtained using an average value of the performance information historical information, may be employed. -
FIG. 7 is a constitutional diagram of an example of the event management table related to Example 1. - The event
analysis processing program 11400 compares a performance value registered in the performance information table 11820 to an alert execution threshold registered in the performance information table 11820, and in a case where the performance value exceeds the alert execution threshold, prepares an entry showing the corresponding event, and registers the prepared entry in the event management table 11910. The eventanalysis processing program 11400 receives an event message showing the occurrence of an event from various apparatuses in the system, and may register an entry showing the event corresponding to the received event message in the event management table 11910. The event management table 11910 is referred to as needed in a rule expansion process (refer toFIG. 15 ). - The event management table 11910 comprises the fields of an
event ID 11911, anapparatus ID 11912, anapparatus component ID 11913, ametrics 11914, anequipment OS 11915, astatus 11916, an analyzedflag 11917, and an occurrence date/time 11918. An event ID, which is the identifier of the event itself, is stored in theevent ID 11911. An apparatus ID, which is the identifier of the apparatus in which the event occurred, is stored in theapparatus ID 11912. An identifier of the device in which the event occurred is stored in theapparatus component ID 11913. The name of a metrics for which a threshold abnormality was detected is stored in themetrics 11914. Data showing the type of OS in the apparatus in which the threshold abnormality was detected is stored in theequipment OS 11915. Data showing a state at event occurrence of a device in which the event occurred is stored in thestatus 11916. Data showing whether or not the event has been analyzed by therule expansion program 11500 is stored in the analyzedflag 11917. Data showing the date and time at which the event occurred is stored in the occurrence date/time 11918. - For example, the first entry from the top of
FIG. 7 shows that the management server 10000 detected a threshold abnormality of the processor operating rate in the data I/F “P2” of the storage apparatus A at 15:00:00 hours on 30 Jun. 2012, that the event ID thereof is “EV1”, and that this event has not been analyzed by the rule expansion program 11500. -
FIG. 8 is a constitutional diagram of an example of the general rule table related to Example 1. - One or more general rule tables 11920 are stored in the rule-
plan information repository 11900. In this example, one general rule is defined by one general rule table 11920. The general rule (the same holds true for an expanded rule, which will be explained further below) is data showing the relationship between a combination of one or more condition events capable of occurring in a node apparatus comprising the computer system, and a conclusion event, which is regarded as the failure cause with respect to the one or more condition events. Generally speaking, an event propagation model for identifying the cause in a failure analysis uses an “IF-THEN” format to describe a combination of events, which are expected to occur as the result of a certain failure, and the cause thereof. The general rule is not limited to that given inFIG. 8 , and a larger number of rules are allowed. - The general rule table 11920 comprises a
condition part 11921, a conclusion part 11922, a general rule ID 11923, and an applied topology 11924. - Data showing an observed event, that is, each of one or more condition events, which is equivalent to the IF part of the general rule described using the “IF-THEN” format, is stored in the
condition part 11921. Thecondition part 11921 comprises the fields of anevent ID 11925, anapparatus type 11926, anapparatus component type 11927, ametrics 11928, and astatus 11929. Data showing a causal event, that is, a conclusion event, which is equivalent to the THEN part of the general rule described using the “IF-THEN” format, is stored in theconclusion part 11922. Theconclusion part 11922 comprises the fields of theapparatus type 11926, theapparatus component type 11927, themetrics 11928, and thestatus 11929. A general rule ID, which is the identifier of the general rule, is stored in thegeneral rule ID 11923. Data showing a topology which is referenced when themanagement server 10000 expands the general rule to an actual system and creates an expanded rule, is stored in the appliedtopology 11924. An identifier (event ID) for uniquely identifying an event (condition event) included in an observed event equivalent to the IF part of the general rule described using the “IF-THEN” format, is stored in theevent ID 11925. Data showing the type of apparatus in which either the condition event or the conclusion event occurs is stored in theapparatus type 11926. Data showing the type of device in which either the condition event or the conclusion event occurs is stored in theapparatus component type 11927. Information showing a type of performance information, such as a CPU utilization rate, a storage apparatus IOPS, and a response time with respect to a request, is stored in themetrics 11928. Here, the same performance information stored in themetrics 11823 of the performance information table 11820, or other performance information may be used as the performance information stored in themetrics 11928. Data showing the state at event occurrence of the device in the apparatus is stored in thestatus 11929. - In a case where one or more condition events described in the
condition part 11921 have been detected, it is determined that the conclusion event described in theconclusion part 11922 is the cause of the failure. In a case where the status of theconclusion part 11922 transitions to normal, that is, a case in which the performance value related to the conclusion event returns to normal, the problem in thecondition part 11921 is also resolved, that is, the performance value related to each condition event can also be expected to return to a normal value. In the example ofFIG. 8 , three events are described in thecondition part 11921, but the number of events is not limited. - For example, the general rule given as an example in
FIG. 8 , that is, the general rule for which the general rule ID is shown as “RULE 1” shows that when a threshold abnormality in the response time of a drive on a host computer, a threshold abnormality in the I/Os per unit of time in thelogical volume 22100 of thestorage apparatus 20000, and a threshold abnormality in the I/Os per unit of time in thedisk pool 22200 of thestorage apparatus 20000 have been detected as observed events, the threshold abnormality in the I/Os per unit of time in thedisk pool 22200 of thestorage apparatus 20000 is concluded to be the cause. The fact that a certain condition is normal may also be defined as an event included in the observed events. Topology information is acquired from the configuration information table 11810 when an expanded rule is created based on this general rule. -
FIG. 9 is a constitutional diagram of an example of the general plan table related to Example 1. - The general plan table 11930 shows a list of available plans in the computer system. The general plan table 11930 comprises the fields of a
general plan ID 11931 and aplan 11932. A general plan ID, which is the identifier of the general plan, is stored in thegeneral plan ID 11931. Information showing the plans, which are available in the computer system, is stored in theplan 11932. As plans, for example, there are reboot host, change switch setting,storage apparatus 20000 volume migration, and VM migration, among others. The plans are not limited to those given inFIG. 9 . -
FIG. 10 is a constitutional diagram of an example of the rule-plan association table related to Example 1. - The rule-plan association table 11940 shows the corresponding relationship between a general rule, a list of plans available in a case where this general rule was applied and the cause of a failure was identified, and an event (hereinafter referred to as “unresolved event”), which remains in an unresolved state in a case where the plans have been executed. The rule-plan association table 11940 comprises a
general rule ID 11941, ageneral plan ID 11942, and anunresolved event ID 11943. A general rule ID, which is the identifier of a general rule, is stored in thegeneral rule ID 11941. The general rule ID stored in thegeneral rule ID 11941 corresponds to the general rule ID stored in thegeneral rule ID 11923 of the general rule table 11920. A general plan ID, which is the identifier of a general plan, is stored in thegeneral plan ID 11942. The general plan ID stored in thegeneral plan ID 11942 corresponds to the general plan ID stored in thegeneral plan ID 11931 of the general plan table 11930. An unresolved event ID, which is the identifier of an event (unresolved event), which remains in an unresolved state in a case where the plans have been executed, is stored in theunresolved event ID 11943. The unresolved event ID corresponds to the identifier (event ID) of a condition event stored in theevent ID 11925 of the general rule table 11920. In theunresolved event ID 11943, for example, in a case where an unresolved event does not exist, “NONE” is stored, and in a case where all of the condition events remain as unresolved events, “ALL” is stored. -
FIG. 11 is a constitutional diagram of an example of the expanded rule table related to Example 1. - One or more expanded rule tables 11950 are stored in the rule-
plan information repository 11900. In this example, one expanded rule is defined by one expanded rule table 11950. The expanded rule is data, which expands the general rule in a format that depends on the actual configuration of the computer system. The expanded rule shown inFIG. 11 is created in accordance with replacing the values of theapparatus type 11926 and theapparatus component type 11927 in the general rule shown inFIG. 8 with the identifier (apparatus ID) of the identified apparatus and the identifier (device ID) of the identified device defined in the configuration information table 11810. - The expanded rule table 11950 comprises the fields of a
condition part 11951, a conclusion part 11952, an expanded rule ID 11953, and a pre-expansion general rule ID 11954. - Data showing an observed event, that is, each of one or more condition events, which is equivalent to the IF part of the expanded rule described using the “IF-THEN” format, is stored in a
condition part 11951. Thecondition part 11951 comprises the fields of anevent ID 11955, anapparatus ID 11956, anapparatus component ID 11957, ametrics 11958, and astatus 11959. Data showing a causal event, that is, a conclusion event, which is equivalent to the THEN part of the expanded rule described using the “IF-THEN” format, is stored in aconclusion part 11952. Theconclusion part 11952 comprises the fields of theapparatus ID 11956, theapparatus component ID 11957, themetrics 11958, and thestatus 11959. An expanded rule ID, which is the identifier of the expanded rule, is stored in the expandedrule ID 11953. The general rule ID of the general rule, which constitutes the basis of the expanded rule, is stored in the pre-expansiongeneral rule ID 11954. An identifier for uniquely identifying a condition event included in an observed event equivalent to the IF part of the expanded rule described using the “IF-THEN” format, is stored in theevent ID 11955. An ID (apparatus ID) of an apparatus in which either the condition event or the conclusion event occurs is stored in theapparatus ID 11956. An ID (device ID) of the device in which either the condition event or the conclusion event occurs is stored in theapparatus component ID 11957. Information showing a type of performance information, such as a CPU utilization rate, a storage apparatus IOPS, and a response time with respect to a request, is stored in themetrics 11958. The same performance information stored in themetrics 11823 of the performance information table 11820, or other performance information may be used as the performance information stored in themetrics 11958. Data showing the state at event occurrence of the device in the apparatus is stored in thestatus 11959. 
- The expanded rule is created in accordance with taking into account the actual configuration of the computer system (for example, the coupling relationship shown in the configuration information table 11810) and embodying the apparatus type and the device type related to the condition event and the conclusion event in the identified apparatus and the identified device in the actual configuration of the computer system.
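By way of a non-limiting illustration, the embodiment of apparatus types and device types in identified apparatuses and devices might be sketched as follows. The dictionary layouts, the column-name mappings, the function names, the sample configuration rows, and the way the expanded rule ID is formed are all assumptions made for illustration; the actual table contents are those shown in the figures.

```python
# Hypothetical general rule: three condition events and one conclusion event.
GENERAL_RULE = {
    "general_rule_id": "Rule 1",
    "condition": [
        {"event_id": 1, "apparatus_type": "server", "component_type": "drive",
         "metrics": "response time", "status": "threshold abnormality"},
        {"event_id": 2, "apparatus_type": "storage", "component_type": "logical volume",
         "metrics": "I/Os per unit of time", "status": "threshold abnormality"},
        {"event_id": 3, "apparatus_type": "storage", "component_type": "disk pool",
         "metrics": "I/Os per unit of time", "status": "threshold abnormality"},
    ],
    "conclusion": {"apparatus_type": "storage", "component_type": "disk pool",
                   "metrics": "I/Os per unit of time", "status": "threshold abnormality"},
}

# Hypothetical rows of the configuration information table (one access route per row).
CONFIG_ROWS = [
    {"physical_server": "ServerA", "drive": "/opt", "storage": "StorageA",
     "logical_volume": "LV1", "disk_pool": "POOL1"},
    {"physical_server": "ServerA", "drive": "/var", "storage": "StorageA",
     "logical_volume": "LV2", "disk_pool": "POOL1"},
]

# Assumed mapping from rule types to configuration table columns.
APPARATUS_COLUMN = {"server": "physical_server", "storage": "storage"}
COMPONENT_COLUMN = {"drive": "drive", "logical volume": "logical_volume",
                    "disk pool": "disk_pool"}


def embody(event, row):
    """Replace the apparatus type and device type of one event with concrete IDs."""
    return {
        "apparatus_id": row[APPARATUS_COLUMN[event["apparatus_type"]]],
        "component_id": row[COMPONENT_COLUMN[event["component_type"]]],
        "metrics": event["metrics"],
        "status": event["status"],
    }


def expand_rule(rule, config_rows):
    """Create one expanded rule per access route found in the configuration rows."""
    expanded = []
    for n, row in enumerate(config_rows, start=1):
        conditions = []
        for event in rule["condition"]:
            concrete = embody(event, row)
            concrete["event_id"] = event["event_id"]
            conditions.append(concrete)
        expanded.append({
            # The ID format below is an assumption, not taken from the figures.
            "expanded_rule_id": f"Ex{rule['general_rule_id']}-{n}",
            "pre_expansion_general_rule_id": rule["general_rule_id"],
            "condition": conditions,
            "conclusion": embody(rule["conclusion"], row),
        })
    return expanded


for expanded_rule in expand_rule(GENERAL_RULE, CONFIG_ROWS):
    print(expanded_rule["expanded_rule_id"], expanded_rule["conclusion"]["component_id"])
```

In this sketch each row of the configuration information yields one expanded rule, so a general rule written once in terms of types covers every access route that actually exists in the computer system.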
- For example, the expanded rule, for which the expanded rule ID is shown as “ExRule 1-1”, given as an example in
FIG. 11 is created in accordance with replacing the values of theapparatus type 11926 and theapparatus component type 11927 in the general rule “Rule 1” shown inFIG. 8 with the identifiers of the identified apparatuses (physical server A and storage apparatus A) and the identifiers of the identified devices (drive “/var”, logical volume “LV1”, and disk pool “POOL1”) defined in the configuration information table 11810. - As is clear from the expanded rule “ExRule 1-1” shown in the expanded rule table 11950 of
FIG. 11 , the expanded rule “ExRule 1-1” is expanded on the basis of the general rule “Rule 1”, and shows that when a threshold abnormality in the response time of the logical volume “/var” of the physical server A, a threshold abnormality in the response time of the logical volume “LV2” of the storage apparatus A, and a threshold abnormality in the response time of the disk pool “POOL1” of the storage apparatus A have been detected as observed events, the threshold abnormality in the response time of the disk pool “POOL1” of the storage apparatus A is concluded to be the cause. -
FIG. 12A is a first constitutional diagram of the expanded plan table related to Example 1.FIG. 12B is a second constitutional diagram of the expanded plan table related to Example 1. - One or more expanded plan tables 11960 are stored in the rule-
plan information repository 11900. In this example, one or more expanded plans are defined by one expanded plan table 11960. The expanded plan is information for expanding a general plan in a format, which relies on the actual configuration of the computer system. The expanded plan table 11960 is created by theplan creation program 11100 based on the general plan table 11930, the expanded rule table 11950, the configuration information table 11810, and the performance information table 11820. - The expanded plan table 11960 comprises the fields of a plan details 11961, a
general plan ID 11962, and an expanded rule ID 1196A. The general plan ID of the general plan, which constitutes the basis of the expanded plan, is stored in the general plan ID 11962. The expanded rule ID of the expanded rule corresponding to the expanded plan is stored in the expanded rule ID 1196A as information for identifying the failure cause for which the expanded plan is implemented. - Specific processing contents for each of one or more expanded plans, which have been expanded, and information on the post-execution state of the expanded plan are stored in the plan details 11961. The plan details 11961 comprises the fields of an expanded
plan ID 11963, a plan target 11964, and a risk site 11969. An expanded plan ID, which is the identifier of the expanded plan, is stored in the expanded plan ID 11963. Data showing a problem site (hereinafter referred to as “risk site”), which latently remains as-is even after the plan has been executed, is stored in the risk site 11969. - In the
plan target 11964, for example, there is stored information showing the configuration element (device) related to the plan, and post-plan execution information. The post-plan execution information here comprises a predicted value for a post-plan execution performance value related to the device, which is related to the plan. The predicted value of the post-plan execution performance value, for example, is calculated in accordance with the plan creation program 11100 referencing the performance information table 11820 to simulate the situation subsequent to plan execution. The fields included in the plan target 11964 will differ in accordance with the plan contents. - The expanded plan defined by the expanded plan table 11960 of
FIG. 12A is an expanded plan based on the general plan “Plan 1”, that is, an expanded plan related to a volume migration. In the case of an expanded plan related to a volume migration, theplan target 11964, for example, comprises the fields of a migration-target volume 11965, a migration-source pool 11966, and a migration-destination pool 11967. The migration-target volume 11965 comprises avolume ID 11965A in which is stored an identifier of the logical volume 22100 (hereinafter referred to as “migration-target volume”), which constitutes the volume migration target, and an I/OResponse Time prediction 11965B in which is stored a predicted value for the I/O response time with respect to the migration-target volume subsequent to volume migration execution. The migration-source pool 11966 comprises apool ID 11966A in which is stored an identifier of thedisk pool 22200 to which the migration-target volume belongs (that is, the migration-source disk pool 22200, hereinafter referred to as “migration-source pool”), and an I/OResponse Time prediction 11966B in which is stored a predicted value for the I/O response time with respect to the migration-source pool subsequent to volume migration execution. The migration-destination pool 11967 comprises apool ID 11967A in which is stored an identifier of thedisk pool 22200, which constitutes the migration destination of the migration-target volume (hereinafter referred to as “migration-destination pool”), and an I/OResponse Time prediction 11967B in which is stored a predicted value for the I/O response time with respect to the migration-destination pool subsequent to volume migration execution. - The
plan creation program 11100 acquires information from the configuration information table 11810, and stores the information in the volume ID 11965A, the pool ID 11966A, and the pool ID 11967A. Further, an arbitrary method may be used as the method for calculating the values stored in the respective I/O Response Time predictions. For example, the I/O Response Time predictions may be calculated by the plan creation program 11100 acquiring the I/Os per unit of time of the migration-target volume, the migration-source pool, and the migration-destination pool from the performance information table 11820, subtracting the value of the I/Os per unit of time of the migration-target volume from the I/Os per unit of time of the migration-source pool, adding the value of the I/Os per unit of time of the migration-target volume to the I/Os per unit of time of the migration-destination pool, thereby predicting the I/Os of the migration-source pool and the migration-destination pool subsequent to volume migration execution, and adopting the reciprocal of each predicted value as the I/O response time. - The expanded plan defined in accordance with the expanded plan table 11960 of
FIG. 12B is an expanded plan based on the general plan “Plan 5”, that is, the expanded plan related to the addition of a disk to the pool. In the case of an expanded plan related to the addition of a disk to the pool, the plan target 11964, for example, comprises the fields of a pool ID 11968A in which is stored the identifier of the disk pool 22200, which constitutes the destination of the disk (hereinafter referred to as “add-target pool”), an additional disk 11968B in which is stored the identifier of the disk to be added, and an I/O Response Time prediction 11968C in which is stored a predicted value for the I/O response time with respect to the add-target pool. - The
plan creation program 11100 acquires information from the configuration information table 11810 and stores the information in the pool ID 11968A and the additional disk 11968B. Further, an arbitrary method may be used as the method for calculating the value (a predicted value of the I/O response time with respect to the add-target pool) of the I/O Response Time prediction 11968C. For example, the plan creation program 11100 may calculate the I/O response time of the add-target pool subsequent to disk addition by acquiring the value of the I/Os per unit of time of the add-target pool, the value of the capacity of the add-target pool, and the value of the capacity of the disk to be added from the performance information table 11820, and, assuming that the I/Os per unit of time are distributed in proportion to the capacity values, multiplying the I/O response time of the add-target pool prior to disk addition by the value obtained by dividing the capacity of the add-target pool prior to disk addition by the capacity of the add-target pool subsequent to disk addition. In the example of FIG. 12B, there is described an example in which performance information is stored as the contents of the plan details 11961, but cost information related to the plan, or information on system downtime resulting from a failure when the plan is executed may be stored as the contents of the plan details 11961. - In
FIG. 12 , there is described an example of an expanded plan related to a volume migration, and an expanded plan related to the addition of a disk to a pool, but an expanded plan corresponding to each general plan of the other general plans included in the general plan table 11930 is created in the same way. In a case where another general plans is to be expanded into an expanded plan, theplan creation program 11100, for example, computes a predicted value for a post-plan execution performance value related a device, which is related to the plan, by referencing the configuration information table 11810, listing post-processing execution configuration information candidates, referencing the performance information table 11820, and simulating post-plan execution status information, such as performance information, capacity information, cost information, and downtime information. -
FIG. 13 is a constitutional diagram of an example of the analysis result management table related to Example 1. - The analysis result management table 11970 comprises the fields of a
cause apparatus ID 11971, acause component ID 11972, ametrics 11973, acertainty factor 11974, an expandedrule ID 11975, and a receivedevent ID 11976. The identifier (apparatus ID) of an apparatus related to an event, which has been judged as the cause of a failure in a failure cause analysis process, is stored in thecause apparatus ID 11971. The identifier (device ID) of a device related to the event judged to be the cause of the failure is stored in thecause component ID 11972. Information showing a type of performance information, such as a CPU utilization rate and a response time with respect to a request, for which a threshold abnormality has been detected, is stored in themetrics 11973. In other words, in themetrics 11973 data showing the type of performance information related to an event judged to be the cause of a failure is stored. A value (certainty factor) showing the probability of the event judged to be the cause of the failure being the root cause is stored in thecertainty factor 11974. In this example, an incidence of the condition event, for example, serves as the certainty factor. The expanded rule ID of the expanded rule, which constitutes the basis upon which the event was judged to be the cause of the failure, is stored in the expandedrule ID 11975. The event ID of the event, which actually occurred from among the condition events, is stored in the receivedevent ID 11976. - For example, the first entry from the top of
FIG. 13 shows that, based on the expanded rule “ExRule 1-1”, the management server 10000 has judged that the response time threshold abnormality in the disk pool “POOL1” of the storage apparatus A is the failure cause, that the occurrence of events, which are shown with the event IDs “EV 2”, “EV 3”, and “EV 5”, is the basis of the judgment, and that the certainty factor, that is, the incidence of the condition event is 3/3. This entry, for example, shows a case in which, assuming that the performance of the disk comprising the disk pool “POOL1” has slowed, the performance of the disk pool “POOL1” has deteriorated, and the performance of the logical volume “LV2” has slowed, an event has not occurred based on the logical volume “LV1” or the drive “/opt” since there is no sending/receiving of I/Os to/from the drive “/opt” of the physical server A or the logical volume “LV1” of the storage apparatus A. - Next, the respective processes executed by the
management server 10000 will be explained. First, a configuration information acquisition process executed by the management server 10000 will be explained. The configuration information acquisition process is executed in accordance with the management server 10000 processor 15000 executing the configuration setting management program 11600 deployed on the memory 11000. First, the configuration setting management program 11600 communicates with a program, which collects information on the respective apparatuses coupled to the SAN 40000 (in this example, the configuration/performance information collection program 21200 of the storage apparatus 20000 and the configuration/performance information collection program 31100 of the physical server 30000), and acquires the configuration information held by the apparatuses. - Here, the
management server 10000, for example, may use an SCSI (Small Computer System Interface) Inquiry command to acquire the coupling relationship between thephysical server 30000, the drive on thephysical server 30000, the server data I/F, thestorage apparatus 20000, the storage data I/F, thelogical volume 22100 accessed by thephysical server 30000, and thedisk pool 22200 to which thelogical volume 22100 belongs. Also, thephysical server 30000 may acquire configuration information from thestorage apparatus 20000 in accordance with thephysical server 30000 issuing an SCSI Inquiry command to thestorage apparatus 20000. The configurationinformation collection program 31100 of thephysical server 30000 may acquire configuration information related to thephysical server 30000 using any method, such as, for example, a request to the OS. The configurationinformation collection program 21200 on thestorage apparatus 20000 may acquire configuration information related to thestorage apparatus 20000 using any method, such as, for example, acquiring the information from thememory 21000. Next, the configurationsetting management program 11600 stores the acquired configuration information in the configuration information table 11810, and ends the configuration information acquisition process. - Next, a performance information collection process executed by the
management server 10000 will be explained. The performance information collection process is executed in accordance with themanagement server 10000processor 15000 executing the performance information collection program 11700 deployed on thememory 11000. First, the performance information collection program 11700 communicates with a program, which collects information on the respective apparatuses coupled to the SAN 40000 (in this example, the configuration/performanceinformation collection program 21200 of thestorage apparatus 20000 and the configuration/performanceinformation collection program 31100 of the physical server 30000), and acquires the performance information held by the apparatuses. - Here, the
management server 10000, for example, may use the SCSI Inquiry command to acquire the respective data registered in the performance information table 11820, that is, the data of the apparatus ID 11821, the device ID 11822, the metrics 11823, the equipment OS 11824, the performance value 11825, the alert execution threshold 11826, the threshold type 11827, and the Status 11828. Also, the physical server 30000 may acquire performance information from the storage apparatus 20000 in accordance with the physical server 30000 issuing an SCSI Inquiry command to the storage apparatus 20000. The performance information collection program 11700 of the physical server 30000 may acquire performance information related to the physical server 30000 using any method, such as, for example, a request to the OS. The configuration/performance information collection program 21200 on the storage apparatus 20000 may acquire performance information related to the storage apparatus 20000 using any method, such as, for example, acquiring the information from the memory 21000. Next, the performance information collection program 11700 stores the acquired performance information in the performance information table 11820, and ends the performance information collection process. -
FIG. 14 is a flowchart of an event analysis process related to Example 1. - The event analysis process is executed in accordance with the
management server 10000 processor 15000 executing the event analysis processing program 11400 deployed on the memory 11000. - First, the event
analysis processing program 11400 references the performance information table 11820, and acquires the performance values and alert execution threshold for each device (Step 1001). Next, the event analysis processing program 11400 carries out the processing of the following Steps 1002 through 1004 for the acquired performance values. - The event
analysis processing program 11400 checks whether or not the processing-target performance value exceeds the alert execution threshold corresponding to the processing-target performance value, and, based on the result, updates the value of the Status 11828 of the performance information table 11820 (Step 1002). Specifically, in a case where the processing-target performance value exceeds the alert execution threshold, the event analysis processing program 11400 stores “threshold abnormality” in the Status 11828, and in a case where the processing-target performance value does not exceed the alert execution threshold, stores “normal” in the Status 11828. - Next, the event
analysis processing program 11400 determines whether or not the value of the Status 11828 changed before or after the update of Step 1002 (Step 1003). In a case where a change occurred (Step 1003: Yes), the event analysis processing program 11400 registers an entry related to the event showing that the processing-target performance value exceeds the alert execution threshold in the event management table 11910 (Step 1004), and advances to the next step. At this time, the date and time at which the performance information collection program 11700 collected the processing-target performance value is stored in the entry occurrence date/time 11918. Alternatively, in a case where the value of the Status 11828 has not changed (Step 1003: No), the event analysis processing program 11400 advances to the next step without executing Step 1004. - After the processing (
Steps 1002 through 1004) has been completed for all the performance values, the event analysis processing program 11400 determines whether or not there exists an entry related to an event, which has been newly registered in the event management table 11910 (Step 1005), and in a case where an entry related to a newly registered event exists (Step 1005: Yes), instructs the rule expansion program 11500 to execute a rule expansion process (refer to FIG. 15) (Step 1006), and ends the event analysis process. Alternatively, in a case where an entry related to a newly registered event does not exist (Step 1005: No), the event analysis processing program 11400 ends the event analysis process. - Here, in this example, the event
analysis processing program 11400 registers an entry related to an event in the event management table 11910 based on performance information acquired by the performance information collection program 11700, but may instead update the information in the event management table 11910 using information notified from an apparatus, such as an SNMP (Simple Network Management Protocol) trap. -
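The event analysis process of FIG. 14 can be illustrated with a minimal sketch. The `PerfEntry` class, its field names, and the dictionary layout of an event are hypothetical stand-ins for the performance information table 11820 and the event management table 11910, and every threshold is assumed to be an upper limit (the threshold type 11827 is not modeled).

```python
from dataclasses import dataclass


@dataclass
class PerfEntry:
    """Hypothetical row of the performance information table 11820."""
    apparatus_id: str
    device_id: str
    metric: str
    value: float
    threshold: float          # assumed to be an upper limit
    status: str = "normal"
    collected_at: str = ""


def analyze_events(perf_table, event_table):
    """Refresh each Status and register an event whenever it changes."""
    new_events = []
    for entry in perf_table:                          # values acquired in Step 1001
        previous = entry.status
        # Step 1002: compare the value with its alert execution threshold.
        if entry.value > entry.threshold:
            entry.status = "threshold abnormality"
        else:
            entry.status = "normal"
        # Steps 1003 and 1004: register an event only when the Status changed.
        if entry.status != previous:
            event = {
                "apparatus_id": entry.apparatus_id,
                "device_id": entry.device_id,
                "metric": entry.metric,
                "status": entry.status,
                "occurred_at": entry.collected_at,    # collection date/time
                "analyzed": False,
            }
            event_table.append(event)
            new_events.append(event)
    # Step 1005: a non-empty result would trigger rule expansion (Step 1006).
    return new_events
```

Calling `analyze_events` a second time on unchanged values returns an empty list, which corresponds to the "Step 1005: No" branch.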
FIG. 15 is a flowchart of the rule expansion process related to Example 1. - The rule expansion process is executed in accordance with the
management server 10000 processor 15000 executing the rule expansion program 11500 deployed on the memory 11000. - First, the
rule expansion program 11500 acquires an entry (event entry) related to a newly registered event from the event management table 11910, and updates the analyzed flag 11917 of the acquired event entry to “Yes” (Step 2001). Next, the rule expansion program 11500 acquires one or more general rules shown by one or more general rule tables 11920, which are being stored in the rule-plan information repository 11900 (Step 2002). The rule expansion program 11500 performs the processing of Steps 2004 through 2010 below for each acquired general rule (Step 2003). The rule expansion program 11500 also performs the processing of Steps 2005 through 2010 below for each event entry acquired in Step 2001. - The
rule expansion program 11500 determines whether or not the event (processing-target event) shown in the processing-target event entry matches any of the condition events in the processing-target general rule (Step 2005). For example, therule expansion program 11500 determines that the processing-target event and a condition event match in a case where the apparatus and device related to the processing-target event correspond to the types of the apparatus and device related to the condition event, and, in addition, the type of the processing-target event matches the type of the condition event. - In a case where it has been determined that there is no match in Step 2005 (Step 2005: No), the
rule expansion program 11500 executes the processing (Steps 2004 through 2010) by making the next event entry the processing target. Alternatively, in a case where it has been determined that there is a match in Step 2005 (Step 2005: Yes), therule expansion program 11500 acquires from the configuration information table 11810 the identifiers of the apparatus and device having the association shown in the processing-target general rule with the apparatus and device related to the processing-target event (Step 2006). Next, therule expansion program 11500 prepares an expanded rule based on the processing-target general rule for each combination having the association (Step 2007). - For example, in a case where the EV5 in the event management table 11910 is regarded as the processing-target event, and the general rule “
Rule 1” is regarded as the processing-target general rule, the processing-target event matches the condition event 1 (the condition event for which theevent ID 11925 is “1”) of the general rule “Rule 1”. Therefore, therule expansion program 11500 lists all of thelogical volume 22100 anddisk pool 22200 of thestorage apparatus 20000, which are the apparatus and device having the association described in the general rule “Rule 1” (a coupling relationship exists between the components) with the physical server A drive “/var”, which is the site where the processing-target event occurred. Then, therule expansion program 11500 prepares an expanded rule, which corresponds to the combination of the physical server A drive “/var”, thelogical volume 22100 of thestorage apparatus 20000, and thedisk pool 22200 of thestorage apparatus 20000. - Next, the
rule expansion program 11500 determines whether or not an expanded rule table 11950, which shows the same expanded rule as the prepared expanded rule, already exists in the rule-plan information repository 11900 (Step 2008). - In a case where an expanded rule table 11950 showing the same expanded rule as the prepared expanded rule does not exist (Step 2008: No), the
rule expansion program 11500 stores the expanded rule table 11950 showing the prepared expanded rule in the rule-plan information repository 11900, and registers a new entry in the analysis result management table 11970 (Step 2009). Alternatively, in a case where an expanded rule table 11950 showing the same expanded rule as the prepared expanded rule exists (Step 2008: Yes), therule expansion program 11500 adds the event ID of the processing-target event in the receivedevent ID 11976 and changes the value of thecertainty factor 11974 of the analysis result management table 11970 (Step 2010). - After the processing (
Steps 2005 through 2010) for all the event entries has been completed with respect to the processing-target general rule, the rule expansion program 11500 executes the processing (Steps 2004 through 2010) by making the next general rule the processing target. Then, after the processing (Steps 2004 through 2010) for all the general rules has been completed, the rule expansion program 11500 determines whether or not a newly registered entry exists in the analysis result management table 11970 (Step 2011), and in a case where a newly registered entry exists (Step 2011: Yes), instructs the plan creation program 11100 to execute a plan creation process (refer to FIG. 16) (Step 2012), and ends the rule expansion process. Alternatively, in a case where a newly registered entry does not exist (Step 2011: No), the rule expansion program 11500 ends the rule expansion process. -
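The rule expansion process of FIG. 15 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: a general rule is reduced to a list of (device type, event type) conditions, the configuration information table 11810 is reduced to rows that map a device type to a concrete identifier, and the certainty factor is computed as the fraction of condition events received, assuming each condition event is reported at most once. All names and data layouts are hypothetical.

```python
def expand_rules(event, general_rules, topology, results):
    """Expand general rules for one event; return keys of newly created entries."""
    created = []
    for rule in general_rules:                               # Step 2003
        # Step 2005: does the event match one of the rule's condition events?
        if (event["device_type"], event["event_type"]) not in rule["conditions"]:
            continue
        # Step 2006: find the coupled apparatus/device combinations.
        for row in topology:
            if row.get(event["device_type"]) != event["device_id"]:
                continue
            # Step 2007: one expanded rule per concrete combination.
            key = (rule["id"], tuple(sorted(row.items())))
            if key not in results:                           # Step 2008: No
                results[key] = {"received": set(), "certainty": 0.0}
                created.append(key)                          # Step 2009
            # Record the event and refresh the certainty factor (Step 2010
            # when the expanded rule already existed).
            entry = results[key]
            entry["received"].add(event["id"])
            entry["certainty"] = len(entry["received"]) / len(rule["conditions"])
    return created                                           # checked in Step 2011
```

A non-empty return value corresponds to "Step 2011: Yes", the case in which the plan creation process would be started.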
FIG. 16 is a flowchart of the plan creation process related to Example 1. - The plan creation process is executed in accordance with the
management server 10000 processor 15000 executing the plan creation program 11100 deployed on the memory 11000. - First, the
plan creation program 11100 acquires a newly registered entry related to analysis result (analysis result entry) from the analysis result management table 11970 (Step 3001). Theplan creation program 11100 implements the processing of Steps 3003 through 3008 below for each acquired analysis result entry (Step 3002). - The
plan creation program 11100 acquires the expanded rule ID stored in the expandedrule ID 11975 of the processing-target analysis result entry in the analysis result management table 11970. Hereinafter, the expanded rule having the acquired expanded rule ID will be called “processing-target expanded rule”. Then, theplan creation program 11100 acquires the general rule ID of the general rule, which constitutes the basis of the processing-target expanded rule, from the pre-expansiongeneral rule ID 11954 of the expanded rule table 11950 showing the processing-target expanded rule (Step 3003). - Next, the
plan creation program 11100 acquires from the rule-plan association table 11940 one or more general plan IDs corresponding to the general rule ID acquired in Step 3003. Theplan creation program 11100 also acquires from the rule-plan association table 11940 an unresolved event ID corresponding to a combination of the acquired general plan ID and the general rule ID acquired in Step 3003 (Step 3004). - Next, the
plan creation program 11100 references the configuration information table 11810 and the general plan table 11930, creates an expanded plan which is based on a general plan having the general plan ID acquired in Step 3004 and which corresponds to the processing-target expanded rule, and stores the expanded plan table 11960 showing the created expanded plan in the rule-plan information repository 11900 (Step 3005). For example, in a case where a volume migration general plan is to be expanded, the plan creation program 11100 references the configuration information table 11810 and identifies all the disk pools 22200 that could be the migration-destination pool. For example, the plan creation program 11100, based on the configuration information table 11810, identifies a disk pool 22200, which differs from the migration-source pool and can be accessed by the physical server 30000, which had been accessing the migration-target volume, and regards the identified disk pool 22200 as the migration-destination pool. - Next, the
plan creation program 11100 repeatedly executes the processing of Steps 3007 and 3008 for each expanded plan created in Step 3005 (Step 3006). The plan creation program 11100 references the performance information table 11820, calculates a predicted value for the post-plan execution performance value by simulating the post-plan execution state, and, based on the simulation result, updates the value of the plan target 11964 in the expanded plan table 11960 showing the processing-target expanded plan (Step 3007). - Next, the
plan creation program 11100 instructs the post-plan executionrisk selection program 11200 to execute a post-plan execution risk selection process (refer toFIG. 17 ) for the processing-target expanded plan (Step 3008). At this time, theplan creation program 11100 inputs to the post-plan executionrisk selection program 11200 an unresolved event ID of an unresolved event corresponding to the processing-target expanded plan, that is, an unresolved event, which corresponds to a combination of the general rule constituting the basis of the processing-target expanded rule and the general plan constituting the basis of the processing-target expanded plan. A risk site with respect to the processing-target expanded plan is identified in accordance with the post-plan execution risk selection process. - The
plan creation program 11100, after completing the processing (Steps 3003 through 3008) for all the acquired analysis result entries, instructs the plan presentation processing program 11300 to execute a plan presentation process (refer to FIG. 18) (Step 3009). Thereafter, the plan creation program 11100 ends the plan creation process. - This example takes up performance information, in particular, a predicted value for an I/O response time, and gives an example of a simulation method, but a value other than a performance value may be used as the value to be stored in the expanded plan table 11960 as long as the value is capable of serving as an index, which denotes the characteristics of the plan. The
management server 10000, for example, may perform the same simulation as that for a performance value by storing cost information pertaining to plan execution and time information pertaining to plan execution in the configuration information table 11810 or the performance information table 11820. -
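The expansion of a volume migration general plan in Step 3005 can be illustrated with a short sketch. The three mappings below are hypothetical simplifications of the configuration information table 11810; in particular, how accessibility of a pool from a server is derived is an assumption of this example and is not specified in the text.

```python
def migration_destinations(volume, volume_pool, volume_server, pool_access):
    """Return the pools that could serve as the migration-destination pool."""
    source_pool = volume_pool[volume]        # migration-source pool
    server = volume_server[volume]           # server that accesses the volume
    return sorted(
        pool
        for pool, servers in pool_access.items()
        if pool != source_pool and server in servers
    )


def expand_volume_migration(volume, volume_pool, volume_server, pool_access):
    """Create one expanded plan per candidate migration-destination pool."""
    return [
        {"plan": "volume migration", "volume": volume,
         "source": volume_pool[volume], "destination": pool}
        for pool in migration_destinations(volume, volume_pool,
                                           volume_server, pool_access)
    ]
```

Each returned dictionary would then be the input to the simulation of Step 3007 and the risk selection of Step 3008.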
FIG. 17 is a flowchart of the post-plan execution risk selection process related to Example 1. - The post-plan execution risk selection process is executed in accordance with the
management server 10000 processor 15000 executing the post-plan execution risk selection program 11200 deployed on the memory 11000. - First, the post-plan execution
risk selection program 11200 uses an unresolved event ID received from theplan creation program 11100 to select an unresolvable event from among the actually occurred condition events registered in the receivedevent ID 11976 of the processing-target analysis result entry in the analysis result management table 11970 (Step 4001). As used here, unresolvable event refers to an event, which corresponds to the condition event shown by the unresolved event ID, from among the condition events that have actually occurred. - For example, in a case where the first entry from the top of the analysis result management table 11970 of
FIG. 13 (the entry in which the disk pool “POOL1” of the storage apparatus A is the failure cause) is selected as the processing-target analysis result entry inStep 3002 ofFIG. 16 , and the expanded plan “ExPlan 1-1” is selected as the processing-target expanded plan inStep 3006, the processing-target expanded rule, that is, the expanded rule shown by the expandedrule ID 11975 of the processing-target analysis result entry is the expanded rule “ExRule 1-1”, and the general rule, which constitutes the basis of the processing-target expanded rule, is the general rule “Rule 1”. Therefore, the unresolved event corresponding to the processing-target expanded plan “ExPlan 1-1” is the unresolved event corresponding to the combination of the general plan “Plan 1”, which constitutes the basis of the expanded plan “ExPlan 1-1”, and the general rule “Rule 1”, and the unresolved event “3” is acquired from the rule-plan association table 11940 ofFIG. 10 . The unresolved event “3” shows thecondition event 3 of the general rule “Rule 1”, that is, the response time threshold abnormality event in thedisk pool 22200 of thestorage apparatus 20000. Therefore, the event “EV3”, which corresponds to the response time threshold abnormality event in thedisk pool 22200 of thestorage apparatus 20000, is selected as the unresolvable event from among the events (event “EV2”, event “EV3”, and event “EV5”) registered in the receivedevent ID 11976 of the processing-target analysis result entry. - Next, the post-plan execution
risk selection program 11200 references the event management table 11910 and the expanded rule table 11950, and identifies the occurrence sites (the occurrence source apparatus and device) of the unresolvable event selected in Step 4001 (Step 4002). Next, the post-plan execution risk selection program 11200 references the configuration information table 11810, and selects any one or more of the unresolvable event occurrence sites and the sites (apparatus and device) associated on the I/O path with the unresolvable event occurrence sites as risk sites (Step 4003). - In a case where risk sites have been selected in Step 4003 (Step 4004: Yes), the post-plan execution
risk selection program 11200 stores data showing the selected risk sites in the risk site 11969 of the processing-target expanded plan in the expanded plan table 11960 (Step 4005), and ends the post-plan execution risk selection process. Alternatively, in a case where a risk site was not selected in Step 4003 (Step 4004: No), the post-plan execution risk selection program 11200 ends the post-plan execution risk selection process. - In this example, only business side information, which is most important for the administrator, that is, only information of the volume “/opt” of the physical server A is selected as the risk site (for example, refer to
FIG. 12 ), but other sites on the I/O path shown by the entry of the configuration information table 11810, for example, the server data I/F and the logical volume of thestorage apparatus 20000 may also be selected as the risk site. -
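The post-plan execution risk selection process of FIG. 17 can be sketched as follows. The data layouts are hypothetical: each received event carries the number of the condition event it corresponds to and its occurrence site, and the I/O paths are given as they would exist after plan execution. Following the example in the text, only the business-side drive is returned as a risk site; selecting other sites on the I/O path is equally permitted.

```python
def select_risk_sites(unresolved_condition, received_events, io_paths):
    """Return the drives that remain exposed to the unresolved event."""
    # Step 4001: pick the events that correspond to the unresolved condition.
    unresolvable = [e for e in received_events
                    if e["condition_id"] == unresolved_condition]
    # Step 4002: identify where those events occurred.
    sites = {e["site"] for e in unresolvable}
    # Step 4003: keep drives whose I/O path passes through such a site.
    risk_drives = {path["drive"] for path in io_paths
                   if sites & set(path.values())}
    return sorted(risk_drives)      # an empty list means "Step 4004: No"
```

With the events "EV2", "EV3", and "EV5" of the example, the unresolved event "3", and "LV2" already moved to "POOL2", this sketch yields only the drive "/opt" of the physical server A.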
FIG. 18 is a flowchart of the plan presentation process related to Example 1. - The plan presentation process is executed in accordance with the
management server 10000 processor 15000 executing the plan presentation program 11300 deployed on the memory 11000. - First, the
plan presentation program 11300 acquires from the analysis result management table 11970 information showing a failure cause, that is, the values of thecause apparatus ID 11971, thecause component ID 11972, themetrics 11973, and the certainty factor 11974 (Step 5001). - Next, the
plan presentation program 11300 implements the processing ofStep 5002 below with respect to each analysis result entry of the analysis result management table 11970. Here, the expanded rule having the expanded rule ID stored in the expandedrule ID 11975 of the processing-target analysis result entry will be referred to as “processing-target expanded rule”. - The
plan presentation program 11300 acquires from the rule-plan information repository 11900 one or more expanded plan tables 11960, which show one or more expanded plans corresponding to the processing-target expanded rule (plans, which are candidates for failure recovery) (Step 5002). - After completing the processing for all the analysis result entries (Step 5002), the
plan presentation program 11300 creates a plan presentation screen (refer toFIG. 20 ) based on information showing the failure cause and the certainty factor acquired inStep 5001 and the expanded plan table 11960 acquired inStep 5002, and causes theoutput device 14000 to display the created plan presentation screen (Step 5003). Thereafter, theplan presentation program 11300 ends the plan presentation process. -
FIG. 19 is a flowchart of an effect-risk presentation process related to Example 1. - The plan presentation process is executed and the plan presentation screen is displayed in accordance with the
management server 10000 processor 15000 executing the plan presentation program 11300 deployed on the memory 11000. - The
plan presentation program 11300 starts the effect-risk presentation process when a desired expanded plan is selected and an input requesting the display of the plan details screen with respect to the selected expanded plan is received in the plan presentation screen (Step 6001). - First, the
plan presentation program 11300 references the configuration information table 11810 and acquires risk site status information, performance information, and setting information (Step 6002). Next, theplan presentation program 11300 references the expanded plan table 11960 and the analysis result management table 11970, and acquires information showing which event of the condition events included in the expanded rule corresponding to the selected expanded plan occurred, and information showing which event will be resolved in a case where the selected expanded plan is implemented (Step 6003). Next, theplan presentation program 11300 selects I/O path information associated with the selected expanded plan (Step 6004). - Thereafter, the
plan presentation program 11300 creates a plan details screen (refer toFIG. 21 ) based on the information acquired inSteps 6002 through 6004, and causes theoutput device 14000 to display the created plan details screen (Step 6005). Thereafter, theplan presentation program 11300 ends the effect-risk presentation process. -
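The information gathered in Step 6003 can be illustrated with a small sketch that partitions the condition events of an expanded rule into those that have not occurred, those that the selected plan resolves, and those that remain. The function name and the mapping from event ID to condition event number are hypothetical.

```python
def plan_effect(conditions, received, unresolved):
    """Classify condition events for display on the plan details screen.

    conditions: condition event numbers of the expanded rule
    received:   mapping of received event ID to condition event number
    unresolved: condition event numbers the selected plan does not resolve
    """
    occurred = set(received.values())
    return {
        "not_occurred": sorted(c for c in conditions if c not in occurred),
        "resolved": sorted(e for e, c in received.items() if c not in unresolved),
        "remaining": sorted(e for e, c in received.items() if c in unresolved),
    }
```

The "remaining" list is what the plan details screen would mark as failure events that persist after plan execution.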
FIG. 20 is a constitutional diagram of an example of the plan presentation screen related to Example 1. - The
plan presentation screen 9000 comprises adisplay area 9001 for displaying information, which is referenced when, in a case where a failure has occurred in the computer system, the administrator is investigating the cause of the failure and implementing countermeasures, specifically, information showing the corresponding relationship between a failure cause and a list of plans available with respect to the failure, a plan detailsbutton 9002 for displaying the details of the plans, and an executeplan button 9003 for executing a plan. - As information showing the failure cause, for example, the ID of the apparatus related to the failure cause event, the ID of the device related to the failure cause event, the type of the failure cause event, and the certainty factor of the failure cause, that is, the percentage of the number of condition events, which have actually occurred relative to the total number of condition events, are displayed in the
display area 9001, which displays the association between the failure cause and a plan for countering the failure. Theplan presentation program 11300 acquires these values from the analysis result management table 11970 shown inFIG. 13 inStep 5001 ofFIG. 18 . - As information about the plan with respect to the failure, for example, the
display area 9001 displays information showing the contents of candidate plans, the costs pertaining to plan execution, the time required for plan execution (that is, the time during which the failure will remain, hereinafter referred to as “downtime”), post-plan execution performance information, and information showing a risk site. The information showing a risk site, for example, comprises data (for example, the name of the risk site) showing the risk site stored in therisk site 11969 of the expanded plan table 11960, and data (for example, the name of the apparatus) showing the apparatus, which comprises the device regarded as the risk site. - Regarding the cost information pertaining to the plan execution, for example, the
management server 10000 may prestore the approximate cost pertaining to each general plan in the general plan table 11930 shown inFIG. 9 , and may determine the cost based on this information. For example, with respect to the plan for replacing the storage port of general plan “Plan 8”, themanagement server 10000 may store a sum obtained by adding up the cost of purchasing a storage port and the labor costs for the maintenance personnel needed to replace the storage port, and may display this sum as the cost. With respect to the volume migration plan of general plan “Plan 1”, themanagement server 10000 may store the cost per bit in a case where data is migrated from a certain storage device to a different storage device, and may calculate the cost displayed in the “Cost ($)” field ofFIG. 20 in accordance with the capacity of the volume to be migrated. - Regarding downtime, for example, in the case of the volume migration plan, the
management server 10000 may prestore data showing about how much data can be migrated per unit of time (for example, one second) for each combination of the media type of both the migration source and the migration destination storage devices and a RAID level, and may calculate the downtime displayed in the “Downtime” field ofFIG. 20 in accordance with the capacity of the volume to be migrated. At this time, themanagement server 10000 may take into account the fact that the migration time will fluctuate in accordance with the utilization state in the actual environment, and, using past migration history information, may calculate the capacity capable of being migrated per unit of time, and may seek to revise the prestored information by averaging the prestored information and the historical information. One example of a method for determining the cost information and downtime pertaining to plan execution has been given here, but another determination method may be employed. - As the post-plan execution performance information, for example, a predicted value for the post-plan execution performance value, which was simulated in
Step 3007 of the plan creation process shown in FIG. 16 and stored in the plan target 11964 of the expanded plan table 11960 shown in FIG. 12, for example, the values stored in the I/O Response Time predictions, is displayed. In the example of FIG. 20, a predicted value for the I/O response time with respect to the migration-target volume is displayed for the first plan from the top (the volume migration plan), and a predicted value for the I/O response time with respect to the add-target pool is displayed for the fifth plan from the top (the add disk to pool plan), but other values, for example, a predicted value for the I/O response time with respect to the migration-source pool and the migration-destination pool may be displayed for the first plan from the top (the volume migration plan), and other performance values may also be displayed. Predicted values for multiple performance values may also be displayed. - The order in which the candidate plans are displayed here may be sorted on the basis of the characteristics of the plan, such as being arranged in order from the plan with the lowest plan execution cost, being arranged in order from the plan with the shortest time required for plan execution, or being arranged in order from the plan with no risk site.
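The cost, downtime, and ordering described above can be illustrated with a minimal sketch. All function names, units, and the plan dictionary layout are assumptions of this example; the revision of the prestored migration rate is shown as a simple average of the prestored rate and the mean of past rates, which is one possible reading of the averaging described in the text.

```python
def migration_cost(capacity_bits, cost_per_bit):
    """Cost of a volume migration plan from the prestored cost per bit."""
    return capacity_bits * cost_per_bit


def revised_rate(prestored_rate, history_rates):
    """Average the prestored migration rate with the mean of past rates."""
    if not history_rates:
        return prestored_rate
    historical = sum(history_rates) / len(history_rates)
    return (prestored_rate + historical) / 2


def downtime_seconds(capacity, rate_per_second):
    """Time needed to migrate the given capacity at the given rate."""
    return capacity / rate_per_second


def sort_plans(plans):
    """Plans with no risk site first, then by ascending execution cost."""
    return sorted(plans, key=lambda p: (bool(p["risk_sites"]), p["cost"]))
```

Sorting by downtime instead would only require replacing `p["cost"]` with the downtime field in the key.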
- Any method may be used as the arrangement method, such as, for example, arranging the plans in order from the one having the lowest cost by clicking on the “Cost ($)” in the
display area 9001. - The plan details
button 9002 is for instructing the display of the plan details screen (FIG. 21 ). When the administrator selects a desired plan from within thedisplay area 9001 using theinput apparatus 15000 and presses the plan detailsbutton 9002, themanagement server 10000 starts the execution of the effect-risk presentation process ofFIG. 19 , and displays the plan details screen (FIG. 21 ) for displaying the selected plan details information on theoutput apparatus 14000. - The
plan execution button 9003 is for instructing the execution of the selected plan, and when this button is pressed, themanagement server 10000 issues a plan execution indication to the program, which provides the function corresponding to the selected plan. The program, which receives the plan execution indication, executes the selected plan. The program, which executes the plan here, for example, is a program in thememory 11000 of themanagement server 10000, and, for example, is a volume migration program or a VM migration program. - Also, for example, predicted values for a pre-plan execution performance value and a post-plan execution performance value may be displayed in the
display area 9001 in graph form as trend information. -
FIG. 20 is an example of the plan presentation screen 9000; information denoting plan characteristics other than the plan execution cost and the time required for plan execution, such as, for example, a list of business operations that use plan-related resources and could be affected at plan execution time, may be displayed together in the display area 9001, and another display mode may be used. -
FIG. 21 is a constitutional diagram of an example of the plan details screen related to Example 1. - The plan details
screen 9010 comprises adisplay area 9011 for displaying information, which is referenced when, in a case where a failure has occurred in the computer system, the administrator is investigating the cause of the failure and implementing countermeasures, specifically, information showing the coupling relationship between the apparatus and the device related to the failure, and adisplay area 9017 for displaying detailed information regarding a risk site. Thedisplay area 9011 comprises aserver area 9012, which denotes the configuration of thephysical server 30000 in the computer system, aswitch area 9013, which denotes the configuration of the switch, astorage area 9014, which denotes the configuration of thestorage apparatus 20000, anarea 9015 showing the coupling relationship and the setting relationship between each apparatus and device prior to executing the plan selected in theplan presentation screen 9000, and anarea 9016 showing the coupling relationship and the setting relationship between each apparatus and device after executing the selected plan. Thedisplay area 9017 for displaying detailed information regarding a risk site comprises arisk site 9018, which denotes a site where a risk remains even after plan execution, and arisk occurrence timing 9019, which denotes, when the risk site is left unattended, the timing at which the relevant risk will occur. - In the example of
FIG. 21 , a diagram 9020 denoting the physical server A, and a diagram 9021 denoting the storage apparatus A are displayed in theserver area 9012 of thedisplay area 9011. Also, diagrams denoting the drive “/opt” and the drive “/var”, which are mount points, are displayed inside the diagram 9020, and diagrams denoting the logical volume “LV1” and the disk pool “POOL1” are displayed inside the diagram 9021. That is, theserver area 9012 is for expressing the coupling relationships and the setting relationships of the computer system. In addition, amark 9022 showing a failure event, which has occurred in the system and is being managed by the analysis result management table 11910, is displayed in theserver area 9012 on the site where the failure event occurred, for example, on the data I/F “S2” of the physical server A. Also, amark 9023 showing a risk site, which is being managed by the expanded plan table 11960, is displayed in theserver area 9012 on the risk occurrence site, for example, on the drive “/opt” of the physical server A. Furthermore, the coupling relationship of an apparatus and a device, such as the physical server A and the storage apparatus A, in thearea 9011 is expressed using a solid line connecting the diagrams denoting the apparatus and the device. - In the example shown in
FIG. 21 , thearea 9015 showing the state before plan execution shows the fact that the drive “/opt” of the physical server A, the logical volume “LV1” and the disk pool “POOL1” of the storage apparatus A are associated, the drive “/var” of the physical server A, the logical volume “LV2” and the disk pool “POOL1” of the storage apparatus A are associated, and the logical volume “LV3” and the disk pool “POOL2” of the storage apparatus A are associated. Thearea 9015 showing the state before plan execution also shows the fact that failure events have occurred in the drive “/var” on the physical server A, the data I/F on the physical server A, the data I/F of the storage apparatus A, the logical volume “LV2” of the storage apparatus A, and the disk pool “POOL1” of the storage apparatus A, and that a risk exists in the drive “/opt” on the physical server A. - The
area 9016 showing the state after plan execution shows that the logical volume “LV2”, which existed in the disk pool “POOL1”, has come to exist in the POOL2, and that even after plan execution, a failure event remains in the disk pool “POOL1” of the storage apparatus A, and a risk remains in the drive “/opt” on the physical server A. - Detailed information related to the device with the
mark 9023 showing the risk site is displayed in the display area 9017 showing the risk details. In the example of FIG. 21, the area 9017 displays the risk details information for the drive “/opt” of the physical server A shown in the display area 9011, and denotes that the risk site is the drive “/opt” of the physical server A and that the timing at which the risk could occur is when an I/O is generated to the drive “/opt”. The risk occurrence timing, for example, is determined based on the information acquired in Step 6002 of FIG. 19. For example, the management server 10000 may acquire the information of the performance value 11825 of the performance information table 11820, may detect the fact that the performance value related to the drive “/opt” of the physical server A is 0 msec and that an I/O has not occurred, may judge that the occurrence of an I/O could trigger the occurrence of a risk, and may store the relevant information in the risk occurrence timing 9019. - The plan details
screen 9010 here comprises the display area 9011 and the display area 9017, but the display method is not limited thereto. For example, when the mark 9023 showing the risk site in the display area 9011 is clicked, the display area 9017 may be newly displayed as a separate screen. Alternatively, when the risk site information in the display area 9001 of the plan presentation screen 9000 is clicked, the display area 9017 may be newly displayed as a separate screen. Also, when a diagram showing either an apparatus or a device, such as the physical server 30000 or the storage apparatus 20000, in the display area 9011 is clicked, the performance values of the relevant apparatus or device before and after plan execution may be displayed. - According to Example 1, it is possible to associatively present a failure cause and a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time.
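As an illustration of the risk occurrence timing judgment described above for the area 9017, the following is a minimal sketch in Python. The record type, the field names, and the metric name "response_time" are assumptions made for this illustration and do not appear in the specification; only the rule itself (a performance value of 0 msec means that no I/O has occurred, so the first I/O could trigger the risk) is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerformanceRecord:
    """Hypothetical stand-in for one row of the performance information table."""
    apparatus_id: str
    device_id: str
    metric: str
    value_msec: float


def risk_occurrence_timing(record: PerformanceRecord) -> Optional[str]:
    """Return a description of when the risk could occur, or None.

    Assumption: a response time of exactly 0 msec means no I/O has been
    issued to the device yet, so the first I/O could surface the risk.
    """
    if record.metric == "response_time" and record.value_msec == 0:
        return f"when an I/O is generated to {record.device_id}"
    return None


# Illustrative values only; they are not taken from the figures.
opt = PerformanceRecord("ServerA", "/opt", "response_time", 0.0)
var = PerformanceRecord("ServerA", "/var", "response_time", 12.5)
print(risk_occurrence_timing(opt))
print(risk_occurrence_timing(var))
```

A real implementation would likely combine this check with the configuration information, since a device with no I/O is only a risk site if it shares a failed component with the failure cause.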
- Next, Example 2 will be explained. In the following explanation, the focus will be on explaining the differences with Example 1, and descriptions of similar components, programs having similar functions, and tables having similar items will be omitted.
-
FIG. 22 is a constitutional diagram of an example of a management server related to Example 2. - The
management server 10000 comprises the same components as in Example 1, and, in addition, a VM configuration management table 11830 is stored in the configuration/performance information repository 11800 of thememory 11000. The VM configuration management table 11830 manages the corresponding relationship between a VM and a hypervisor (hereinafter also called “HV”), which logically creates and runs the VM, and VM setting information, for example, power status information and the like. - In the example shown in
FIG. 22, the various programs and tables are stored in the memory 11000, but may be stored in either the storage device 12000 or another storage medium (not shown in the drawing). In this case, the processor 15000 reads a target program into the memory 11000 at program execution time, and executes the program that was read. The above-described programs and the above-described tables may be stored in either the memory 21000 of the storage apparatus 20000 or the memory 31000 of the physical server 30000, and either the storage apparatus 20000 or the physical server 30000 may execute the stored programs. Another apparatus, such as a switch, may store the above-described programs and tables, and may execute the stored programs. -
FIG. 23 is a constitutional diagram of an example of a physical server related to Example 2. - The
physical server 30000 comprises the same components as in Example 1, and, in addition, aVM management program 31400 is stored in thememory 31000. TheVM management program 31400 manages the configuration information and the performance information of the VM. TheVM management program 31400 also carries out VM-related control, such as VM migration. - In the example shown in
FIG. 23, various programs are stored in the memory 31000, but may be stored in another storage medium (not shown in the drawing). In this case, the processor 33000 reads a target program into the memory 31000 at process execution time, and executes the program that was read. -
FIG. 24 is a logical constitutional diagram of an example of the physical server related to Example 2. - The
physical server 30000 comprises anHV 80000, which logically creates aVM 70000, and runs the createdVM 70000. TheHV 80000 can controlmultiple VMs 70000 at one time. Each of themultiple VMs 70000 can execute an application just like a standalone physical computer. -
FIG. 25 is a constitutional diagram of an example of a switch related to Example 2. - The
switch 60000 comprises a memory 61000, a management I/F 62000, a processor 63000, and a switch data I/F 64000, and these components are coupled via an internal bus 65000. The switch 60000 is an apparatus for selecting a communication route from the data I/F 32000 of the physical server 30000 to the data I/F 26000 of the storage apparatus 20000. A configuration/performance information collection program 61100 is stored in the memory 61000. The configuration/performance information collection program 61100 is for sending/receiving the management information and the performance information of the switch 60000 and the like to/from the management server 10000. -
FIG. 26 is a constitutional diagram of an example of a configuration information table related to Example 2. - The configuration information table 11810 comprises the field of
switch information 11818 in addition to the fields of the configuration information table 11810 related to Example 1. The switch information 11818 comprises a switch 11818A in which is stored the identifier of the switch 60000, a switch data I/F IN 11818B in which is stored data showing the input data I/F of the switch 60000, and a switch data I/F OUT 11818C in which is stored data showing the output data I/F of the switch 60000. The switch information 11818 is arranged between the fields of the physical server 11811 and the storage 11814, and the relative position of this field shows the relative positions of the apparatus and device on the communication route. For example, the second entry from the top of FIG. 26 shows that switch B and switch C exist between the server data I/F “S2” of the physical server A and the storage data I/F “P2” of the storage apparatus A. More specifically, this entry shows that the server data I/F “S2” of the physical server A is coupled to the data I/F “R10” of the switch B, the data I/F “R11” of the switch B is coupled to the data I/F “R20” of the switch C, and the data I/F “R21” of the switch C is coupled to the storage data I/F “P2” of the storage apparatus A. The configuration information of the VM 70000 is also stored in the configuration information table 11810 in the same way as the configuration information of the physical server 30000. Therefore, an identifier for uniquely identifying the VM 70000 is stored in the physical server 11811. For example, the fifth entry from the top of FIG. 26 shows that the logical volume “LV10”, which was created from the disk pool “POOL3” of storage B, is coupled to the VM1 via the storage data I/F “P3” of the storage B, the data I/Fs “R2” and “R1” of the switch A, and the server data I/F “S3”, and is recognized as logical volume “E:” on the VM1. -
FIG. 27 is a constitutional diagram of an example of the VM configuration management table related to Example 2. - An entry is added to the VM configuration management table 11830 in accordance with the configuration
setting management program 11600 being executed. The VM configuration management table 11830 comprises the fields of a virtual server 11831, a power status 11832, a physical server 11833, and a server data I/F 11834. An identifier for uniquely identifying a VM 70000 is stored in the virtual server 11831. Data showing the power status of the VM 70000, for example, “ON”, “OFF”, or “SUSPEND”, is stored in the power status 11832. An identifier for uniquely identifying the physical server 30000 on which the VM 70000 is running is stored in the physical server 11833. An identifier for uniquely identifying the server data I/F of the physical server 30000 is stored in the server data I/F 11834. -
FIG. 28 is a constitutional diagram of an example of the performance information table related to Example 2. - The configuration of the performance information table 11820 related to Example 2 is substantially the same as the configuration of the performance information table 11820 related to Example 1.
The performance information of the VM 70000, the hypervisor 80000, and the switch 60000 is also stored in the performance information table 11820 related to Example 2 as performance information related to either an apparatus or a device comprising the computer system. Here, the VM 70000 and the hypervisor 80000 are also treated as apparatuses. For example, an identifier for uniquely identifying either the VM 70000, the hypervisor 80000, or the switch 60000 is stored in the apparatus ID 11821. - FIG. 29 is a constitutional diagram of an example of the event management table related to Example 2. - The configuration of the event management table 11910 related to Example 2 is substantially the same as the configuration of the event management table 11910 related to Example 1. In the event management table 11910 related to Example 2, information related to an event that has occurred in the
VM 70000, information related to an event that has occurred in thehypervisor 80000, and information related to an event that has occurred in theswitch 60000 are also stored as information related to an event, which occurs in either an apparatus or a device comprising the computer system. -
FIG. 30A is a first constitutional diagram of a general rule table related to Example 2.FIG. 30B is a second constitutional diagram of a general rule table related to Example 2. - The configuration of the general rule table 11920 related to Example 2 is substantially the same as the configuration of the general rule table 11920 related to Example 1. In Example 2, a VM 70000-related event, a hypervisor 80000-related event, and a switch 60000-related event are also used as a condition event, which is defined by the
condition part 11921 and theconclusion part 11922 of the general rule. -
FIG. 31 is a constitutional diagram of an example of a rule-plan association table related to Example 2. - The configuration of the rule-plan association table 11940 related to Example 2 is substantially the same as the configuration of the rule-plan association table 11940 related to Example 1. In Example 2, general rules “
Rule 3” and “Rule 4” are taken up and explained as general rules, and general plans “Plan 1” and “Plan 6” are taken up and explained as general plans. -
FIG. 32A is a first constitutional diagram of the expanded rule table related to Example 2.FIG. 32B is a second constitutional diagram of the expanded rule table related to Example 2. - The configuration of the expanded rule table 11950 related to Example 2 is substantially the same as the configuration of the expanded rule table 11950 related to Example 1. In Example 2, a VM 70000-related event, a hypervisor 80000-related event, and a switch 60000-related event are also used as a condition event, which is defined by the
condition part 11951 and theconclusion part 11952 of the expanded rule. -
FIG. 33A is a first constitutional diagram of the expanded plan table related to Example 2.FIG. 33B is a second constitutional diagram of the expanded plan table related to Example 2.FIG. 33C is a third constitutional diagram of the expanded plan table related to Example 2. - The configuration of the expanded plan table 11960 related to Example 2 is substantially the same as the configuration of the expanded plan table 11960 related to Example 1. The same as in Example 1, the fields included in the
plan target 11964 will differ in accordance with the contents of the plan. - The expanded plan shown in either
FIG. 33B or FIG. 33C is an expanded plan based on the general plan “Plan 6”, that is, an expanded plan related to a VM migration, and the plan target 11964, for example, comprises the fields of a target VM 1196B, a migration source 1196C, and a migration destination 1196D. The target VM 1196B comprises the fields of an ID 1196BA in which is stored the identifier of the VM 70000 (hereinafter referred to as “target VM”), which is the target of the VM migration, and a performance 1196BB in which is stored the target VM post-migration performance value. The migration source 1196C comprises the fields of an ID 1196CA in which is stored the identifier of the hypervisor 80000, which is the migration source of the target VM (hereinafter referred to as “migration-source hypervisor”), and a performance 1196CB in which is stored the performance value of the migration-source hypervisor after the target VM has been migrated. The migration destination 1196D comprises the fields of an ID 1196DA in which is stored the identifier of the hypervisor 80000, which is the migration destination of the target VM (hereinafter referred to as “migration-destination hypervisor”), and a performance 1196DB in which is stored the performance value of the migration-destination hypervisor after the target VM has been migrated. The plan creation program 11100 acquires the identifiers of the target VM, the migration-source hypervisor, and the migration-destination hypervisor from the configuration information table 11810 and so forth, and stores these identifiers in the ID 1196BA, the ID 1196CA, and the ID 1196DA, respectively. Furthermore, any method may be used to calculate the predicted performance values stored in the performance 1196BB, the performance 1196CB, and the performance 1196DB, and, for example, the plan creation program 11100 may determine the predicted values in accordance with either adding or subtracting the IOPS as shown in Example 1. 
An example of performance information has been described here, but cost information, or time information, such as system downtime resulting from a failure, may be stored. Asingle hypervisor 80000 serves as the migration source and the migration destination here, but a set ofmultiple hypervisors 80000, which share resources, or a data store inside thehypervisor 80000 may serve as the migration source and the migration destination. -
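The plan target fields for a VM migration and the add/subtract style of prediction mentioned above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class name, the field names, and the use of IOPS as the single load figure are hypothetical, and the target VM's own performance is simply assumed to be unchanged by the migration.

```python
from dataclasses import dataclass


@dataclass
class VmMigrationTarget:
    """Hypothetical mirror of the plan target fields for a VM migration plan."""
    target_vm_id: str        # corresponds to the ID of the target VM
    target_vm_iops: float    # predicted performance of the target VM
    source_hv_id: str        # ID of the migration-source hypervisor
    source_hv_iops: float    # predicted load on the source after migration
    dest_hv_id: str          # ID of the migration-destination hypervisor
    dest_hv_iops: float      # predicted load on the destination after migration


def predict_vm_migration(vm_id: str, vm_iops: float,
                         src_id: str, src_iops: float,
                         dst_id: str, dst_iops: float) -> VmMigrationTarget:
    """Predict post-migration load by moving the VM's IOPS between hypervisors.

    Assumption: the VM's IOPS is subtracted from the source hypervisor and
    added to the destination hypervisor; no other effects are modeled.
    """
    return VmMigrationTarget(
        target_vm_id=vm_id,
        target_vm_iops=vm_iops,
        source_hv_id=src_id,
        source_hv_iops=src_iops - vm_iops,
        dest_hv_id=dst_id,
        dest_hv_iops=dst_iops + vm_iops,
    )


# Illustrative values only.
plan = predict_vm_migration("VM2", 300.0, "HV-B", 1000.0, "HV-C", 400.0)
print(plan)
```

Cost or downtime figures, which the text notes may be stored instead of performance values, could be added as further fields in the same way.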
FIG. 34 is a constitutional diagram of an example of the analysis result management table related to Example 2. - The configuration of the analysis result management table 11970 related to Example 2 is substantially the same as the configuration of the analysis result management table 11970 related to Example 1. In Example 2, the
VM 70000 identifier, thehypervisor 80000 identifier, theswitch 60000 identifier, and the identifiers of the devices thereof can be stored in acause apparatus ID 11971 and acause component ID 11972. Also, the identifier of an event related to theVM 70000, the identifier of an event related to thehypervisor 80000, and the identifier of an event related to theswitch 60000 can be stored in the receivedevent ID 11976. -
FIG. 35 is a flowchart of the plan presentation process related to Example 2. - The plan presentation process is executed in accordance with the
management server 10000processor 15000 executing theplan presentation program 11300 deployed on thememory 11000. - First, the
plan presentation program 11300 acquires information showing a failure cause from the analysis result management table 11970, that is, the values of thecause apparatus ID 11971, thecause component ID 11972, themetrics 11973, and the certainty factor 11974 (Step 7001). - Next, the
plan presentation program 11300 executes the processing of Steps 7002 through 7005 below with respect to each of one or more expanded plans shown by one or more expanded plan tables 11960 stored in the rule-plan information repository 11900. The plan presentation program 11300 determines whether or not there exists among the one or more expanded plans shown by the one or more expanded plan tables 11960 stored in the rule-plan information repository 11900 an expanded plan (hereinafter referred to as “first consolidation-target plan”), which has an expanded rule ID 1196A value that differs from that of the processing-target expanded plan, that is, a corresponding expanded rule that is different, and, in addition, has the same processing content as that of the processing-target expanded plan (Step 7002). - In a case where a first consolidation-target plan does not exist (Step 7002: No), the
plan presentation program 11300 advances the processing to Step 7004. Alternatively, in a case where a first consolidation-target plan exists (Step 7002: Yes), the plan presentation program 11300 deletes the first consolidation-target plan from the expanded plan table 11960, updates the value of the expanded rule ID 1196A of the expanded plan table 11960 comprising the processing-target expanded plan (Step 7003), and advances the processing to Step 7004. - For example, in the examples of
FIGS. 33B and 33C, the expanded plan “ExPlan 6-1” and the expanded plan “ExPlan 6-3”, and likewise the expanded plan “ExPlan 6-2” and the expanded plan “ExPlan 6-4”, have corresponding expanded rules that differ from one another and, in addition, have the same processing content. Therefore, in a case where the processing-target expanded plan is the expanded plan “ExPlan 6-1”, the plan presentation program 11300 identifies the expanded plan “ExPlan 6-3” as the first consolidation-target plan in Step 7002, deletes the expanded plan “ExPlan 6-3” from the expanded plan table 11960 in Step 7003, and updates the value in the expanded rule ID 1196A of the expanded plan table 11960 comprising the expanded plan “ExPlan 6-1” to data showing the expanded rule “ExRule 3-1” and the expanded rule “ExRule 4-1”, for example, to “ExRule 3-1, ExRule 4-1”. Furthermore, in a case where the processing-target expanded plan is the expanded plan “ExPlan 6-2”, the plan presentation program 11300 identifies the expanded plan “ExPlan 6-4” as the first consolidation-target plan in Step 7002, deletes the expanded plan “ExPlan 6-4” from the expanded plan table 11960 in Step 7003, and updates the value in the expanded rule ID 1196A of the expanded plan table 11960 comprising the expanded plan “ExPlan 6-2” to “ExRule 3-1, ExRule 4-1”. The value in the expanded rule ID 1196A of an existing expanded plan table 11960 is updated here, but a new expanded plan table 11960 in which “ExRule 3, ExRule 4” is stored in the expanded rule ID 1196A may be prepared instead. - In
Step 7004, theplan presentation program 11300 determines whether or not there exists in the one or more expanded plans shown by the one or more expanded plan tables 11960 stored in the rule-plan information repository 11900 an expanded plan (hereinafter referred to as “second consolidation-target plan”), which has ageneral plan ID 11962 that is the same as the processing-target expanded plan, that is, which has the same general plan as a basis, and, in addition, has similar performance information and has the same risk. - In a case where a second consolidation-target plan does not exist (Step 7004: No), the
plan presentation program 11300 advances the processing to the next process without executing Step 7005. Alternatively, in a case where a second consolidation-target plan exists (Step 7004: Yes), the plan presentation program 11300 identifies from among the processing-target expanded plan and one or more second consolidation-target plans an expanded plan (hereinafter referred to as “best plan”) having the best predicted value for the post-plan execution performance value. Then, the plan presentation program 11300 deletes from the expanded plan table 11960 the expanded plan, which is not the best plan of the processing-target expanded plan and the one or more second consolidation-target plans, and updates the value in the expanded rule ID 1196A of the expanded plan table 11960 (Step 7005). - For example, in the examples of
FIGS. 33A and 33B, of the expanded plan “ExPlan 1-1”, the expanded plan “ExPlan 1-2”, and the expanded plan “ExPlan 1-3”, which are expanded plans having the same general plan as a basis, and, in addition, similar performance information and the same risk, the expanded plan “ExPlan 1-1” is the best plan in that the migration-target volume performance is the best. Therefore, only the best plan “ExPlan 1-1” is left as-is, and the other expanded plan “ExPlan 1-2” and expanded plan “ExPlan 1-3” are deleted. Also, of the expanded plan “ExPlan 6-1” and the expanded plan “ExPlan 6-2”, which are expanded plans having the same general plan as a basis, and, in addition, similar performance information and the same risk, the expanded plan “ExPlan 6-1” is the best plan in that the target VM performance is the best. Therefore, only the best plan “ExPlan 6-1” is left as-is, and the other expanded plan “ExPlan 6-2” is deleted. - The range for judging that the performance information is similar, for example, may be fixedly configured beforehand as the I/O response times being within ±1 msec of one another, or may be configured by the administrator via the
input device 13000. - In
Step 7005, the plan presentation program 11300 leaves only the best plan, for example, the one expanded plan having the fastest I/O response time, and deletes the other expanded plan(s), but may instead leave as-is multiple expanded plans having good predicted values for the post-plan execution performance values. The number of expanded plans left after consolidation, for example, may be fixedly decided beforehand, or may be configured by the administrator via the input device 13000. Also, the number of post-consolidation expanded plans may be decided so that all of the expanded plans are capable of being displayed in the output screen. The object of this processing is to prevent the plan selection operation performed by the administrator from becoming troublesome due to the display of a large number of similar expanded plans. For example, rather than deleting an expanded plan, a method may be used that displays only the expanded plans having good post-plan execution performance values, hides the other expanded plans, and switches between displaying and hiding them when a prescribed button is clicked. - After completing the processing (
Steps 7002 through 7005) for all the expanded plans, theplan presentation program 11300, based on the information showing the failure cause and the certainty factor acquired inStep 7001 and the expanded plan table 11960 stored in the rule-plan information repository 11900, creates a plan presentation screen 9000 (refer toFIG. 36 ) and causes theoutput device 14000 to display the created plan presentation screen 9000 (Step 7006). Thereafter, theplan presentation program 11300 ends the plan presentation process. -
FIG. 36 is a constitutional diagram of an example of the plan presentation screen related to Example 2. - The configuration of the
plan presentation screen 9000 related to Example 2 is substantially the same as the configuration of theplan presentation screen 9000 related to Example 1. - In Example 2, the expanded plans shown in
FIGS. 33A, 33B, and 33C are created in accordance with the plan creation process. Specifically, the expanded plan “ExPlan 1-1”, the expanded plan “ExPlan 1-2”, and the expanded plan “ExPlan 1-3” are created as volume migration-related expanded plans, and the expanded plan “ExPlan 6-1”, the expanded plan “ExPlan 6-2”, the expanded plan “ExPlan 6-3”, and the expanded plan “ExPlan 6-4” are created as VM migration-related expanded plans. That is, a total of seven expanded plans are created. According to the plan presentation process related to Example 2 shown in FIG. 35, of the expanded plan “ExPlan 1-1”, the expanded plan “ExPlan 1-2”, and the expanded plan “ExPlan 1-3”, only the expanded plan “ExPlan 1-1”, which is the plan having the best migration-target volume performance, is left as-is, and the expanded plan “ExPlan 1-2” and the expanded plan “ExPlan 1-3” are deleted. Also, according to the plan presentation process, of the expanded plan “ExPlan 6-1”, the expanded plan “ExPlan 6-2”, the expanded plan “ExPlan 6-3”, and the expanded plan “ExPlan 6-4”, only the expanded plan “ExPlan 6-1”, which is the one plan via which the performance of the target VM will become the best, is left as-is, and the expanded plan “ExPlan 6-2”, the expanded plan “ExPlan 6-3”, and the expanded plan “ExPlan 6-4” are deleted. This example shows that it is possible to consolidate the expanded plan corresponding to the general rule in which the failure cause is the storage apparatus 20000 and the expanded plan corresponding to the general rule in which the failure cause is the switch 60000. In this screen 9000, essentially, any display method may be used as long as it makes it possible to illustrate the fact that a plan with respect to failures having different root causes is shared in common, and the display method is not limited to the one shown in FIG. 36.
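The first consolidation stage of the plan presentation process (Steps 7002 and 7003) can be sketched roughly as follows. This is a minimal illustration, not the implementation of the plan presentation program: the class, its fields, and the VM and hypervisor identifiers are hypothetical, and "same processing content" is approximated by comparing the general plan ID together with a tuple describing the plan target.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExpandedPlan:
    """Hypothetical, simplified view of one expanded plan."""
    plan_id: str
    general_plan_id: str
    rule_ids: List[str]          # expanded rule IDs this plan corresponds to
    content: Tuple[str, ...]     # e.g. (target VM, source HV, destination HV)


def merge_same_content(plans: List[ExpandedPlan]) -> List[ExpandedPlan]:
    """Keep the first plan for each distinct content and merge rule IDs into it.

    Later plans with the same content are dropped (Step 7003), and their
    expanded rule IDs are appended to the surviving plan.
    """
    merged = {}
    for p in plans:
        key = (p.general_plan_id, p.content)
        if key in merged:
            keeper = merged[key]
            for rule_id in p.rule_ids:
                if rule_id not in keeper.rule_ids:
                    keeper.rule_ids.append(rule_id)
        else:
            # Copy so that the caller's plan objects are not mutated.
            merged[key] = ExpandedPlan(p.plan_id, p.general_plan_id,
                                       list(p.rule_ids), p.content)
    return list(merged.values())


# Plan and rule IDs follow the text; the content tuples are made up.
plans = [
    ExpandedPlan("ExPlan 6-1", "Plan 6", ["ExRule 3-1"], ("VM2", "HV-B", "HV-C")),
    ExpandedPlan("ExPlan 6-2", "Plan 6", ["ExRule 3-1"], ("VM2", "HV-B", "HV-D")),
    ExpandedPlan("ExPlan 6-3", "Plan 6", ["ExRule 4-1"], ("VM2", "HV-B", "HV-C")),
    ExpandedPlan("ExPlan 6-4", "Plan 6", ["ExRule 4-1"], ("VM2", "HV-B", "HV-D")),
]
result = merge_same_content(plans)
for p in result:
    print(p.plan_id, p.rule_ids)
```

Note that this sketch also merges plans that share both content and rule, which the described process does not need to handle because such duplicates are not expected to be created.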
- According to Example 2, it is possible to associatively present a failure cause and a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time. Also, by consolidating expanded plans via which identical or similar effects are obtained into a single plan, it is possible to curb the presentation of redundant plans, and, in a case where numerous recovery plans exist with respect to a failure, it is possible to reduce the number of plans presented to the administrator and to lower the administrator's costs with respect to plan details checking operations and plan selection operations.
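The consolidation of plans with similar effects (Steps 7004 and 7005) might be sketched as below. The class and field names, the response-time values, and the greedy grouping strategy are assumptions for illustration; the text only states that plans with the same general plan, the same risk, and similar performance are reduced to the one with the best predicted value, with a similarity range such as ±1 msec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanSummary:
    """Hypothetical summary of an expanded plan used for consolidation."""
    plan_id: str
    general_plan_id: str
    risk: str
    response_msec: float   # predicted post-execution I/O response time


def keep_best_plans(plans: List[PlanSummary],
                    tolerance_msec: float = 1.0) -> List[PlanSummary]:
    """Drop plans that are similar to an already kept, better plan.

    Plans are visited from fastest to slowest predicted response time.
    A plan is dropped when a kept plan has the same general plan and risk
    and a response time within tolerance_msec (an assumed greedy rule).
    """
    kept: List[PlanSummary] = []
    for p in sorted(plans, key=lambda q: q.response_msec):
        similar_to_kept = any(
            k.general_plan_id == p.general_plan_id
            and k.risk == p.risk
            and abs(k.response_msec - p.response_msec) <= tolerance_msec
            for k in kept
        )
        if not similar_to_kept:
            kept.append(p)
    return kept


# Plan IDs follow the text; the risk labels and response times are made up.
candidates = [
    PlanSummary("ExPlan 1-2", "Plan 1", "drive /opt", 1.5),
    PlanSummary("ExPlan 1-1", "Plan 1", "drive /opt", 1.0),
    PlanSummary("ExPlan 1-3", "Plan 1", "drive /opt", 1.8),
    PlanSummary("ExPlan 6-2", "Plan 6", "drive E:", 2.6),
    PlanSummary("ExPlan 6-1", "Plan 6", "drive E:", 2.0),
]
kept = keep_best_plans(candidates)
print([p.plan_id for p in kept])
```

Leaving several good plans per group, or hiding rather than deleting the others, would only require changing what is done with the plans that this function drops.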
- Next, Example 3 will be explained. In the following explanation, the focus of the explanation will be on the differences with Example 1 and Example 2, and descriptions of similar components, programs having similar functions, and tables having similar items will be omitted.
-
FIG. 37 is a constitutional diagram of an example of a management server related to Example 3. - The
management server 10000 comprises the same components as in Example 2, and, in addition, a maintenanceinformation management program 11110 is stored in thememory 11000. In addition, a maintenance information management table 11980 is stored in the rule-plan information repository 11900. The maintenance information management table 11980 manages information related to replacement and maintenance operations due to apparatus renewal. - In the example shown in
FIG. 37 , the various programs and tables are stored in thememory 11000, but these programs and tables may be stored in either thestorage device 12000 or another storage medium (not shown in the drawing). In accordance with this, theprocessor 15000 reads a target program on thememory 11000 at program execution time, and executes the program that was read. The above-described programs and the above-described tables may be stored in either thestorage apparatus 20000 memory or thephysical server 30000 memory, and either thestorage apparatus 20000 or thephysical server 30000 may execute the stored programs. Another apparatus, such as aswitch 60000, may store the above-described programs and tables, and may execute the stored programs. -
FIG. 38 is a constitutional diagram of an example of a general plan table related to Example 3. - The general plan table 11930 manages a list of plans available in the computer system. The general plan table 11930 comprises the fields of a
general plan ID 11931, aplan 11932, and amaintenance action 11933. A general plan ID, which is the identifier of the general plan, is stored in thegeneral plan ID 11931. Information showing the plans, which are available in the computer system, is stored in theplan 11932. Information showing whether or not the relevant plan is a plan related to a maintenance schedule is stored in themaintenance action 11933. For example, a plan in which physical hardware is to be replaced, such as the storage port replacement of the general plan “Plan 8” and the switch replacement of the general plan “Plan 9”, is regarded as a plan related to a maintenance schedule. -
FIG. 39 is a constitutional diagram of an example of a rule-plan association table related to Example 3. - The configuration of the rule-plan association table 11940 related to Example 3 is substantially the same as the configuration of the rule-plan association table 11940 related to Example 1. In Example 3, general rule “
Rule 4” is taken up and explained as the general rule, and general plans “Plan 6” and “Plan 9” are taken up and explained as the general plans. -
FIG. 40A is a first constitutional diagram of the expanded plan table related to Example 3.FIG. 40B is a second constitutional diagram of the expanded plan table related to Example 3. - The configuration of the expanded plan table 11960 related to Example 3 is substantially the same as the configuration of the expanded plan table 11960 related to Example 1. The same as in Example 1, the fields included in the
plan target 11964 will differ in accordance with the contents of the plan. - Regarding the expanded plan based on the general plan “Plan 9” of
FIG. 40B , that is, the expanded plan related to replacing a switch, theplan target 11964, for example, comprises the fields of a replacement-target switch 1196E in which is stored the identifier of theswitch 60000, which is targeted for replacement (hereinafter referred to as “replacement-target switch”), and aCost 1196F in which is stored data denoting the replacement cost. Theplan creation program 11100 acquires the identifier of the replacement-target switch from the configuration information table 11810, and stores this identifier in replacement-target switch 1196E. Theplan creation program 11100 acquires the value stored in theCost 1196F from the maintenance information management table 11980, and stores this value in theCost 1196F. An example in which only the replacement target identification information and the cost information are stored has been described here, but other information, for example, information showing about how much time it will take to replace theswitch 60000 may be stored. -
FIG. 41 is a constitutional diagram of an example of the maintenance information management table related to Example 3. - The maintenance information management table 11980 manages schedule information in accordance with which the administrator performs a maintenance operation, such as hardware replacement. This table 11980, for example, is created manually and inputted by the administrator. The maintenance information management table 11980 comprises the fields of an
apparatus 11981, an apparatus component 11982, a replacement reason 11983, a replacement date/time 11984, an affected service 11985, and a cost 11986. The apparatus ID of an apparatus targeted for a maintenance operation is stored in the apparatus 11981. The identifier of a device targeted for a maintenance operation is stored in the apparatus component 11982. Information showing the reason that a replacement was scheduled is stored in the replacement reason 11983. Information showing the date and time of the replacement is stored in the replacement date/time 11984. The identifier of a service, which will be affected by replacing a device targeted for the maintenance operation, is stored in the affected service 11985. Information showing the cost of replacing the device targeted for the maintenance operation is stored in the cost 11986. -
FIG. 42 is a flowchart of the effect-risk presentation process related to Example 3. - The processing of
Steps 8001 through 8004 is the same as the processing of Steps 6001 through 6004 in the effect-risk presentation process related to Example 1, and as such, explanations of these steps will be omitted. - In
Step 8005, the plan presentation program 11300 references the information in the maintenance action 11933 of the general plan table 11930 and the maintenance information management table 11980, and acquires information related to a maintenance schedule. - Thereafter, the
plan presentation program 11300 creates a plan details screen 9010 (refer to FIG. 43) based on the information acquired in Steps 8002 through 8005, and causes the output device 14000 to display the created plan details screen 9010 (Step 8006). Thereafter, the plan presentation program 11300 ends the effect-risk presentation process. -
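Steps 8005 and 8006 can be sketched roughly as follows. The sketch assumes, as the description of FIG. 43 suggests by way of example, that the displayed schedule is limited to devices in which a failure event has occurred, and that a powered-OFF VM is judged to carry a risk that may materialize when it is powered ON. All function names, dictionary keys, and sample values are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Device = Tuple[str, str]  # (apparatus ID, component ID); a hypothetical key format


def collect_maintenance_schedule(
    failed_devices: Set[Device], maintenance_rows: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Sketch of Step 8005: keep schedule rows for devices with a failure event."""
    return [
        row for row in maintenance_rows
        if (row["apparatus"], row["component"]) in failed_devices
    ]


def judge_risk_timing(power_status: str) -> Optional[str]:
    """Sketch of the risk-timing judgment for a VM based on its power status."""
    # A powered-OFF VM may trigger the risk when business operations resume.
    return "when the VM is powered ON" if power_status == "OFF" else None


def build_plan_details(
    failed_devices: Set[Device],
    maintenance_rows: List[Dict[str, str]],
    vm_power_status: str,
) -> Dict[str, Any]:
    """Sketch of Step 8006: assemble data for the plan details screen."""
    return {
        "risk_occurrence_timing": judge_risk_timing(vm_power_status),
        "maintenance_schedule": collect_maintenance_schedule(failed_devices, maintenance_rows),
    }


# Sample rows are invented for illustration only.
rows = [
    {"apparatus": "SwitchA", "component": "P1", "reason": "Scheduled renewal", "datetime": "2012-10-01 09:00"},
    {"apparatus": "SwitchB", "component": "P2", "reason": "Scheduled renewal", "datetime": "2012-11-01 09:00"},
]
details = build_plan_details({("SwitchA", "P1")}, rows, "OFF")
print(details["maintenance_schedule"])
print(details["risk_occurrence_timing"])
```

Rendering of the screen itself is omitted; the sketch only shows how the schedule and risk-timing data might be gathered before display.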
FIG. 43 is a constitutional diagram of an example of the plan details screen related to Example 3. - The configuration of the plan details
screen 9010 related to Example 3 is substantially the same as the configuration of the plan details screen 9010 related to Example 1. - In the example shown in
FIG. 43, the area 9015 showing the state before plan execution shows that the VM1 drive “E:”, the switch A, the data I/F “P3” of the storage B, the logical volume “LV10” of the storage B, and the disk pool “POOL3” are associated; that the VM2 drive “F:”, the switch A, the logical volume “LV11” of the storage B, and the disk pool “POOL3” are associated; and that the VM3 drive “D:”, the switch B, the logical volume “LV12” of the storage B, and the disk pool “POOL4” are associated. The area 9015 showing the state before plan execution also shows that failure events have occurred in the VM2 drive “F:”, the switch A data I/F, and the storage B data I/F, and that a risk exists in the VM1 drive “E:”. - The
area 9016 showing the state after plan execution shows that the VM2, which had been running on the hypervisor 80000 of the physical server B, is now running on the hypervisor 80000 of the physical server C, and that, after plan execution, the VM2 drive “F:”, the switch B, the storage B data I/F, the logical volume “LV11” of the storage B, and the disk pool “POOL3” are associated, failure events remain in the switch A data I/F and the storage B data I/F, and a risk remains in the VM1. - In Example 3, the
area 9017, which shows the risk details, denotes that the risk site is the VM1 drive “E:”, and that the timing at which the risk could occur is when the power to the drive “E:” is turned ON. The risk occurrence timing, for example, is determined based on the information acquired in Step 8002 of FIG. 42. For example, the management server 10000 may acquire the information in the power status 11832 of the VM configuration management table 11830, detect that the VM1 power status is OFF, judge that the resumption of business operations accompanying the powering ON of the VM could trigger the occurrence of a risk, and store the relevant information in the risk occurrence timing 9019. - A
display area 9022 for showing the maintenance schedule, for example, comprises the fields of an area 9023 in which is displayed the identifier of either an apparatus or a device targeted for replacement in a maintenance operation, an area 9024 in which is displayed a replacement reason, and an area 9025 in which is displayed a replacement date/time. A replacement target, which is displayed in the display area 9022 showing the maintenance schedule, for example, is limited to either an apparatus or a device in which a failure event has occurred, and this information is acquired via the processing of Step 8005 in the effect-risk presentation process of FIG. 42. Other information shown in the maintenance information management table 11980 of FIG. 41, for example, the cost information, may also be displayed together in the display area 9022. By referencing the information related to the maintenance schedule, the administrator, for example, is able to check the scheduled replacement date/time of either the apparatus or the device where the failure occurred, and, in taking action with respect to the failure, can either replace the switch ahead of schedule or, when the switch replacement is near at hand, select a VM migration-based plan as a temporary measure even though a certain amount of risk remains. - In Example 3, the
display area 9011, the display area 9017, and the display area 9022 are displayed in the same screen, but the display method is not limited thereto, and, for example, when a diagram showing either the apparatus or the device configured in the maintenance schedule of the display area 9011, for example, switch A, is clicked, the display area 9022 may be newly displayed as a separate screen. Or, when a plan displayed in the display area 9001 of the plan presentation screen 9000 is clicked, the display area 9022 may be newly displayed as a separate screen. When a diagram showing either an apparatus or a device, such as the physical server 30000 or the storage apparatus 20000 in the display area 9011, is clicked, the performance values of the relevant apparatus or device before and after plan execution may be displayed. - According to Example 3, it is possible to associatively present a failure cause and a specific recovery plan for the failure, to check how many of the failure events related to the failure cause are resolved by executing each plan, and, by displaying the result, to present the administrator with a problem site that latently remains as-is even after plan execution, together with the reason therefor, as plan details information. This makes it possible for the administrator to select an appropriate plan, and to readily ascertain post-plan execution risk at plan selection time. Also, in the case of a plan that can be associated with a maintenance schedule, a check can be performed in conjunction with the maintenance schedule in the plan details
screen 9010, making it easier for the administrator to ascertain the importance of the effects of the maintenance, thereby enabling the administrator to select a plan that will reduce costs. - Needless to say, the present invention is not limited to the examples explained hereinabove, and various changes are possible without departing from the gist thereof.
-
- 10000 Management server
- 20000 Storage apparatus
- 30000 Physical server
Claims (15)
1. A management system for managing a computer system, which comprises multiple monitoring-target devices, comprising:
a storage device; and
a control device coupled to the storage device, wherein the storage device is configured to store:
a general plan that includes a recovery plan available in a case where a failure event happens in the multiple monitoring-target devices; and
configuration information, that includes a coupling relationship between the multiple monitoring-target devices,
wherein the control device is configured to:
create, based on the failure event and the configuration information, one or more expanded plans into which the general plan is expanded by taking into account configurations of the monitoring-target devices,
display the one or more expanded plans and predicted information corresponding to the one or more expanded plans.
2. The management system according to claim 1,
wherein the predicted information comprises information about one of the multiple monitoring-target devices which is related to the failure event.
3. The management system according to claim 1,
wherein the predicted information comprises information about one of the multiple monitoring-target devices which is a subject of one or more of the expanded plans.
4. The management system according to claim 1,
wherein the predicted information comprises information about an implementation of one or more of the expanded plans.
5. The management system according to claim 2,
wherein the predicted information is information about a prediction of state after implementation of one or more of the expanded plans, and
wherein the predicted information comprises at least one of information indicating a risk after implementation of one or more of the expanded plans, information indicating performance, information indicating capacity and information indicating cost.
6. The management system according to claim 3,
wherein the predicted information is information indicating a prediction about state after implementation of one or more of the expanded plans, and
wherein the predicted information comprises at least one of information indicating a risk after implementation of one or more of the expanded plans, information indicating performance, information indicating capacity and information indicating cost.
7. The management system according to claim 4,
wherein the predicted information is information indicating a prediction about one or more of the expanded plans themselves, and
wherein the predicted information comprises at least one of information indicating time required for implementation of one or more of the expanded plans and information indicating costs for implementation of one or more of the expanded plans.
8. The management system according to claim 1,
wherein the control device is further configured to
create, when the general plan includes a volume migration, a first expanded plan related to the volume migration, that includes identifying a volume related to the failure event as a migration-source volume, and identifying a migration-destination volume based on the coupling relationship with the migration-source volume,
compute, based on an I/O response time for the migration-source volume and the migration-destination volume with respect to the first expanded plan, a predicted value for the I/O response time of the migration-source volume and the migration-destination volume after the first expanded plan has been implemented, and
display the predicted value for the I/O response time.
9. The management system according to claim 8,
wherein the control device is further configured to:
create, when the general plan includes a disk addition to a pool, a second expanded plan related to disk addition to a pool that identifies the pool related to the failure event as a disk addition-target pool,
compute, based on an I/O response time for the disk addition-target pool and a before-and-after disk addition capacity ratio of the disk addition-target pool with respect to the first expanded plan, a second predicted value for the I/O response time of the disk addition-target pool after the second expanded plan has been implemented, and
display the second predicted value for the I/O response time.
10. The management system according to claim 9,
wherein the control device is configured to
compute a third predicted value based on a performance value related to the monitoring-target devices before and after implementation of the first expanded plan and the second expanded plan and the performance value related to the monitoring-target devices, and
further display the third predicted value for the performance value.
11. The management system according to claim 10, wherein the control device is configured to
consolidate similar expanded plans from multiple expanded plans that include the first expanded plan and the second expanded plan into a single consolidated expanded plan, and
display data showing the single consolidated expanded plan.
12. The management system according to claim 11,
wherein the storage device is configured to further store maintenance schedule information, which shows a schedule of maintenance operations to be performed for any of the multiple monitoring-target devices, and
wherein the control device is configured to further display data showing the schedule of maintenance operations to be performed for the monitoring-target devices, which are related to one or more of the expanded plans.
13. The management system according to claim 12,
wherein the storage device is configured to further store cost information showing, with respect to the general plan, the cost required for implementing the multiple expanded plans, and
wherein the control device is configured to
compute, with respect to each of the multiple expanded plans, the cost required for implementing the multiple expanded plans based on a cost required for implementing the general plan, and
further display the computed cost.
14. A non-transitory computer-readable storage medium having instructions stored thereon, execution of which by a processor causes the processor to perform operations, the operations comprising:
storing a general plan that includes a recovery plan available in a case where a failure event happens in multiple monitoring-target devices;
storing configuration information, that includes a coupling relationship between the multiple monitoring-target devices; and
creating, based on the failure event and the configuration information, one or more expanded plans into which the general plan is expanded by taking into account configurations of the monitoring-target devices.
15. A method for managing a computer system, comprising:
storing a general plan that includes a recovery plan available in a case where a failure event happens in multiple monitoring-target devices;
storing configuration information, that includes a coupling relationship between the multiple monitoring-target devices;
creating, based on the failure event and the configuration information, one or more expanded plans into which the general plan is expanded by taking into account configurations of the monitoring-target devices; and
displaying the one or more expanded plans and predicted information corresponding to the one or more expanded plans.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/971,187 US20160103727A1 (en) | 2012-09-03 | 2015-12-16 | Management system for managing computer system comprising multiple monitoring-target devices |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/072310 WO2014033945A1 (en) | 2012-09-03 | 2012-09-03 | Management system which manages computer system having plurality of devices to be monitored |
US201313991671A | 2013-06-05 | 2013-06-05 | |
US14/971,187 US20160103727A1 (en) | 2012-09-03 | 2015-12-16 | Management system for managing computer system comprising multiple monitoring-target devices |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/072310 Continuation WO2014033945A1 (en) | 2012-09-03 | 2012-09-03 | Management system which manages computer system having plurality of devices to be monitored |
US13/991,671 Continuation US9244800B2 (en) | 2012-09-03 | 2012-09-03 | Management system for managing computer system comprising multiple monitoring-target devices |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160103727A1 true US20160103727A1 (en) | 2016-04-14 |
Family
ID=50182791
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/991,671 Expired - Fee Related US9244800B2 (en) | 2012-09-03 | 2012-09-03 | Management system for managing computer system comprising multiple monitoring-target devices |
US14/971,187 Abandoned US20160103727A1 (en) | 2012-09-03 | 2015-12-16 | Management system for managing computer system comprising multiple monitoring-target devices |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/991,671 Expired - Fee Related US9244800B2 (en) | 2012-09-03 | 2012-09-03 | Management system for managing computer system comprising multiple monitoring-target devices |
Country Status (5)
Country | Link |
---|---|
US (2) | US9244800B2 (en) |
EP (1) | EP2808796A4 (en) |
JP (1) | JP5719974B2 (en) |
CN (1) | CN104272266B (en) |
WO (1) | WO2014033945A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060156086A1 (en) * | 2004-06-21 | 2006-07-13 | Peter Flynn | System and method for integrating multiple data sources into service-centric computer networking services diagnostic conclusions |
US20120066376A1 (en) * | 2010-09-09 | 2012-03-15 | Hitachi, Ltd. | Management method of computer system and management system |
US8620921B1 (en) * | 2011-01-28 | 2013-12-31 | Netapp, Inc. | Modeler for predicting storage metrics |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05114899A (en) * | 1991-10-22 | 1993-05-07 | Hitachi Ltd | Network fault diagnostic system |
US7107185B1 (en) | 1994-05-25 | 2006-09-12 | Emc Corporation | Apparatus and method for event correlation and problem reporting |
US5680640A (en) | 1995-09-01 | 1997-10-21 | Emc Corporation | System for migrating data by selecting a first or second transfer means based on the status of a data element map initialized to a predetermined state |
US6487677B1 (en) * | 1999-09-30 | 2002-11-26 | Lsi Logic Corporation | Methods and systems for dynamic selection of error recovery procedures in a managed device |
JP3743336B2 (en) * | 2001-09-14 | 2006-02-08 | 日本電気株式会社 | Configuration management device |
US20040025077A1 (en) * | 2002-07-31 | 2004-02-05 | International Business Machines Corporation | Method and apparatus for the dynamic tuning of recovery actions in a server by modifying hints and symptom entries from a remote location |
JP4658451B2 (en) * | 2003-01-29 | 2011-03-23 | 株式会社リコー | Computer management system |
JP4358034B2 (en) * | 2004-05-25 | 2009-11-04 | 富士通株式会社 | Monitoring system |
CN100420202C (en) * | 2005-10-20 | 2008-09-17 | 联想(北京)有限公司 | Computer management system and computer management method |
US20100023798A1 (en) * | 2008-07-25 | 2010-01-28 | Microsoft Corporation | Error recovery and diagnosis for pushdown automata |
JP5237034B2 (en) | 2008-09-30 | 2013-07-17 | 株式会社日立製作所 | Root cause analysis method, device, and program for IT devices that do not acquire event information. |
JP5215895B2 (en) * | 2009-02-04 | 2013-06-19 | 株式会社日立ソリューションズ | Failure cause analysis system and program |
WO2011007394A1 (en) * | 2009-07-16 | 2011-01-20 | 株式会社日立製作所 | Management system for outputting information describing recovery method corresponding to root cause of failure |
US8429455B2 (en) * | 2010-07-16 | 2013-04-23 | Hitachi, Ltd. | Computer system management method and management system |
JP5419819B2 (en) * | 2010-07-16 | 2014-02-19 | 株式会社日立製作所 | Computer system management method and management system |
JP5432867B2 (en) * | 2010-09-09 | 2014-03-05 | 株式会社日立製作所 | Computer system management method and management system |
WO2014030249A1 (en) * | 2012-08-24 | 2014-02-27 | 株式会社日立製作所 | Verification system and verification method for i/o performance of volume |
-
2012
- 2012-09-03 WO PCT/JP2012/072310 patent/WO2014033945A1/en active Application Filing
- 2012-09-03 CN CN201280072987.7A patent/CN104272266B/en active Active
- 2012-09-03 US US13/991,671 patent/US9244800B2/en not_active Expired - Fee Related
- 2012-09-03 JP JP2014532708A patent/JP5719974B2/en active Active
- 2012-09-03 EP EP12883542.8A patent/EP2808796A4/en not_active Withdrawn
-
2015
- 2015-12-16 US US14/971,187 patent/US20160103727A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
US9244800B2 (en) | 2016-01-26 |
US20140068343A1 (en) | 2014-03-06 |
EP2808796A1 (en) | 2014-12-03 |
JP5719974B2 (en) | 2015-05-20 |
WO2014033945A1 (en) | 2014-03-06 |
CN104272266A (en) | 2015-01-07 |
EP2808796A4 (en) | 2016-04-13 |
CN104272266B (en) | 2016-11-09 |
JPWO2014033945A1 (en) | 2016-08-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |