WO2017026017A1 - Management computer and computer system management method - Google Patents
Management computer and computer system management method
- Publication number
- WO2017026017A1 PCT/JP2015/072562
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- countermeasure procedure
- countermeasure
- evaluation
- plan
- procedure
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0721—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment within a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/81—Threshold
Definitions
- The present invention relates to the management of computer systems, and in particular to a management computer, a computer system management method, and related technologies.
- Patent Document 1 mainly refers to operation data such as a disk operation rate, generates a specific countermeasure based on a countermeasure rule, evaluates its effect, and presents it to the administrator. The administrator can thereby easily determine or select a specific countermeasure for solving a problem of the computer system.
- In Patent Document 1 described above, no processing refers to or considers the operation policy, such as the importance of the parts constituting the computer system (for example, virtual servers and logical volumes) or the importance of the customers using them. For this reason, the countermeasure recommended in Patent Document 1 may adversely affect more important elements, such as important customers.
- A computer system according to one aspect of the invention disclosed in the present application holds operation policy information for each customer and for each part constituting the computer system, and generates countermeasures for a problem based on that operation policy.
- The impact range of the countermeasures is partitioned, and countermeasures are generated so that the impact on higher-ranked customers is less than or equal to the impact on lower-ranked customers. For example, what is necessary is just to implement
- The generated countermeasure may be carried out by the administrator. Alternatively, the management computer may present countermeasure candidates to the administrator and execute one after obtaining the administrator's approval, or the management computer may execute a countermeasure automatically based on the approval or the learning result.
- Another aspect of the present invention is a management computer that includes a processor, an input device, an output device, and a storage device, and manages a plurality of computer systems.
- the management computer includes a countermeasure procedure plan generation module that generates a countermeasure procedure plan for changing the states of parts of a plurality of computer systems.
- This countermeasure procedure plan generation module generates a countermeasure procedure plan in accordance with the constraint that, among the plurality of computer systems or their parts, the influence on a higher-ranked computer system or its parts is smaller than the influence on a lower-ranked computer system or its parts.
- Another aspect of the present invention is a computer system management method in which a management computer having a processor, an input device, an output device, and a storage device manages a plurality of computer systems.
- In this method, when the management computer generates a countermeasure procedure plan for changing the state of parts of the plurality of computer systems, it generates the plan in accordance with a constraint condition that the influence on a higher-ranked computer system or its parts is smaller than the influence on a lower-ranked computer system or its parts.
- the components of the computer system are, for example, a tenant, a server, a virtual computer, a storage volume, an IO processing unit, and the like, and their granularity and classification are arbitrary.
- the constraint condition is created automatically or manually by a person based on the operation policy of the computer system. In some cases, the constraint condition may be the operation policy itself. Further, the definition and granularity of ranking of the computer system or its components may be arbitrary.
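- The rank constraint described above can be illustrated with a small predicate. This is only a minimal sketch under stated assumptions: the text does not say how influence is quantified, so the numeric impact scores, the `RANK_ORDER` mapping, and the function name `satisfies_rank_constraint` are all hypothetical. The sketch uses the "less than or equal" form of the constraint.

```python
from typing import Dict

# Hypothetical rank ordering: a larger number means a higher rank.
RANK_ORDER: Dict[str, int] = {"gold": 3, "silver": 2, "copper": 1}


def satisfies_rank_constraint(impact: Dict[str, float], rank: Dict[str, str]) -> bool:
    """Return True if no higher-ranked part is impacted more than a lower-ranked part.

    impact maps a part name to an assumed non-negative impact score;
    rank maps the same part name to a rank label found in RANK_ORDER.
    """
    parts = list(impact)
    for a in parts:
        for b in parts:
            # A violation: a outranks b, yet a suffers the larger impact.
            if RANK_ORDER[rank[a]] > RANK_ORDER[rank[b]] and impact[a] > impact[b]:
                return False
    return True


rank = {"VM_A1": "gold", "VM_A2": "silver"}
ok_plan = {"VM_A1": 0.0, "VM_A2": 0.4}   # the burden falls on the lower rank
bad_plan = {"VM_A1": 0.5, "VM_A2": 0.1}  # the burden falls on the higher rank
```

- A plan generator could discard every candidate for which this predicate returns False before evaluating the remaining candidates.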
- the management computer can present a countermeasure with high importance, for example, a countermeasure having a small influence on a higher-order customer, among countermeasures that can solve the problem. Problems, configurations, and effects other than those described above will become apparent from the description of the following embodiments.
- FIG. 2A is a block diagram illustrating a hardware configuration example of the computer system 2 of the embodiment of FIG. 1.
- FIG. 2B is a block diagram showing a hardware configuration example of the computer system 2 of the embodiment of FIG. 1, centering on the management target device group.
- FIG. 2C is a block diagram mainly showing functions of the management server 201 in the hardware configuration example of the computer system 2 in the embodiment of FIG. 1.
- FIG. 4 is a table showing an example of the connection relationship correspondence table 400 that forms part of the system configuration information 234.
- FIG. 5 is a table showing an example of the server rank table 500 that forms part of the operation policy information 233.
- FIG. 9 is a flowchart showing an example procedure of the problem solving process 900 of the management server 201.
- the conceptual diagram which shows the example of the production
- the flowchart which shows the example of a procedure of the production
- FIG. 5 is a table showing an example of a constraint condition pattern table 1300; The table figure which shows an example of the evaluation result table 1400 of a countermeasure procedure plan.
- summary of a cut-off process when the evaluation result of a countermeasure procedure plan is as having illustrated in FIG. Explanatory drawing which shows the numerical formula example used by calculation processing S1503 of the comprehensive evaluation value of FIG.
- FIG. FIG. 6 is a table showing an example of a pattern table 2000.
- the conceptual diagram which illustrates the mode of a change of the value of the execution performance 2005 at the time of performing a memory
- In the following description, information is described using expressions such as “aaa table”, “aaa list”, “aaa DB (database)”, and “aaa queue” (aaa is an arbitrary character string), but such information may be expressed in a data structure other than a table, list, DB, or queue. Therefore, “aaa table”, “aaa list”, “aaa DB”, “aaa queue”, and the like may be referred to as “aaa information” to indicate that they do not depend on the data structure.
- In the following description, a program may be used as the subject of a sentence. However, because a program performs its defined processing by being executed by a processor using a memory and a communication port (communication control device), the processor may equally be taken as the subject. Processing disclosed with a program as the subject may also be processing performed by a computer such as a management server or an information processing apparatus. Part or all of a program may be realized by dedicated hardware.
- The program distribution server includes a processor and a storage resource, and the storage resource stores a distribution program and the program to be distributed.
- When the processor executes the distribution program, the processor of the program distribution server distributes the program to be distributed to other computers.
- The computer has input/output devices.
- Examples of input/output devices include a display, a keyboard, and a pointer device, but other devices may be used.
- As an alternative to such input/output devices, a serial interface or an Ethernet interface may be used: a display computer having a display, keyboard, or pointer device is connected to the interface, display information is transmitted to the display computer, and input information is received from it.
- In this way, the display computer performs the display and accepts the input, taking the place of the input/output devices.
- A set of one or more computers that manage the information processing system and display the display information of this embodiment may be referred to as a management system.
- When the management computer displays the display information, the management computer is the management system.
- A combination of a management computer and a display computer is also a management system.
- A plurality of computers may realize processing equivalent to that of the management computer. In that case, the plurality of computers (including the display computer if the display computer performs the display) is the management system.
- The countermeasures mentioned here refer to information that includes the details of specific operations, for example migrating the virtual machine with ID 00_1 to the host machine with ID 02, or restricting the disk access of the virtual machine with ID 00_1 to 1000 IOPS. Hereinafter, such information is referred to as a countermeasure, a countermeasure plan, an action plan, and so on. In contrast, qualitative information that does not include the details of specific operations, such as migrating a virtual machine from one host machine to another or limiting the number of disk accesses of a virtual machine, is called a countermeasure rule, or simply a rule.
- FIG. 1 is a diagram for explaining an outline of a problem solving process flow in the computer system of this embodiment.
- the outline of the system of the present embodiment will be described using a system to which the present embodiment is not applied as a comparative example.
- the computer system 1 shows a computer system of a comparative example to which this embodiment is not applied.
- the computer system 1 includes a server 203 that is a management target, a storage 204, a network device 205, and a management server 201 that manages these management target device groups.
- The operation policy 233, which specifies the importance and performance of an application operating on the management target devices or of a tenant system composed of a group of such applications, exists outside the management server 201, for example as an Excel file.
- Tenants that use the system are weighted as super important tenant 11, important tenant 12, and normal tenant 13.
- The management server 201 detects a problem (#1) that has occurred in the important tenant 12 by the monitoring function 2011 (#2), and performs cause analysis by the cause analysis function 2012 (#3).
- The countermeasure procedure draft creation function 2013 generates a countermeasure procedure draft that solves the problem based on the countermeasure procedure rules 231 and the operation data 232 in the auxiliary storage device 213 (#4), and the generated countermeasure procedure is registered for execution by the execution base function 2014 (#5).
- The server 203 that receives the instruction from the management server 201 (#6) migrates the virtual machine (shown as VM in the figure) running on it to another server device 203 (#7). As a result, even if the problem occurring in the important tenant 12 is solved, the super important tenant 11 may be adversely affected (#8).
- the countermeasure procedure proposal indicates a problem solving procedure proposal such as migrating VM_1 from the server apparatus_1 to the server apparatus_2.
- The countermeasure procedure plan generation process generates various procedure plans, for example migrating VM_3 from server apparatus_1 to server apparatus_3 or lowering the request upper limit of tenant system A from 100 requests/second to 50 requests/second, then estimates their effects and impacts and assigns priorities.
- In some cases, the VM used by the important tenant 12 is migrated to the server hosting the VM used by the super important tenant 11.
- the computer system 2 exemplifies the outline of the computer system in this embodiment.
- a countermeasure procedure plan is generated in consideration of the operation policy, and priority is given to an important tenant.
- The computer system 2 stores on the management server 201 the operation policy 233 that existed outside the management server 201 in the computer system 1, and does not include the external file 208. Otherwise, its system configuration is the same as that of the computer system 1.
- The process flow is also the same, but it differs from the computer system 1 in that the operation policy 233 is referred to in the process of generating the countermeasure procedure plan.
- the super important tenant 11 is not adversely affected, and the range of the adverse effect can be limited to the normal tenant 13.
- this embodiment has the effect of using the operation policy as a constraint condition in the coping procedure draft generation process and preferentially treating the higher rank.
- the system configuration illustrated in FIG. 1 omits some of the details of the system configuration described in FIG.
- FIG. 2A is a block diagram showing a hardware configuration example of the computer system 2 of the embodiment of FIG.
- The management server 201 includes a processor 211, a main storage device 212, an auxiliary storage device 213, an input device 214, an output device 215, and a network I/F 216.
- The processor 211, the main storage device 212, the auxiliary storage device 213, the input device 214, the output device 215, and the network I/F 216 are connected to the bus 217.
- the processor 211 executes the problem solving process 220.
- the problem solving process 220 is software (program) stored in the main storage device 212 such as a semiconductor memory, for example, and executes a desired function by using hardware resources of the management server 201 such as the processor 211. Note that the processing by the problem solving processing 220 may be realized by hardware such as an integrated circuit instead of being executed by the processor 211.
- The auxiliary storage device 213, such as a magnetic disk device, stores the countermeasure procedure rule 231, the operation data 232, the operation policy 233, and the system configuration information 234 as data.
- The countermeasure procedure rule 231, the operation data 232, the operation policy 233, and the system configuration information 234 may instead be stored in separate storage devices.
- The countermeasure procedure rule 231 is a group of processing methods for generating procedures that solve problems occurring in the computer system. For example, when the CPU usage rate of a specific server device is detected to exceed a threshold, an arbitrary virtual machine running on that server device is migrated to another arbitrary server device. As another example, when the operation rate of a storage disk constituting a volume pool on the storage device is detected to exceed a threshold, the amount of I/O to the logical volumes on that disk is limited.
- the countermeasure procedure rule 231 may include at least one processing method.
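- To illustrate how a qualitative rule might be expanded into concrete countermeasure candidates, the following is a minimal sketch of the CPU-threshold migration rule. The names `MigrationPlan` and `expand_migration_rule`, the usage values, and the choice to consider only servers at or below the threshold as destinations are assumptions made for this example and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MigrationPlan:
    """A concrete countermeasure: move one VM from one server to another."""
    vm: str
    source: str
    destination: str


def expand_migration_rule(
    cpu_usage: Dict[str, float],
    vms_on_server: Dict[str, List[str]],
    threshold: float,
) -> List[MigrationPlan]:
    """For every server over the CPU threshold, propose moving each of its VMs
    to each server that is currently at or below the threshold (an assumption)."""
    overloaded = [s for s, u in cpu_usage.items() if u > threshold]
    targets = [s for s, u in cpu_usage.items() if u <= threshold]
    plans: List[MigrationPlan] = []
    for src in overloaded:
        for vm in vms_on_server.get(src, []):
            for dst in targets:
                plans.append(MigrationPlan(vm, src, dst))
    return plans


plans = expand_migration_rule(
    cpu_usage={"HV1": 0.95, "HV2": 0.30},
    vms_on_server={"HV1": ["VM_A1", "VM_A2"], "HV2": ["VM_A3"]},
    threshold=0.80,
)
```

- Each resulting `MigrationPlan` names a specific virtual machine and destination, which is the difference between a rule and a countermeasure plan as the terms are used here.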
- the operation data 232 refers to operation information such as the resource usage rate of the computer system and the number of received requests for a certain period, such as the CPU usage rate information of the server device 203 for the past month.
- the operation policy 233 includes at least one of “importance” and “performance target value”.
- the importance is an importance as exemplified by gold, silver, and copper. Any information can be used as long as gold is more important than silver and silver is more important than copper.
- the performance target value is, for example, a response time of 100 milliseconds or less or a throughput of 100 requests / second.
- the system configuration information 234 is information for specifying the connection relationship between the management target device groups such as the server 203, the storage 204, and the network device 205, and the connection relationship between the management target tenant system and the management target device group.
- The auxiliary storage device 213 may be an external storage device connected to the management server 201 via an I/F (not shown) or the network I/F 216, for example the storage device 204. Further, the main storage device 212 and the auxiliary storage device 213 may be the same device.
- the input device 214 is a device for inputting data by an operation of an administrator such as a keyboard.
- the output device 215 is a device that displays the execution result of the processor 211, such as a printer or a monitor.
- the input device 214 and the output device 215 may be an integrated device.
- The operation terminal 202 may be connected to the management server 201.
- the operation terminal 202 is a computer that operates the management computer 201.
- the operation terminal 202 includes an input device 241 and an output device 242.
- the input device 241 is a device that inputs data by an operation of an administrator. Input data is transmitted to the management server 201 via the network 206.
- the output device 242 is a device that displays data from the management server 201.
- the input device 241 and the output device 242 may be an integrated device.
- the computer system 2 includes a management server 201, an operation terminal 202, a server device 203, a storage device 204, and a network device 205.
- the network device 205 relays data among the management server 201, the operation terminal 202, the server device 203, and the storage device 204.
- FIG. 2B is a block diagram showing a hardware configuration example of the computer system 2 of the embodiment of FIG. 1 centering on a management target device group that is a management target of the management server 201.
- the management target device group is a system in which a server device 203, a storage device 204, and a network device 205 are connected to each other via a network 206 or a SAN (Storage Area Network).
- The server device 203 includes a processor 261, a main memory 262, a network I/F 263, an auxiliary storage device 264, and an HBA (Host Bus Adapter) 265.
- The auxiliary storage device 264 may be an external storage device connected via the network I/F 263, the HBA 265, or an I/F (not shown).
- the server device 203 may be a virtual machine.
- the server device 203 is a monitoring target device of the management server 201.
- the server device 203 executes software and virtual machines that constitute the tenant system.
- the network I / F 263 is connected to another network I / F 252 and an IP (Internet Protocol) switch 205A, which is an example of the network device 205, via the network 206.
- the HBA 265 is connected to a port of an FC (Fiber Channel) switch that is an example of the network device 205.
- the storage device 204 is a management target device of the management server 201 and provides storage capacity used by the server 203 or software operating on the management server 201.
- the storage apparatus 204 includes an IO processing unit 251, a network I / F 252, an IO port 253, a DISK 254, and an IO port 255.
- the DISK 254 may form a RAID group 256 with a plurality of DISKs 254.
- the RAID group 256 may constitute a volume pool 257 from a single or a plurality of RAID groups 256.
- the data of the auxiliary storage device 264 may be stored in the logical volume 258.
- the logical volume 258 only needs to exist in any one of the volume pool 257, the RAID group 256, or the DISK 254.
- the network I / F 252 is an interface for connecting to a network 206 such as a LAN (Local Area Network) by Ethernet (registered trademark), for example.
- the IO port 253 and the IO port 255 are interfaces connected to a SAN (Storage Area Network) such as a fiber channel.
- the storage apparatus 204 may manage a logical volume 259 that exists in an external storage apparatus 209 connected via the IO port 255.
- the network device 205 exemplified here includes an IP switch 205A and an FC switch 205B.
- The IP switch 205A is connected to the network I/F 216 of the management server 201, the network I/F 263 of the server device 203, the network I/F 252 of the storage device 204, a network I/F (not shown) of the FC switch 205B, and a network I/F (not shown) of other IP switches 205A.
- the FC switch 205B transfers data between the server apparatus 203 and the storage apparatus 204.
- the FC switch 205B has a plurality of ports 271.
- the port 271 of the FC switch 205B is connected to the HBA 265 of the server apparatus 203 and the IO port 253 of the storage apparatus 204.
- the network device 205 may be a management target device of the management server 201.
- FIG. 2C is a functional block diagram for explaining a functional configuration example of the management server 201 in the hardware configuration example of the computer system 2 of the embodiment of FIG.
- The processor 211 of the management server 201 realizes various functions under the control of the problem solving processing program 220 in the main storage device 212.
- modules corresponding to functions are defined in the problem solving processing program 220, but these modules do not need to be physically separated. Also, these modules need not correspond to independent programs or subroutines.
- the problem solving processing program 220 has a countermeasure procedure plan generating module 2201.
- the countermeasure procedure plan generation module 2201 includes a candidate acquisition module 2202 and a filtering module 2203.
- the problem solution processing program 220 further includes a countermeasure procedure plan evaluation module 2204, a countermeasure procedure plan priority ranking module 2205, a countermeasure procedure plan presentation module 2206, a selection module 2207, and a countermeasure procedure plan execution module 2208. Any one of these modules may be omitted, or another module may be added.
- the whole processing example by the problem solving processing program 220 will be described later with reference to FIG.
- the function realized by the countermeasure procedure plan generation module 2201 corresponds to the processing S903 in FIG. 9, and details will be described later with reference to FIG.
- the function realized by the candidate acquisition module 2202 corresponds to the processing S1103 in FIG. 11, and acquires a list of operation target candidates for problem solving.
- the function realized by the filtering module 2203 corresponds to step S1104 in FIG.
- the function realized by the countermeasure procedure plan evaluation module 2204 corresponds to the processing S904 in FIG.
- the function realized by the countermeasure procedure plan prioritization module 2205 corresponds to the process S905 of FIG. 9, and details will be described later with reference to FIG.
- the function realized by the countermeasure procedure plan presenting module 2206 corresponds to step S906 in FIG.
- the function realized by the selection module 2207 corresponds to step S907 in FIG.
- The function realized by the countermeasure procedure plan execution module 2208 corresponds to step S908 in FIG.
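- The correspondence between modules and steps listed above suggests a pipeline of generate, evaluate, prioritize, present, select, and execute. The following is a minimal sketch of that ordering only. The function `solve_problem`, the `Plan` representation, and the convention that a higher evaluation value is better are assumptions for illustration; the internals of each step are passed in as callables because this passage does not specify them.

```python
from typing import Any, Callable, Dict, List, Optional

Plan = Dict[str, Any]  # hypothetical representation of a countermeasure procedure plan


def solve_problem(
    generate: Callable[[], List[Plan]],              # S903, module 2201
    evaluate: Callable[[Plan], float],               # S904, module 2204
    select: Callable[[List[Plan]], Optional[Plan]],  # S906 to S907, modules 2206 and 2207
    execute: Callable[[Plan], None],                 # S908, module 2208
) -> Optional[Plan]:
    plans = generate()
    scored = [(evaluate(p), p) for p in plans]
    # S905, module 2205: order plans by evaluation value, best first (assumed convention).
    ranked = [p for _, p in sorted(scored, key=lambda sp: sp[0], reverse=True)]
    chosen = select(ranked)
    if chosen is not None:
        execute(chosen)
    return chosen


executed: List[Plan] = []
chosen = solve_problem(
    generate=lambda: [{"op": "migrate", "score": 0.2}, {"op": "limit_io", "score": 0.9}],
    evaluate=lambda p: float(p["score"]),
    select=lambda ranked: ranked[0] if ranked else None,
    execute=executed.append,
)
```

- In this usage example the selection callable stands in for the administrator choosing the top-ranked plan; an interactive implementation would present the ranked list and wait for input instead.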
- the main memory 212 or the auxiliary storage device 213 holds a constraint condition 2131 reflecting the operation policy 233.
- the restriction condition 2131 may be partially or entirely the same as the operation policy 233, but a more specific rule may be prepared based on the operation policy 233.
- the constraint condition 2131 may be created automatically from the operation policy 233 based on the program by the management server 201 itself, or may be created separately by the administrator and input from outside the management server 201. This process corresponds to the processes S1101 to S1102 of FIG. Examples of constraint conditions will be described later with reference to FIGS.
- The above configuration may be implemented on a single computer, or any part of the input device, output device, processing device, and storage device may be implemented on another computer connected via a network.
- functions equivalent to those configured by software can be realized by hardware such as FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit).
- FIG. 3 is a block diagram showing an example of a tenant system configured on the computer system 2 of FIG.
- Tenant A is composed of virtual machines VM_A1 to VM_A4 running on the server device 203 named HV1 and the server device 203 named HV2.
- The server devices 203 named HV1 and HV2 each have a plurality (two in the figure, as an example) of CPUs 261 and HBAs 265.
- the storage device 204 ST1 has a plurality (two in the figure as an example) of IO processing units 251 and a plurality (three in the figure as an example) of the volume pool 257.
- the virtual machines constituting the tenant A are VM_A1, VM_A2, VM_A3, and VM_A4.
- The virtual machine VM_A1 is processed by the processor 261 named CPU1 of HV1 and is connected to the storage apparatus 204 named ST1 via the HBA 265 named HBA1.
- the auxiliary storage device 264 of VM_A1 is a logical volume 258 named Vol_A1 that is processed by the IO processing unit 251 named unit 1 and exists on the volume pool 257 named pool 1.
- VM_A2, VM_A3, and VM_A4 likewise have the connection relationships illustrated in FIG. 3. In FIG. 3, the connection relationships of other components are omitted for simplicity of explanation.
- FIG. 4 is an explanatory diagram showing an example of the connection relationship correspondence table 400 included in the system configuration information 234.
- the system configuration information 234 may include, for example, information (not shown) such as CPU processing specification information.
- The connection relationship correspondence table 400 is information that associates tenant systems with system components, and is prepared in advance manually or by some program.
- The connection relationship correspondence table 400 includes a tenant name field 401, a server name field 402, a host name field 403, a CPU name field 404, an HBA name field 405, a storage name field 406, an IO processing unit name field 407, a pool name field 408, and a logical volume name field 409.
- The connection relationship correspondence table 400 may omit some of these fields, may include other fields (not illustrated), or may be stored as a plurality of tables.
- the tenant name field 401 is an area for storing a tenant name.
- the tenant name is identification information that uniquely identifies the tenant.
- the server name field 402 is an area for storing a server name of a server constituting the tenant.
- the server name is identification information that uniquely identifies the server.
- the server may be a physical server or a virtual machine.
- Each of the following fields 403 to 409 is identifier information for uniquely identifying a component having a connection relationship.
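- One way to picture the connection relationship correspondence table 400 is as a list of records with one attribute per field, which can then be queried for servers that share a component. This is only a sketch: the class `ConnectionRow`, the helper `servers_sharing`, and every row other than the VM_A1 row are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ConnectionRow:
    tenant: str    # tenant name field 401
    server: str    # server name field 402
    host: str      # host name field 403
    cpu: str       # CPU name field 404
    hba: str       # HBA name field 405
    storage: str   # storage name field 406
    io_unit: str   # IO processing unit name field 407
    pool: str      # pool name field 408
    volume: str    # logical volume name field 409


def servers_sharing(rows: List[ConnectionRow], field: str, value: str) -> Set[str]:
    """Return the servers whose row has the given value in the given field."""
    return {r.server for r in rows if getattr(r, field) == value}


rows = [
    ConnectionRow("A", "VM_A1", "HV1", "CPU1", "HBA1", "ST1", "unit1", "pool1", "Vol_A1"),
    # The two rows below are invented examples, not values from the figures.
    ConnectionRow("A", "VM_A2", "HV1", "CPU2", "HBA1", "ST1", "unit1", "pool2", "Vol_A2"),
    ConnectionRow("B", "VM_B1", "HV2", "CPU1", "HBA1", "ST1", "unit2", "pool3", "Vol_B1"),
]

on_hv1 = servers_sharing(rows, "host", "HV1")
```

- A lookup of this kind is what lets a plan generator estimate which other servers would be affected by a change to a shared host, IO processing unit, or pool.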
- Operation policy information may be managed at a fine granularity, such as per server or per logical volume, or at a coarse granularity, such as per tenant or per application. The following example manages the operation policy per server and per logical volume.
- FIG. 5 is an explanatory diagram showing an example of the server rank table 500 that forms part of the operation policy information 233.
- The server rank table 500 is information that associates the server 203 with the importance of the server (shown as rank in the figure), and is prepared in advance manually or by some program.
- the server rank table 500 has a server name field 501 and a rank field 502.
- the server rank table 500 may have fields (not shown) other than these fields.
- the rank of each virtual machine is maintained such that the rank of VM_A1 is gold and the rank of VM_A2 is silver.
- FIG. 6 is an explanatory diagram showing an example of the volume rank table 600 that forms part of the operation policy information 233.
- The volume rank table 600 is information that associates the logical volume 258 with the importance of the logical volume (shown as rank in the figure), and is prepared in advance manually or by some program.
- the volume rank table 600 has a volume name field 601 and a rank field 602.
- the volume rank table 600 may have fields (not shown) other than these.
- FIG. 7 is an explanatory diagram showing an example of the server rank detail table 700 that forms part of the operation policy information 233.
- The server rank detail table 700 is information that stores the importance of each rank given to the server 203 and the service level target values provided at each rank, and is prepared in advance manually or by some program.
- the server rank detail table 700 has an importance field 701, a rank field 702, a response time field 703, and an RTO field 704.
- the server rank detail table 700 may not have some of these fields, or may have other fields (not shown).
- the importance field 701 is a field indicating the priority of the rank, and the rank field 702 is an identifier for uniquely specifying a specific rank.
- FIG. 7 shows that the Platinum rank is the most important, followed by the gold rank, and then the silver rank. There may be a plurality of ranks 702 having the same importance 701.
- the response time field 703 stores a response time target value.
- the aim is to provide a service level such that an average response time of a request to a Platinum rank VM is within 20 milliseconds.
- The management server 201 or the computer system administrator can determine that there is no problem if the average response time of a Platinum rank server is within 20 milliseconds, and that a service level problem has occurred if it exceeds 20 milliseconds.
- The RTO field 704 stores the recovery target time. For example, the RTO for the Platinum rank is 5 minutes. Therefore, if the average response time of a Platinum rank server exceeds 20 milliseconds, the operation policy aims to resolve the problem within 5 minutes of its occurrence.
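- As a minimal sketch of how the server rank detail table 700 could be consulted, the following Python fragment checks a measured average response time against the target for a rank and derives the recovery deadline from the RTO. Only the Platinum values (20 milliseconds, 5 minutes) come from the description; the class name, the function names, the Gold row, and the convention that a smaller importance number means higher priority are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerRankDetail:
    importance: int          # field 701 (assumed: smaller number = higher priority)
    rank: str                # field 702
    response_time_ms: float  # field 703: target average response time
    rto_minutes: float       # field 704: recovery target time


# Only the Platinum row reflects values stated in the text; Gold is illustrative.
RANK_DETAILS = {
    "Platinum": ServerRankDetail(1, "Platinum", 20.0, 5.0),
    "Gold": ServerRankDetail(2, "Gold", 50.0, 30.0),
}


def violates_service_level(rank: str, avg_response_ms: float) -> bool:
    """Return True when the measured average exceeds the rank's target."""
    return avg_response_ms > RANK_DETAILS[rank].response_time_ms


def recovery_deadline(rank: str, detected_at_min: float) -> float:
    """Return the time (in minutes) by which the problem should be resolved."""
    return detected_at_min + RANK_DETAILS[rank].rto_minutes
```

Under these assumptions, a Platinum server averaging 25 milliseconds is flagged, and a problem detected at minute 10 should be resolved by minute 15.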
- FIG. 8 is an explanatory diagram showing an example of the volume rank detail table 800 that forms part of the operation policy information 233.
- The volume rank detail table 800 stores the importance of each rank given to the logical volume 258 and the target values of the service level provided for each rank, and is prepared in advance manually or by some program.
- the volume rank detail table 800 has an importance field 801, a rank field 802, a response time field 803, and an IOPS field 804.
- the volume rank detail table 800 may not have some of these fields, or may have other fields (not shown).
- the problem solving process is a process executed by causing the processor 211 to execute the problem solving process program 220 stored in the management computer 201.
- FIG. 9 is a flowchart showing a procedure example of the problem solving process 900 of the management server 201. First, a description will be given of a trigger when this flowchart is performed.
- The problem solving process according to this flowchart may be executed in response to an instruction from the administrator input from the input device 214 of the management computer 201. It may also be executed periodically by the management server 201, for example, every 5 minutes. Further, it may be executed when the management server 201 receives, via the network I/F 216, a notification of the occurrence of a problem transmitted from the computer system that is the management target of the management server 201.
- The management server 201 executes a problem detection process (step S901), a cause location identification process (step S902), a countermeasure procedure plan generation process (step S903), a countermeasure procedure plan evaluation process (step S904), a countermeasure procedure plan prioritization process (step S905), a countermeasure procedure plan presentation process (step S906), an administrator selection process (step S907), and a countermeasure procedure plan execution process (step S908).
- the problem solving process flow 900 may include other processing steps (not shown), and some of these processing steps may not exist.
- In the problem detection process (step S901), the management server 201 detects a problem occurring in the computer system. For example, the collected resource usage rate is compared with a threshold value, and when the resource usage rate exceeds the threshold value, it is detected that a problem has occurred. In addition, for example, the collected system log text is analyzed, and a problem is detected when a specific character string such as “Error” or “Warning” is included.
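- The two detection rules described for step S901 can be sketched as follows. This is an illustration only: the function name `detect_problems`, the dictionary-based inputs, and the resource names are hypothetical, and the description does not say how usage rates or logs are actually collected.

```python
def detect_problems(usage_rates, thresholds, log_lines,
                    keywords=("Error", "Warning")):
    """Return a list of (kind, detail) tuples for detected problems.

    usage_rates / thresholds: dicts mapping a resource name to a rate (0.0-1.0).
    log_lines: iterable of collected system log lines.
    """
    problems = []
    # Rule 1: a resource usage rate that exceeds its threshold is a problem.
    for resource, rate in usage_rates.items():
        limit = thresholds.get(resource)
        if limit is not None and rate > limit:
            problems.append(("threshold", resource))
    # Rule 2: a log line containing a specific character string is a problem.
    for line in log_lines:
        if any(keyword in line for keyword in keywords):
            problems.append(("log", line))
    return problems
```

Resources without a configured threshold are skipped here, which is a design choice of this sketch rather than something stated in the description.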
- In the cause location identification process (step S902), for example, when the response time of tenant A exceeds the threshold and deteriorates, the management server 201 refers to the connection relation correspondence table 400 illustrated in FIG. 4, checks the operating status of the computer system components used by tenant A, such as VM_A1 and VM_A2, and detects, for example, that the load on the disk 254 of the storage device 204 named ST1 is high and that the response time of the logical volume has therefore become a bottleneck.
- Steps S901 and S902 do not necessarily have to be executed if there is an alternative means, for example, when the administrator manually identifies the cause location.
- In the countermeasure procedure plan generation process (step S903), a countermeasure procedure plan that solves the problem at the cause location identified in step S902 is created.
- For example, this process creates procedure plans such as limiting the upper limit of IO to VOL_A4 to 50 IOPS, lowering the upper limit of IO to VOL_A4 from 50 IOPS to 30 IOPS, or newly constructing a logical volume for replication and distributing the load of read requests.
- At this time, the operation policy 233 is referred to so that the adverse effect on higher-rank servers and logical volumes is smaller than that on lower-rank ones.
- The countermeasure procedure plan evaluation process (step S904) simulates and evaluates the effect of the one or more countermeasure procedure plans generated in step S903. For example, influences and effects are calculated for each rank, and a plurality of types of procedure plans are evaluated against the same standard. To evaluate the plans from various perspectives, the estimated execution time and the cost (for example, the amount of investment required when additional hardware is needed) may be evaluated in addition to the influence and the effect. The countermeasure procedure plan evaluation process (step S904) may, for example, be executed as an internal process of the countermeasure procedure plan generation process (step S903), or may be replaced by receiving values calculated manually by the administrator.
- In the countermeasure procedure plan prioritization process (step S905), the countermeasure procedure plans generated in step S903 are cut off or rearranged based on the evaluation results obtained in step S904. For example, when countermeasure procedure plan 1 is evaluated lower than countermeasure procedure plan 2 in all the items evaluated in step S904, countermeasure procedure plan 1 is cut off, that is, deleted from the candidates presented to the administrator or from the candidates for automatic execution. When the evaluation is made with a plurality of items, a comprehensive evaluation result of each countermeasure procedure plan is calculated based on a uniform standard, and priority is given in descending order of the evaluation result. Details of the prioritization process (step S905) of the countermeasure procedure plan will be described with reference to FIG. 15.
- The countermeasure procedure plan presentation process (step S906) presents the countermeasure procedure plans to the administrator of the computer system, in the order calculated in step S905, via the output device 215 of the management server 201 or the output device 242 of the operation terminal 202. Step S906 does not necessarily have to be executed, for example, when a preset setting allows the countermeasure procedure plan with the highest comprehensive evaluation calculated in step S905 to be executed automatically.
- the administrator selection process is a process of receiving a countermeasure procedure plan selected by the computer system administrator via the input device 214 of the management server 201 or the input device 241 of the operation terminal 202.
- In step S907, information for changing the weights of the comprehensive evaluation in step S905 may be received.
- For example, this is information that changes a parameter so that the item of the influence on the gold rank works negatively on the comprehensive evaluation.
- In step S907, information for changing the constraint condition may also be received.
- For example, this is information that excludes constraint conditions under which the adverse effect on the SLO exceeds 60%, even for the copper rank.
- When information that changes the constraint condition is received, it is preferable to have a processing branch that executes step S903 again.
- In step S907, there may also be a processing branch that restarts from step S901 when no information is received from the administrator for a certain period or longer.
- For example, after 10 minutes or more, the problem may have resolved itself or may have worsened.
- This branch makes it possible to propose an optimal countermeasure according to such a change in state.
- FIG. 9 shows branches returning from step S907 to step S901, step S903, and step S905, but some of these branches may not exist, and branches not shown may be included. Further, for example, when a preset setting allows the countermeasure procedure plan with the highest comprehensive evaluation value to be executed automatically, that plan may be treated as having been selected by the administrator.
- the countermeasure procedure plan execution process is a process for executing or registering the execution of the countermeasure procedure plan selected in step S907. For example, when a coping procedure for migrating a virtual machine is selected in step S907, execution registration of the process of migrating to the host machine is performed.
- The countermeasure procedure plan execution process (step S908) does not necessarily have to be executed, for example, when the management server 201 does not have a function for executing the countermeasure procedure and the administrator manually operates the management target devices.
- The countermeasure procedure plan selected by the administrator may be stored as an execution result. Details of the processing in the case where the execution result is stored in step S908 will be described with reference to FIG. 18.
- FIG. 10 is an explanatory diagram showing an outline of a procedure example of the countermeasure procedure draft generation process (step S903 in FIG. 9).
- the management server 201 generates a constraint condition pattern 1001 based on the operation policy information 233, and generates a countermeasure procedure plan according to the constraint condition.
- the constraint pattern 1001 may be created by an operator based on the operation policy information 233 and input to the management server 201.
- the influence range is classified. For example, the influence range is classified for each rank of gold, silver, and copper.
- The degree of impact is also classified. For example, the impact is classified as “small” if performance deviates by up to 10% from the range in which the SLO can be satisfied, “medium” if the SLO is violated by 10% to 30%, and “large” if the SLO is violated by more than 30%. “-” means that no adverse influence is allowed.
- A pattern 1001 is generated under the constraint that the influence on an upper rank is less than or equal to the influence on a lower rank. For example, one pattern has no influence on gold, a small influence on silver, and a small influence on copper; another has a small influence on all of gold, silver, and copper. On the other hand, a pattern in which the influence on gold is small while silver and copper are not influenced is excluded.
- The candidates for the operation target are filtered, or the upper limit of the operation is set, according to the constraint condition pattern 1001.
- For example, when an upper limit of IO is to be set for the virtual machines running on the server apparatus 203 as a countermeasure against the problem that the network I/F 263 of the server apparatus 203 is a bottleneck, a list of the virtual machines running on the server apparatus 203 in which the problem has occurred is acquired as the operation target candidates 1002.
- Step S903 thus identifies the operation target candidates 1002 using the one or more generated constraint condition patterns 1001 and generates countermeasure procedure plans.
- FIG. 11 is a flowchart illustrating a procedure example of the countermeasure procedure plan generation process (step S903) illustrated in FIG. 9.
- The management server 201 executes an impact classification process (step S1101), a constraint pattern generation process (step S1102), an operation target candidate acquisition process (step S1103), an operation target candidate filter process (step S1104), an operation upper limit setting process (step S1105), and a countermeasure procedure plan generation process (step S1106).
- The countermeasure procedure plan generation processing flow 1100 may include processing steps (not shown) other than these, and the order of some processes may be different.
- In the impact classification process (step S1101), the management server 201 classifies the influence range based on the operation policy 233. For example, the influence range is classified for each rank of gold, silver, and copper. In addition, the degree of impact is also classified. For example, “S1” is the category in which performance is not affected, “S2” is the range in which performance deviates by up to 10% from the range where the SLO can be satisfied, “S3” is the range in which the SLO is violated by 10% to 20%, “S4” is the range in which the SLO is violated by more than 20% but the service can still be used, and “S5” is the range in which the service cannot be used. Evaluation values are defined so that a smaller influence receives a higher evaluation value. An example of the classification of the degree of influence is shown in FIG. 12.
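- The S1 to S5 classification described for step S1101 can be sketched as a simple mapping from the percentage by which the SLO is missed to a category. The function name, the handling of the exact 10% and 20% boundaries, and the evaluation values 5 to 1 are assumptions; the description only states that a smaller influence should receive a higher evaluation value.

```python
def classify_influence(slo_deviation_pct: float, usable: bool = True) -> str:
    """Map an SLO deviation (in percent) to an influence category S1..S5."""
    if not usable:
        return "S5"   # the service cannot be used
    if slo_deviation_pct <= 0:
        return "S1"   # performance is not affected
    if slo_deviation_pct <= 10:
        return "S2"   # up to 10% outside the SLO range
    if slo_deviation_pct <= 20:
        return "S3"   # 10% to 20% outside the SLO range
    return "S4"       # more than 20% outside, but still usable


# Assumed evaluation values: the smaller the influence, the higher the value.
EVALUATION_VALUE = {"S1": 5, "S2": 4, "S3": 3, "S4": 2, "S5": 1}
```

Each category corresponds to one row of the influence degree classification table 1200, with `EVALUATION_VALUE` standing in for the evaluation value field 1203.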
- FIG. 12 is an explanatory diagram showing an example of the influence degree classification table 1200 generated in the impact classification process (S1101) of FIG. 11.
- The influence degree classification table 1200A includes a classification field 1201, a service quality field 1202, and an evaluation value field 1203.
- The classification field 1201 uniquely identifies the classified performance level.
- The service quality field 1202 indicates the range of performance for the category in the classification field 1201.
- The evaluation value field 1203 stores the evaluation value given to a countermeasure procedure plan when the effect or influence of the plan corresponds to the category in the classification field 1201.
- The influence degree classification table 1200A may not include some of these fields, or may include fields (not shown).
- the influence degree classification table 1200 may be stored in the main memory 212, or may be stored in the auxiliary storage device 213 as a part of the operation policy information 233, for example.
- the impact degree classification table 1200B shows another example of the table.
- the quality of service field 1202 may be defined regardless of the SLO, such as when the SLO is not defined. For example, when the degree of influence on the resource usage rate is classified, such as the usage rate of the IO processing unit of the storage apparatus, the classification may be performed based on a threshold value of the resource usage rate. Further, the number of categories and the range for each category may be set manually by the administrator, or may be generated by the management server 201 calculating the number and range of categories by some processing.
- In the constraint pattern generation process (step S1102), the management server 201 generates constraint condition patterns in which the influence on an upper rank is less than or equal to the influence on a lower rank. For example, when the influence is classified as shown in FIG. 12, one pattern is that gold has no influence (S1), silver has a small influence (S2), and copper has a slight influence (S3), and another is that gold, silver, and copper all have the influence S2. On the other hand, a pattern in which the influence on gold is S3 while silver and copper are not influenced is excluded. An example of the generated patterns is shown in FIG. 13.
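- One way to enumerate such constraint condition patterns is to generate every combination of influence categories over the ranks and keep only those in which the influence never decreases from the upper rank to the lower rank. The sketch below assumes three categories (S1 to S3) and the rank order gold, silver, copper; the function name and the dictionary representation are illustrative, not taken from the description.

```python
from itertools import product

LEVELS = ["S1", "S2", "S3"]          # ordered from least to most influence
RANKS = ["gold", "silver", "copper"]  # ordered from upper to lower rank


def generate_constraint_patterns(levels=LEVELS, ranks=RANKS):
    """Return patterns where influence on an upper rank <= that on a lower rank."""
    order = {level: i for i, level in enumerate(levels)}
    patterns = []
    for combo in product(levels, repeat=len(ranks)):
        # Keep the combination only if influence is non-decreasing down the ranks.
        if all(order[combo[i]] <= order[combo[i + 1]]
               for i in range(len(ranks) - 1)):
            patterns.append(dict(zip(ranks, combo)))
    return patterns
```

With these inputs the enumeration yields 10 patterns, including (S1, S2, S3) and (S2, S2, S2), while (S3, S1, S1) is rejected. The all-S1 pattern is also produced; whether it is useful as a constraint is left open here.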
- FIG. 13 is an explanatory diagram showing an example of a constraint condition pattern table 1300 generated in the constraint pattern generation process (S1102) of FIG. 11.
- the constraint pattern table 1300 includes a gold field 1301, a silver field 1302, and a copper field 1303. These fields may be generated based on the rank defined in the operation policy 233.
- In the figure, S1, which indicates that there is no influence, is shown in light type so that it is easy to see visually that the influence range is skewed toward the lower rank (the copper rank side).
- Step S1101 and step S1102 may use results obtained in advance. Since the operation policy is not changed frequently, step S1101 and step S1102 may be executed, for example, when the operation policy is first defined or when it is changed, and the resulting influence degree classification table 1200 and constraint condition pattern table 1300 may be held.
- The constraint condition pattern table 1300 may be created at a coarse granularity such as a computer system or a tenant, or at the granularity of their components, such as virtual machines or storage, as shown in the figures.
- the constraint condition pattern table 1300 may be stored in the main memory 212, or may be stored in the auxiliary storage device 213 as a part of the operation policy information 233, for example.
- In the operation target candidate acquisition process (step S1103), the management server 201 acquires a list of operation target candidates and also acquires the rank information of the operation targets.
- For example, the connection relation correspondence table in FIG. 4 is used.
- A case where an upper limit of IO is set for the virtual machines running on the server apparatus 203 will be described as an example.
- In this case, the server rank information is acquired from the operation policy 233.
- For example, VM_A1 and VM_A2 are acquired as operation target candidates, and then it is acquired from the server rank table 500 of FIG. 5 that VM_A1 is a gold rank and VM_A2 is a silver rank.
- The operation target candidate filter process (step S1104) filters the operation target candidates according to a constraint pattern. For example, when filtering based on the constraint pattern shown in the first row of the constraint condition pattern table 1300 shown in FIG. 13, the gold rank and the silver rank must not be affected, so these ranks are excluded from the operation targets. When filtering based on the constraint pattern shown in the second row of the constraint condition pattern table 1300 shown in FIG. 13, the gold rank must not be affected, the silver rank is S2, and the copper rank is S3, so the gold rank is excluded from the operation targets.
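- The filtering rule above amounts to dropping every candidate whose rank is assigned "no influence" in the chosen constraint pattern. A minimal sketch follows; the function name, the dictionary inputs, and the choice to treat ranks missing from the pattern as "no influence" are assumptions made for illustration.

```python
def filter_candidates(candidates, pattern, no_influence="S1"):
    """Return candidate names whose rank may be influenced under the pattern.

    candidates: dict mapping a candidate name (e.g. a VM) to its rank.
    pattern: dict mapping a rank to its allowed influence category.
    """
    return [
        name
        for name, rank in candidates.items()
        # Ranks absent from the pattern are conservatively treated as S1.
        if pattern.get(rank, no_influence) != no_influence
    ]
```

For the candidates VM_A1 (gold) and VM_A2 (silver) and a pattern of gold S1, silver S2, copper S3, only VM_A2 remains an operation target.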
- In the operation upper limit setting process (step S1105), the upper limit of the operation is set based on the constraint condition. For example, when a countermeasure procedure plan that sets the upper limit of IO of virtual machines is generated based on the second row of the constraint condition pattern table 1300 shown in FIG. 13, the influence on the silver rank is S2, so the upper limit of IO for silver rank virtual machines is set to a value 10% lower than the SLO. The influence on the copper rank is S3, so the upper limit of IO for copper rank virtual machines is set to a value 20% lower than the SLO.
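- The upper limit calculation in this example can be sketched as follows, assuming the SLO is expressed as an IOPS figure. The mapping from S2 and S3 to 10% and 20% follows the example in the text; the function name and the decision to reject other categories are assumptions.

```python
# Reduction below the SLO allowed for each influence category (from the example).
REDUCTION_PCT = {"S2": 10, "S3": 20}


def io_upper_limit(slo_iops: float, level: str) -> float:
    """Return the IO upper limit for a virtual machine at the given category."""
    if level not in REDUCTION_PCT:
        # S1 means the rank must not be limited; S4/S5 are not covered here.
        raise ValueError(f"no IO limit is defined for category {level}")
    return slo_iops * (100 - REDUCTION_PCT[level]) / 100
```

With an SLO of 100 IOPS, a silver rank virtual machine (S2) would be capped at 90 IOPS and a copper rank virtual machine (S3) at 80 IOPS.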
- As another example, consider generating, under the constraint condition in the second row of the constraint condition pattern table 1300 shown in FIG. 13, a countermeasure procedure plan that migrates virtual machines to an external host machine until the bottleneck of the host machine is eliminated.
- In this case, a constraint is given such that the frequency of selection for migration is gold : silver : copper = 0 : 1 : 2.
- For example, the search for migration targets may be performed so that in one out of every two searches both the silver rank and the copper rank are migration candidates, and in the other only the copper rank is a migration candidate.
- the countermeasure procedure plan generation process (step S1106) is a process of generating a countermeasure procedure plan according to the list of operation target candidates generated in step S1104 and the upper limit generated in step S1105.
- the countermeasure procedure plan itself may be generated using a known technique.
- Steps S1104, S1105, and S1106 may be repeated for all the patterns generated in step S1102, or may be executed for only one or more of those patterns.
- FIG. 14 is an explanatory diagram showing an example of a countermeasure procedure plan evaluation result table 1400 generated by the countermeasure procedure plan evaluation process (S904) of FIG. 9.
- The countermeasure procedure plan evaluation result table 1400 includes a countermeasure procedure plan ID field 1401, an influence field 1402, an effect field 1403, an execution result field 1404, and a cost field 1405. Some of these fields may not exist in the countermeasure procedure plan evaluation result table 1400, or other fields (not shown) may be included.
- the countermeasure procedure plan ID field 1401 stores an identifier for uniquely identifying a countermeasure procedure plan.
- the influence field 1402 stores the evaluation result of the influence of the simulated countermeasure procedure proposal. As illustrated in FIG. 14, the influence field 1402 may be divided and evaluated for each rank, or may not be subdivided.
- the effect field 1403 stores the evaluation result of the effect of the simulated countermeasure procedure plan. The effect field 1403 may be subdivided and evaluated for each rank as illustrated in FIG. 14, or may not be subdivided.
- the execution result field 1404 stores an evaluation value of the execution result of the countermeasure procedure plan.
- The cost field 1405 stores an evaluation value of the amount of money necessary to execute the countermeasure procedure plan, for example, the purchase price of additional hardware or the contract fee for a virtual machine instance newly constructed for scale-out.
- FIG. 14 shows that the larger the evaluation value of any item, the better.
- The evaluation result table 1400 may be created at a coarse granularity such as a computer system or a tenant, or at the granularity of their components, such as virtual machines or storage, as shown in the figures.
- the evaluation result table 1400 of the countermeasure procedure plan may be stored in the main memory 212, or may be stored in the auxiliary storage device 213 as a part of the operation policy information 233, for example.
- FIG. 15 is a flowchart showing details of the prioritization process (step S905) of the countermeasure procedure plan.
- the management server 201 executes a cut-off process (step S1501), a comprehensive evaluation value calculation process (step S1502), and a rearrangement process (step S1503).
- The countermeasure procedure plan prioritization processing flow 1500 may include other processing steps (not shown), or some steps may not exist. The order of these steps may also be switched.
- In the cut-off process (step S1501), all evaluation values of a given countermeasure procedure plan are compared with the evaluation values of the other countermeasure procedure plans. If the given plan's evaluation values are smaller in all items, or if some are equal and the rest are smaller, that is, if the plan has no item in which it is superior, the plan is cut off.
- FIG. 16 is an explanatory diagram illustrating an outline of the cut-off process when the evaluation results of the countermeasure procedure plans are as illustrated in FIG. 14. The explanation is as described above.
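- The cut-off rule corresponds to removing plans that are dominated by another plan. The following sketch assumes that every plan is evaluated on the same items and that larger values are better; the function names and the sample item names are illustrative. Note one assumption in particular: two plans with identical values in every item are both kept here, a case the description does not address.

```python
def is_dominated(plan, other):
    """True if `other` is at least as good in every item and better in one."""
    items = plan.keys()
    return (all(other[k] >= plan[k] for k in items)
            and any(other[k] > plan[k] for k in items))


def cut_off(plans):
    """Return only the plans that are not dominated by any other plan.

    plans: dict mapping a plan ID to a dict of item name -> evaluation value.
    """
    return {
        pid: values
        for pid, values in plans.items()
        if not any(is_dominated(values, other)
                   for oid, other in plans.items() if oid != pid)
    }
```

A plan that is weaker in one item but stronger in another survives the cut-off, because there is at least one item in which it is superior.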
- the comprehensive evaluation value calculation process (step S1502) is a process of calculating the comprehensive evaluation value of the countermeasure procedure plan.
- the countermeasure procedure proposal is evaluated from the viewpoints of influence, effect, performance record, and cost.
- FIG. 17 is an example of a comprehensive evaluation value calculation formula used in the comprehensive evaluation value calculation process (S1502) of FIG. 15.
- For example, the comprehensive evaluation value is calculated by multiplying each evaluation value by a constant (A, B, C, and D in FIG. 17) and taking the sum of the products.
- the constant multiplied by each evaluation value may be a value arbitrarily set by the administrator or may be any value calculated by the management server 201.
- the rearrangement process is a process of rearranging in the descending order of the comprehensive evaluation calculated in step S1502.
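- Steps S1502 and S1503 can be sketched together as a weighted sum followed by a descending sort. The weights play the role of the constants multiplied with each evaluation value; the item names, the weight values, and the function names below are assumptions, since the actual formula appears only in FIG. 17.

```python
def comprehensive_value(values, weights):
    """Weighted sum of the evaluation values of one plan."""
    return sum(weights[item] * values[item] for item in weights)


def prioritize(plans, weights):
    """Return plan IDs sorted in descending order of comprehensive value."""
    return sorted(
        plans,
        key=lambda pid: comprehensive_value(plans[pid], weights),
        reverse=True,
    )


# Illustrative weights for influence, effect, execution result, and cost.
WEIGHTS = {"influence": 2, "effect": 1, "execution_result": 1, "cost": 1}
```

Because the weights are plain data, receiving new weights from the administrator in step S907 would only require calling `prioritize` again with the updated dictionary.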
- For example, the countermeasure procedure plans of FIG. 14 are evaluated based on the mathematical formula of FIG. 17, and the rearrangement process is performed.
- Through the prioritization process (S905) of the countermeasure procedure plan, a list in which the countermeasure procedure plans of FIG. 14 are rearranged in order of evaluation score is obtained.
- The result is presented by the countermeasure procedure plan presentation process (S906).
- In the administrator selection process (S907), the administrator selects a desired plan from the countermeasure procedure plans, and the selected plan is executed in the countermeasure procedure plan execution process (S908). Note that the steps from the countermeasure procedure plan presentation process (S906) onward may be omitted, and the process may end once the countermeasure procedure plans have been stored as data.
- Embodiment 1 is a system in which an administrator can select a candidate from the candidates prioritized by the prioritization process (S905) of the countermeasure procedure plan. However, since selecting from the candidates requires a certain level of skill, it is desirable that the system provide support. In Embodiment 2, an example will be described in which the system assists the administrator in selecting a highly appropriate candidate.
- Embodiment 2 is based on the configuration of Embodiment 1, with the following configuration added.
- FIG. 18 is a flowchart showing an example of a procedure for executing a countermeasure procedure plan (step S908) in the case where the execution result of the countermeasure procedure plan executed by the management server 201 is stored, and is referred to as a learning process flow 1800 here.
- the countermeasure procedure draft execution process (step S908) simply executes the selected procedure and counts the execution results.
- The management server 201 evaluates the execution results for each evaluation pattern of the countermeasure procedure plan selected by the administrator. Therefore, even if countermeasure procedure plans are of different types, if their evaluation patterns are the same, they are reflected in the execution result of the same pattern.
- Hereinafter, the process of increasing the evaluation value of the execution result is referred to as the storage process or storing, and the process of decreasing the evaluation value of the execution result is referred to as the forgetting process or forgetting.
- The administrator and the user can arbitrarily define the evaluation pattern of a countermeasure procedure plan. For example, a numerical value can be specified for each rank, such as “the effect on gold is 5, the effect on silver is 4, and the effect on copper is 1” or “the effect on gold is 4, the effect on silver is 3, and the effect on copper is 2”.
- Alternatively, a condition may be set, such as “no effect of 2 or less for any of the gold, silver, and copper ranks” or “only effects of 3 or more for all of the gold, silver, and copper ranks”.
- the management server 201 performs role acquisition processing (step S1801), variable acquisition processing (step S1802), selected pattern storage processing (step S1803), and unselected pattern forgetting processing (step S1804). Then, an execution registration process (step S1805) is executed.
- In the role acquisition process (step S1801), the management server acquires the role of the administrator who selected the countermeasure procedure plan. For example, it acquires information on whether the administrator has an expert role with high system management skill or a general role with low skill.
- In the variable acquisition process (step S1802), the storage variable 1902 and the forgetting variable 1903 of the row corresponding to the role acquired in step S1801 are acquired from the variable table 1900.
- FIG. 19 is an explanatory diagram showing an example of the variable table 1900.
- The variable table 1900 holds the variables used in the execution result learning process executed in steps S1803 and S1804, and is prepared in advance manually or by some program.
- The variable table 1900 has a role field 1901, a storage variable field 1902, and a forgetting variable field 1903.
- the variable table 1900 may not have some of these fields, or may have other fields not shown.
- the role field 1901 is an identifier that uniquely identifies the administrator's role.
- In the selected pattern storage process (step S1803), the management server performs the storage process for the evaluation pattern of the selected countermeasure procedure plan. For example, this can be realized by adding a certain value to the existing execution result value. For example, when storing the pattern of a countermeasure procedure plan selected by a user with the administrator role, the value 5 is acquired from the storage variable field 1902 of the variable table 1900 in step S1802, and 5 is added to the execution result value of the pattern to which the selected countermeasure procedure plan corresponds.
- the number of applicable patterns is not limited to one, and a plurality of patterns may be applicable.
- In the unselected pattern forgetting process (step S1804), the management server performs the forgetting process for the evaluation patterns of the countermeasure procedure plans that were not selected. For example, this can be realized by multiplying the existing execution result value by a number that is at least 0 and less than 1. For example, when forgetting the evaluation patterns not selected by a user with the administrator role, the value 0.6 is acquired from the forgetting variable field 1903 of the variable table 1900 in step S1802, and the execution result values of all patterns that the administrator did not select are multiplied by 0.6.
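- The storage and forgetting updates of steps S1803 and S1804 can be sketched as a single pass over the pattern table: the selected patterns gain the storage variable, and all other patterns are scaled by the forgetting variable. The values 5 and 0.6 for the administrator role come from the example above; the function name, the dictionary layout, and the role key are assumptions.

```python
# Stand-in for the variable table 1900 (only the administrator row is from the text).
VARIABLE_TABLE = {"administrator": {"storage": 5.0, "forgetting": 0.6}}


def learn(pattern_results, selected_ids, role, variables=VARIABLE_TABLE):
    """Return updated execution result values after storing and forgetting.

    pattern_results: dict mapping a pattern ID to its execution result value.
    selected_ids: set of pattern IDs matching the plan the administrator chose.
    """
    v = variables[role]
    updated = {}
    for pid, value in pattern_results.items():
        if pid in selected_ids:
            updated[pid] = value + v["storage"]      # storage process (S1803)
        else:
            updated[pid] = value * v["forgetting"]   # forgetting process (S1804)
    return updated
```

Passing a set of IDs rather than a single ID reflects the point that more than one pattern may apply to the selected plan.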
- the execution registration process (S1805) is a process for performing execution registration of the countermeasure procedure plan selected by the administrator.
- FIG. 20 is an explanatory diagram showing an example of the pattern table 2000.
- The pattern table 2000 is a table that manages the execution results for each evaluation pattern of the countermeasure procedure plans selected by the administrator. It may be generated when the administrator selects a countermeasure procedure plan for the first time and hold execution results only for the patterns that the administrator has selected. Alternatively, it may hold execution results for all evaluation result patterns of the countermeasure procedure plans that the management server has generated.
- the pattern table 2000 has a pattern ID field 2001, an influence field 2002, an effect field 2003, a cost field 2004, and an execution result field 2005.
- The pattern table 2000 basically has fields equivalent to those of the evaluation result table 1400 of the countermeasure procedure plan, but some of these fields may not exist.
- Fields not shown may also be provided, such as an evaluation field for storing a value obtained by evaluating the problem occurrence state.
- the management server 201 compares the table 1400 and the table 2000 when calculating the evaluation value of the execution result of the countermeasure procedure plan in the countermeasure procedure plan evaluation process (step S904).
- the value of the execution result 2005 in which the effect field 1402 and the effect field 2002 of the countermeasure procedure plan match, the effect field 1403 and the effect field 2003 match, and the cost field 1405 and the cost field 2004 match are set as the execution result 1404.
- the value of the execution result 2005 in which the influence field 1402 and the influence field 2002 of the countermeasure procedure proposal match and the effect field 1403 and the effect field 2003 match may be calculated as the value of the execution result 1404.
- the value of the execution result 2005 in which the influence field 1402 and the influence field 2002 of the countermeasure procedure plan match may be calculated as the value of the execution result 1404.
- an arbitrary value such as 0 may be input as the evaluation value of the execution result 1404.
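As an illustration only, the comparison between the table 1400 and the table 2000 described above could be sketched as follows. The sketch reads the alternatives as a fallback chain (match on influence, effect, and cost; then on influence and effect; then on influence alone; then a default such as 0), which is an assumption about how they combine. The class name `Pattern`, the function name `lookup_execution_result`, the use of strings for field values, and the rule of taking the first matching row are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pattern:
    """One row of the pattern table 2000 (field numbers follow the text)."""
    pattern_id: int          # pattern ID field 2001
    influence: str           # influence field 2002
    effect: str              # effect field 2003
    cost: str                # cost field 2004
    execution_result: float  # execution result field 2005


def lookup_execution_result(influence: str, effect: str, cost: str,
                            patterns: Iterable[Pattern],
                            default: float = 0.0) -> float:
    """Return the value to store as the execution result 1404 of a plan.

    The match is relaxed step by step (assumed ordering); within one step
    the first matching row wins (also an assumption).
    """
    plan = {"influence": influence, "effect": effect, "cost": cost}
    key_sets = (
        ("influence", "effect", "cost"),  # all three fields match
        ("influence", "effect"),          # influence and effect match
        ("influence",),                   # only influence matches
    )
    rows = list(patterns)
    for keys in key_sets:
        for row in rows:
            if all(getattr(row, k) == plan[k] for k in keys):
                return row.execution_result
    return default  # no pattern matches: an arbitrary value such as 0


table_2000 = [
    Pattern(1, "low", "high", "low", 5.0),
    Pattern(2, "low", "mid", "high", 2.0),
    Pattern(3, "high", "low", "low", 1.0),
]
print(lookup_execution_result("low", "mid", "low", table_2000))
```

Keeping the relaxation order in a single tuple makes it easy to drop or reorder the looser matching steps if only the exact match is wanted.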
- FIG. 21 illustrates an example of a change in the value of the execution result 2005 when the storage process and the forgetting process are executed after a user with the administrator role selects the countermeasure procedure plan corresponding to pattern ID 1. A predetermined value is added to the weight of the selected pattern, and the weights of the unselected patterns are reduced at the same rate.
- both the storage process (step S1803) and the forgetting process (step S1804) are executed, but only one of them may be executed and the other may not be executed. Further, the storage process (step S1803) and the forgetting process (step S1804) may be executed in the reverse order. If the administrator's role is not taken into consideration, steps S1801 and S1802 are not necessarily executed, and a constant storage variable 1902 and forgetting variable 1903 may always be used in the learning process.
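A minimal sketch of the storage process (step S1803) and the forgetting process (step S1804) is given below, under stated assumptions. The storage variable 1902 is treated as an additive increment and the forgetting variable 1903 as a common multiplicative factor applied to every unselected pattern; the text only says the unselected weights are "reduced at the same rate", so the multiplicative reading is an assumption. The function name `learn`, its parameters, and the default values 1.0 and 0.9 are hypothetical.

```python
from typing import Dict


def learn(execution_results: Dict[int, float], selected_id: int,
          storage: float = 1.0, forgetting: float = 0.9,
          do_storage: bool = True, do_forgetting: bool = True) -> Dict[int, float]:
    """Return updated execution results 2005, keyed by pattern ID 2001."""
    updated = dict(execution_results)  # leave the caller's table untouched
    if do_storage:
        # Storage process: add a predetermined value to the selected pattern.
        updated[selected_id] = updated.get(selected_id, 0.0) + storage
    if do_forgetting:
        # Forgetting process: shrink every unselected pattern by the same rate.
        for pattern_id in updated:
            if pattern_id != selected_id:
                updated[pattern_id] *= forgetting
    return updated


before = {1: 2.0, 2: 4.0, 3: 0.0}
after = learn(before, selected_id=1, storage=1.0, forgetting=0.5)
print(after)
```

The two flags mirror the remark that only one of the two processes may be executed. With this particular update rule the two steps touch disjoint patterns, so their order does not change the result.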
- the variable table 1900 and the pattern table 2000 may be stored in the main memory 212, or may be stored in the auxiliary storage device 213.
- the countermeasure procedure plan evaluation pattern 2000 is weighted by learning the past candidate selection process as described above.
- this information can be used, for example, in the countermeasure procedure plan presentation process (S906) of FIG. 9 to highlight candidates having the same pattern as a pattern whose execution result value is greater than or equal to a predetermined value (for example, 5 or more). The administrator can thereby see the tendency of past selections of countermeasure procedure plan candidates.
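The highlighting decision can be illustrated with a short sketch. It assumes that each candidate is already associated with a pattern ID and that a candidate whose pattern has no recorded execution result is not highlighted; the function name `mark_emphasized` and the data layout are hypothetical, and the threshold of 5 is the example value from the text.

```python
from typing import Dict, List, Tuple


def mark_emphasized(candidates: List[Tuple[str, int]],
                    pattern_results: Dict[int, float],
                    threshold: float = 5.0) -> List[Tuple[str, bool]]:
    """Pair each plan ID with a flag telling the presentation step to highlight it.

    candidates: (plan ID, pattern ID) pairs.
    pattern_results: execution result 2005 per pattern ID 2001.
    """
    return [
        (plan_id, pattern_results.get(pattern_id, 0.0) >= threshold)
        for plan_id, pattern_id in candidates
    ]


flags = mark_emphasized([("A", 1), ("B", 2), ("C", 9)], {1: 6.0, 2: 4.5})
print(flags)
```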
- the weighting is reflected in the value of the execution result 1404 in the evaluation result table 1400 of the countermeasure procedure plan in FIG. 14 of the first embodiment; evaluation is then performed based on the mathematical formula of the overall evaluation value calculation process (S1502), and the rearrangement process is performed.
- the prioritization reflecting the past selection pattern is obtained.
- the execution result 2005 of the pattern ID 2001 having the same pattern is combined with (added to or integrated into) the execution result 1404 of the countermeasure procedure plan, so that the weight is reflected.
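The following sketch illustrates how the learned weight might feed into the prioritization, using the additive variant. The formula used in the overall evaluation value calculation process (S1502) is not reproduced in this passage, so `default_overall` is a purely hypothetical stand-in (effect minus influence minus cost plus execution result) that the caller can replace. Numeric evaluation values, the class name `Evaluation`, and the function name `prioritize` are likewise assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evaluation:
    """One row of the evaluation result table 1400 (numeric values assumed)."""
    plan_id: str
    pattern_id: int
    effect: float
    influence: float
    cost: float
    execution_result: float = 0.0  # execution result 1404


def default_overall(e: Evaluation) -> float:
    # Hypothetical scoring only; the actual S1502 formula is not given here.
    return e.effect - e.influence - e.cost + e.execution_result


def prioritize(evaluations: List[Evaluation],
               pattern_results: Dict[int, float],
               overall: Callable[[Evaluation], float] = default_overall) -> List[Evaluation]:
    """Add the execution result 2005 of the matching pattern, then sort by score."""
    weighted = [
        replace(e, execution_result=e.execution_result
                + pattern_results.get(e.pattern_id, 0.0))
        for e in evaluations
    ]
    # sorted() is stable, so plans with equal scores keep their input order.
    return sorted(weighted, key=overall, reverse=True)


plans = [Evaluation("A", 1, 3.0, 1.0, 1.0), Evaluation("B", 2, 3.0, 1.0, 1.0)]
ranked = prioritize(plans, {2: 2.0})
print([e.plan_id for e in ranked])
```

Passing the scoring function as a parameter keeps the sketch independent of whichever formula is actually used.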
- when the difference in the value of the execution result 2005 between the evaluation patterns of the countermeasure procedure plan in FIG. 21 becomes large, the value may be cut off.
- the present invention is not limited to the above-described embodiments, and includes various modifications and equivalent configurations within the scope of the appended claims.
- the above-described embodiments have been described in detail for easy understanding of the present invention, and the present invention is not necessarily limited to those having all the configurations described.
- a part of the configuration of one embodiment may be replaced with the configuration of another embodiment.
- another configuration may be added, deleted, or replaced.
- each of the above-described configurations, functions, processing units, processing means, and the like may be realized in hardware by designing some or all of them as, for example, an integrated circuit, or may be realized in software by having a processor interpret and execute a program that realizes each function.
- information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, hard disk, or SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or an optical disc such as a DVD or Blu-ray Disc.
- the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines required in an actual implementation are necessarily shown. In practice, almost all the components may be considered to be connected to one another.
- 201 management server
- 211 processor
- 212 main storage
- 213 auxiliary storage device
- 220 problem solving processing
- 2131 constraint conditions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
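Returning to FIG. 9, the description continues. The countermeasure procedure plan prioritization process (S905) yields a list in which the countermeasure procedures of FIG. 14 are sorted in order of evaluation score. In the example of FIG. 9, the results are presented by the countermeasure procedure plan presentation process (S906). In the administrator's selection process (S907), the administrator selects a desired plan from the countermeasure procedure plans, and in the countermeasure procedure plan execution process (S908), the selected countermeasure procedure is executed. Note that the steps from the presentation process (S906) onward may be omitted, and the processing may end once the countermeasure procedure plans have been stored as data.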
Claims (15)
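- A management computer that has a processor, an input device, an output device, and a storage device and that manages a plurality of computer systems, the management computer comprising:
a countermeasure procedure plan generation module that generates countermeasure procedure plans for changing the state of components of the plurality of computer systems,
wherein the countermeasure procedure plan generation module
generates the countermeasure procedure plans in accordance with a constraint condition that, among the plurality of computer systems or their components, the influence on a higher-rank computer system or its components is smaller than the influence on a lower-rank computer system or its components.
- The management computer according to claim 1, wherein the countermeasure procedure plan generation module
has a filtering module that excludes, from the countermeasure procedure plans to be generated, a countermeasure procedure plan that changes the state of a component of the higher-rank computer system and does not change the state of a component of the lower-rank computer system.
- The management computer according to claim 1, wherein the constraint condition includes information that defines, as quality classes, the quality to be satisfied by the computer systems or their components and that associates a quality class with each of the computer systems or their components, and
the countermeasure procedure plan generation module
generates the plurality of countermeasure procedure plans so as to satisfy the quality classes.
- The management computer according to claim 1, further comprising:
a countermeasure procedure plan evaluation module that simulates and evaluates the effect of one or more of the countermeasure procedure plans generated by the countermeasure procedure plan generation module; and
a countermeasure procedure plan prioritization module that prioritizes the one or more countermeasure procedure plans based on the evaluation results of the countermeasure procedure plan evaluation module.
- The management computer according to claim 4, wherein the countermeasure procedure plan evaluation module
generates evaluation result information of the countermeasure procedure plans in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components,
the evaluation result information includes evaluation result information of at least a first countermeasure procedure plan and a second countermeasure procedure plan, and
the countermeasure procedure plan prioritization module
excludes the first countermeasure procedure plan from the countermeasure procedure plans when, in the evaluation result information, (1) all evaluation values of the first countermeasure procedure plan are lower than those of the second countermeasure procedure plan, or (2) some evaluation values of the first countermeasure procedure plan are lower than those of the second countermeasure procedure plan and the other evaluation values of the first countermeasure procedure plan are equal to those of the second countermeasure procedure plan.
- The management computer according to claim 4, wherein the countermeasure procedure plan evaluation module
generates evaluation result information of the countermeasure procedure plans in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect, the influence, the execution result, and the cost for each of the higher rank and the lower rank of the plurality of computer systems or their components, and
the countermeasure procedure plan prioritization module
obtains an overall evaluation value by performing a predetermined calculation based on the evaluation values, and sorts the one or more countermeasure procedure plans based on the overall evaluation value.
- The management computer according to claim 4, further comprising a countermeasure procedure plan presentation module, a selection module, and a countermeasure procedure plan execution module, wherein
the countermeasure procedure plan evaluation module
generates evaluation result information of the countermeasure procedure plans in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components,
the countermeasure procedure plan presentation module
presents the evaluation result information,
the selection module
causes an operator to select one or more countermeasure procedure plans based on the presented evaluation result information, and
the countermeasure procedure plan execution module
manages pattern information in which, for each pattern ID, at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components is associated with an execution result, and
performs at least one of addition and weighting on the execution result of the pattern information having a predetermined relationship with the evaluation result information of the countermeasure procedure plan selected by the selection module.
- The management computer according to claim 7, wherein the countermeasure procedure plan execution module
manages the execution results by increasing the value of the execution result of the pattern information having the same pattern as the evaluation result information of the countermeasure procedure plan selected by the selection module and decreasing the value of the execution result of the patterns that were not selected.
- A computer system management method in which a management computer having a processor, an input device, an output device, and a storage device manages a plurality of computer systems, wherein
the management computer, when generating countermeasure procedure plans for changing the state of components of the plurality of computer systems,
generates the countermeasure procedure plans in accordance with a constraint condition that, among the plurality of computer systems or their components, the influence on a higher-rank computer system or its components is smaller than the influence on a lower-rank computer system or its components.
- The computer system management method according to claim 9, wherein the management computer
performs a filtering process of excluding, from the countermeasure procedure plans to be generated, a countermeasure procedure plan that changes the state of a component of the higher-rank computer system and does not change the state of a component of the lower-rank computer system.
- The computer system management method according to claim 9, wherein the constraint condition includes information that defines, as quality classes, the quality to be satisfied by the computer systems or their components and that associates a quality class with each of the computer systems or their components, and
the management computer
generates the plurality of countermeasure procedure plans so as to satisfy the quality classes.
- The computer system management method according to claim 9, wherein the management computer performs:
an evaluation process of simulating and evaluating the effect of one or more of the countermeasure procedure plans; and
a prioritization process of prioritizing the one or more countermeasure procedure plans based on the evaluation results.
- The computer system management method according to claim 12, wherein, in the evaluation process,
evaluation result information of the countermeasure procedure plans is generated in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components,
the evaluation result information includes evaluation result information of at least a first countermeasure procedure plan and a second countermeasure procedure plan, and
in the prioritization process,
the first countermeasure procedure plan is excluded from the countermeasure procedure plans when, in the evaluation result information, (1) all evaluation values of the first countermeasure procedure plan are lower than those of the second countermeasure procedure plan, or (2) some evaluation values of the first countermeasure procedure plan are lower than those of the second countermeasure procedure plan and the other evaluation values of the first countermeasure procedure plan are equal to those of the second countermeasure procedure plan.
- The computer system management method according to claim 12, wherein, in the evaluation process,
evaluation result information of the countermeasure procedure plans is generated in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect, the influence, the execution result, and the cost for each of the higher rank and the lower rank of the plurality of computer systems or their components, and
in the prioritization process,
an overall evaluation value is obtained by performing a predetermined calculation based on the evaluation values, and the one or more countermeasure procedure plans are sorted based on the overall evaluation value.
- The computer system management method according to claim 12, wherein the management computer
further executes a countermeasure procedure plan presentation process, a selection process, and a countermeasure procedure plan execution process,
in the evaluation process,
evaluation result information of the countermeasure procedure plans is generated in which a countermeasure procedure plan ID identifying each of the one or more countermeasure procedure plans is associated, for each countermeasure procedure plan ID, with at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components,
in the countermeasure procedure plan presentation process,
the evaluation result information is presented,
in the selection process,
an operator is caused to select one or more countermeasure procedure plans based on the presented evaluation result information, and
in the countermeasure procedure plan execution process,
pattern information is managed in which, for each pattern ID, at least one evaluation value of the effect and the influence on each of the higher rank and the lower rank of the plurality of computer systems or their components is associated with an execution result, and
the value of the execution result of the pattern information having a predetermined relationship with the evaluation result information of the countermeasure procedure plan selected by the selection module is increased, and the value of the execution result of the other pattern information is decreased.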
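For illustration, the exclusion rule recited in the claims, conditions (1) and (2), amounts to dropping any plan that another plan matches or beats on every evaluation value and beats on at least one. A minimal sketch follows, assuming every evaluation value is numeric and oriented so that a larger value is better; the function names `is_dominated` and `drop_dominated` and the data layout are hypothetical.

```python
from typing import Dict, Tuple


def is_dominated(first: Tuple[float, ...], second: Tuple[float, ...]) -> bool:
    """True if `first` is lower than `second` on some values and equal on the rest.

    This covers condition (1), all values lower, and condition (2),
    some values lower and the others equal.
    """
    pairs = list(zip(first, second))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def drop_dominated(plans: Dict[str, Tuple[float, ...]]) -> Dict[str, Tuple[float, ...]]:
    """Keep only the plans that no other plan dominates."""
    return {
        plan_id: values
        for plan_id, values in plans.items()
        if not any(is_dominated(values, other_values)
                   for other_id, other_values in plans.items()
                   if other_id != plan_id)
    }


candidates = {"P1": (1, 1), "P2": (2, 2), "P3": (2, 1), "P4": (1, 3)}
print(sorted(drop_dominated(candidates)))
```

Two plans with identical evaluation values do not exclude each other under this reading, since neither is lower on any value.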
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/554,123 US20180052729A1 (en) | 2015-08-07 | 2015-08-07 | Management computer and computer system management method |
PCT/JP2015/072562 WO2017026017A1 (ja) | 2015-08-07 | 2015-08-07 | Management computer and computer system management method |
JP2017534045A JP6622808B2 (ja) | 2015-08-07 | 2015-08-07 | Management computer and computer system management method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/072562 WO2017026017A1 (ja) | 2015-08-07 | 2015-08-07 | Management computer and computer system management method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017026017A1 (ja) | 2017-02-16 |
Family
ID=57983663
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/072562 WO2017026017A1 (ja) | Management computer and computer system management method | 2015-08-07 | 2015-08-07 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20180052729A1 (ja) |
JP (1) | JP6622808B2 (ja) |
WO (1) | WO2017026017A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
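JP2019074798A (ja) * | 2017-10-12 | 2019-05-16 | Hitachi, Ltd. | Resource management device, resource management method, and resource management program |
JP2021140810A (ja) * | 2017-04-26 | 2021-09-16 | Kyocera Corporation | Terminal device, program, power management device, and server |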
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11755928B1 (en) | 2020-04-27 | 2023-09-12 | Wells Fargo Bank, N.A. | Computing cluster configuration standardization |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
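JP2008009842A (ja) * | 2006-06-30 | 2008-01-17 | Hitachi Ltd | Computer system control method and computer system |
WO2013171944A1 (ja) * | 2012-05-15 | 2013-11-21 | NEC Corporation | Virtual machine management system, virtual machine management method, and program |
WO2015040688A1 (ja) * | 2013-09-18 | 2015-03-26 | Hitachi, Ltd. | Management system for managing a computer system and management method thereof |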
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3826940B2 (ja) * | 2004-06-02 | 2006-09-27 | NEC Corporation | Failure recovery device, failure recovery method, manager device, and program |
US8074103B2 (en) * | 2007-10-19 | 2011-12-06 | Oracle International Corporation | Data corruption diagnostic engine |
US7904753B2 (en) * | 2009-01-06 | 2011-03-08 | International Business Machines Corporation | Method and system to eliminate disruptions in enterprises |
US8732524B2 (en) * | 2011-08-03 | 2014-05-20 | Honeywell International Inc. | Systems and methods for using a corrective action as diagnostic evidence |
DE112012005598T5 (de) * | 2012-03-08 | 2014-10-16 | Hewlett-Packard Development Company, L.P. | Identifying and ranking solutions from multiple data sources |
US9063856B2 (en) * | 2012-05-09 | 2015-06-23 | Infosys Limited | Method and system for detecting symptoms and determining an optimal remedy pattern for a faulty device |
US8990639B1 (en) * | 2012-05-31 | 2015-03-24 | Amazon Technologies, Inc. | Automatic testing and remediation based on confidence indicators |
US8977899B1 (en) * | 2012-09-14 | 2015-03-10 | CSC Holdings, LLC | Assisted device recovery |
WO2014073045A1 (ja) * | 2012-11-07 | 2014-05-15 | Hitachi, Ltd. | Computer system, storage management computer, and storage management method |
US9081680B2 (en) * | 2013-03-15 | 2015-07-14 | Accenture Global Services Limited | System-level issue detection and handling |
US20160062857A1 (en) * | 2013-04-17 | 2016-03-03 | Nec Corporation | Fault recovery routine generating device, fault recovery routine generating method, and recording medium |
US9250993B2 (en) * | 2013-04-30 | 2016-02-02 | Globalfoundries Inc | Automatic generation of actionable recommendations from problem reports |
US9183074B2 (en) * | 2013-06-21 | 2015-11-10 | Dell Products, Lp | Integration process management console with error resolution interface |
WO2015016925A1 (en) * | 2013-07-31 | 2015-02-05 | Hewlett-Packard Development Company, L.P. | Automated remote network target computing device issue resolution |
US9448907B2 (en) * | 2013-10-27 | 2016-09-20 | Bank Of America Corporation | Computer application maturity illustration system with single point of failure analytics and remediation techniques |
US20150302336A1 (en) * | 2014-04-17 | 2015-10-22 | Bank Of America Corporation | Strategic partner governance framework and performance tracking |
-
2015
- 2015-08-07 WO PCT/JP2015/072562 patent/WO2017026017A1/ja active Application Filing
- 2015-08-07 JP JP2017534045A patent/JP6622808B2/ja active Active
- 2015-08-07 US US15/554,123 patent/US20180052729A1/en not_active Abandoned
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
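JP2021140810A (ja) * | 2017-04-26 | 2021-09-16 | Kyocera Corporation | Terminal device, program, power management device, and server |
JP7301906B2 (ja) | 2017-04-26 | 2023-07-03 | Kyocera Corporation | Terminal device, program, power management device, and server |
JP2019074798A (ja) * | 2017-10-12 | 2019-05-16 | Hitachi, Ltd. | Resource management device, resource management method, and resource management program |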
Also Published As
Publication number | Publication date |
---|---|
US20180052729A1 (en) | 2018-02-22 |
JPWO2017026017A1 (ja) | 2018-05-31 |
JP6622808B2 (ja) | 2019-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9870330B2 (en) | Methods and systems for filtering collected QOS data for predicting an expected range for future QOS data | |
JP6165886B2 (ja) | Management system and method for dynamic storage service level monitoring | |
US9411834B2 (en) | Method and system for monitoring and analyzing quality of service in a storage system | |
JP4516306B2 (ja) | Method for collecting performance information of a storage network | |
US20150081484A1 (en) | Automated cost calculation for virtualized infrastructure | |
US10564998B1 (en) | Load balancing using predictive VM-based analytics | |
US9146793B2 (en) | Management system and management method | |
US9886451B2 (en) | Computer system and method to assist analysis of asynchronous remote replication | |
US20150199136A1 (en) | Method and system for monitoring and analyzing quality of service in a storage system | |
US10073866B2 (en) | Dynamic test case prioritization for relational database systems | |
US9747156B2 (en) | Management system, plan generation method, plan generation program | |
US10225158B1 (en) | Policy based system management | |
US9773026B1 (en) | Calculation of system utilization | |
US10002025B2 (en) | Computer system and load leveling program | |
JP6009089B2 (ja) | Management system for managing a computer system and management method thereof | |
JP6622808B2 (ja) | Management computer and computer system management method | |
US11775330B2 (en) | Load balancing VM selection and movement | |
US11599404B2 (en) | Correlation-based multi-source problem diagnosis | |
US20200394091A1 (en) | Failure analysis support system, failure analysis support method, and computer readable recording medium | |
US10042572B1 (en) | Optimal data storage configuration | |
AU2021363719B2 (en) | Generating and updating a performance report | |
JP7135780B2 (ja) | Live migration adjustment program and live migration adjustment method | |
WO2006011905A2 (en) | Methods and systems for managing an application environment and portions thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15900972; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 15554123; Country of ref document: US |
 | ENP | Entry into the national phase | Ref document number: 2017534045; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 15900972; Country of ref document: EP; Kind code of ref document: A1 |