WO2016098138A1 - Management system and management method for managing computer system - Google Patents

Management system and management method for managing computer system

Info

Publication number
WO2016098138A1
Authority
WO
WIPO (PCT)
Prior art keywords
resources
requirement
resource
configurations
primary
Application number
PCT/JP2014/006225
Other languages
French (fr)
Inventor
Pablo MARTINEZ LERIN
Hironori Emaru
Original Assignee
Hitachi, Ltd.
Application filed by Hitachi, Ltd.
Priority to PCT/JP2014/006225
Publication of WO2016098138A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3442 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity

Definitions

  • the present invention generally relates to reconfiguring resources in a data center.
  • Patent Literature 1 discloses a workload learning method as follows. Before a disaster occurs, the method keeps learning the I/O workload from the primary data center, and applying the learned workload to the secondary data center. As a result, the data being replicated in the secondary data center is placed in the appropriate tier according to the workload. Then, in the event of failover, the data in the secondary data center is already in a suitable tier to fulfill the same performance requirement as the primary data center.
  • The related art disclosed by Patent Literature 1 is used to achieve similar performance in a secondary DC (data center) when the primary and the secondary DCs have the same or similar resources.
  • the technology cannot be used when there is a lack or limitation of resources.
  • a method to compute an alternative configuration might be required.
  • the method is not included in the related art. Further, the method is not known.
  • the problem that remains is to compute at least one suitable configuration (e.g. the most suitable configuration) of resources (e.g. which resources, how many, and how they are combined) available in the secondary data center that would fulfill the performance, reliability and security requirements in the event of failover, and, if this is not fully possible, to compute precisely the limitations so that administrators can act beforehand.
  • the solution to the remaining problem is not trivial because at least one suitable configuration (e.g. the most suitable configuration) of available resources changes often for multiple situations such as changes in the data center requirements, changes in the data center metrics such as resources usage and resources workload, and changes in the available resources in the secondary data center.
  • a management system is configured to identify one or more requirements with respect to the primary DC, identify one or more suitable resources which are resources suitable to the identified requirements, from among available resources which are resources that can be used in the secondary DC, based on SPECs of the available resources and metrics (e.g. resources usage and resources workload) from the primary DC, and create a DR (Disaster Recovery) plan by mapping configurations with respect to the identified suitable resources to at least one of the identified requirements.
  • DR Disaster Recovery
  • a primary DC in the event of disaster can failover to a secondary DC where the features (specifications) of the available resources are limited, and an automated process or an administrator can reconfigure quickly the secondary DC by following an up-to-date disaster recovery plan created and checked beforehand with the suitable configuration of available resources to fulfill the data center requirements.
  • Fig. 1 is a diagram showing a brief description of the actions and data used in an embodiment of the present invention.
  • Fig. 2 is a diagram showing a schematic configuration of a storage system and a management computer coupled to the storage system.
  • Fig. 3 shows a configuration example of an Application requirements table.
  • Fig. 4 shows a configuration example of a Primary DC configuration table.
  • Fig. 5 shows a configuration example of an Available resources table.
  • Fig. 6A shows a first configuration example of a Configuration knowledge table.
  • Fig. 6B shows a second configuration example of a Configuration knowledge table.
  • Fig. 6C shows a third configuration example of a Configuration knowledge table.
  • Fig. 7 shows a flow chart for explaining processing to create or update a DR (Disaster Recovery) plan.
  • Fig. 8 shows a flow chart for explaining processing to compute and select the most suitable alternative configuration to fulfill the requirements of an application.
  • Fig. 9 shows a flow chart for explaining processing to acquire all suitable
  • although the term "aaa table" is used to describe information in the following description, the information may be expressed in a data structure other than a table.
  • the term "aaa table" can be replaced with the term "aaa information".
  • although an ID (identifier) is used as information for identifying a target in the following description, the ID can be replaced with other kinds of identification information.
  • a program is occasionally used as the subject when describing a process.
  • the program is executed by a processor (for example, a CPU (Central Processing Unit)) to perform a determined process using at least one of a storage unit (e.g. a memory) and an interface device (e.g. a communication port). Therefore, the subject of the process can be a processor.
  • a part of the process may be performed by a hardware circuit, and the processor may comprise the hardware circuit.
  • a program may be installed from a program source.
  • the program source may be a program distribution server or a computer-readable storage medium, for example.
  • a collection of one or more computers configured to manage logical volumes and display information for display may be referred to as a "management system".
  • the management computer may serve as the management system.
  • a combination of the management computer and a display computer also may serve as the management system.
  • a plurality of computers may be used to achieve a process that is identical or similar to that performed by the management computer.
  • the plurality of computers (which may include the display computer in the case where the display computer is used for display) may serve as the management system.
  • the management computer serves as the management system.
  • the expression "the management computer displays information" may denote that information is displayed on a display device possessed by the management computer, or may denote that information for display is transmitted to the display computer coupled to the management computer. In the latter case, the display computer displays the information represented by the information for display on the display device possessed by the display computer.
  • the management method according to the embodiment is briefly described by the following 10 steps.
  • the DC administrator 1010 requests to the management computer 1020 the creation of a DR plan (e.g. the creation of a new DR plan or the update of an existing plan) to support DR (Disaster Recovery) in the DC, where a DR plan is a plan to reconfigure the secondary DC 1070 in the event of a failover from the primary DC 1060 to the secondary DC 1070.
  • the management computer 1020 acquires (identifies) the requirements of the DC (e.g. the requirements of applications executed in the primary DC 1060), for example performance, reliability and security requirements, from the Application Requirements Table 2151 describing the SLA (Service Level Agreement) and so on of the DC, if available, and the configuration and metrics in the primary DC 1060.
  • the management computer 1020 acquires (identifies) the available resources that can be used in the secondary DC 1070 in the event of failover, from, for example, an Online server of a cloud provider 1030.
  • the resources currently in the secondary DC 1070 are also acquired as available resources.
  • the management computer 1020 acquires alternative configurations based on the configuration knowledge tables 2154 describing knowledge of configurations, to be used when the available resources are not enough to achieve the same configuration in the secondary DC 1070 as in the primary DC 1060.
  • the management computer 1020 acquires the configuration and metrics from the primary DC 1060 in order to determine the suitability of the alternative configurations.
  • the management computer 1020 creates the DR plan 1021 by mapping the requirements in the DC to configurations formed by available resources.
  • the management computer 1020 computes the limitations of the created DR plan 1021 and sends (shows) the DR plan 1021 and the limitations to the DC administrator 1010.
  • the DC administrator 1010 checks the DR plan 1021 and confirms it in the management computer 1020.
  • the management computer 1020 saves the DR plan 1021 in a location (e.g. a storage area in the management computer 1020) that is safe and accessible in the event of disaster in the primary DC 1060.
  • the management computer 1020 detects when the DR plan 1021 may have become outdated and performs a DR plan update. The process cycles again from Step 2.
  • the management computer 1020 retrieves the DR plan 1021 and reconfigures the secondary DC 1070 according to the DR plan 1021.
  • Steps 7 and 8 are optional.
  • Steps 2, 3, and 4 can be done by an independent process at any time, for example, in the management computer 1020.
  • FIG. 2 shows a general configuration of a storage system according to the embodiment of the invention.
  • the primary DC 1060 includes one (or multiple) storage apparatus 2704, and multiple (or one) host computers 2701 and 2702, which are connected to the storage apparatus 2704 via a data network 2703.
  • the secondary DC 1070 includes one (or multiple) storage apparatus 2804 and multiple (or one) host computers 2801 and 2802, which are connected to the storage apparatus 2804 via a data network 2803.
  • the host computers 2701, 2702, 2801 and 2802, and the storage apparatuses 2704 and 2804 are connected to a management computer 1020 via a management network 2400.
  • the storage apparatuses 2704 and 2804 are connected to each other via a remote network 2600.
  • At least one of the storage apparatuses 2704 and 2804 includes one or more storage devices (typically, non-volatile storage devices like HDDs (Hard disk drives) or SSDs (Solid State Drives)) and a controller for controlling read/write data from/to the one or more storage devices.
  • the one or more storage devices may be one or more RAID (Redundant Array of Independent (or Inexpensive) Disks) groups.
  • the data network 2703 herein is a storage area network (SAN) but may be an IP (Internet Protocol) network or any other data communication network.
  • the management network 2400 herein is an IP network, but may be a SAN or any other data communication network.
  • the management computer 1020 herein is directly connected to the storage apparatuses 2704 and 2804 but may acquire necessary information via at least one of the host computers 2701, 2702, 2801 and 2802.
  • the data networks 2703 and 2803 and the management network 2400 herein are separately provided but the data networks 2703 and 2803 may also serve as the management network 2400.
  • the management computer 1020 and at least one of the host computers 2701, 2702, 2801 and 2802 may be one single computer unit.
  • the storage system shown in FIG. 2 includes two storage apparatuses 2704 and 2804, four host computers 2701, 2702, 2801 and 2802, and one management computer 1020. In the present invention, however, the number of those units is not limited to this.
  • a set of host computers (2701 and 2702, or 2801 and 2802), a storage apparatus (2704 or 2804) and the data network (2703 or 2803) connecting them is herein referred to as a DC.
  • a plurality of DCs is typically provided at geographically separated locations. This is because, if a DC at a certain location is damaged and cannot continue its operations, another DC at a separate location which is not damaged can take them over, which is also referred to as failover.
  • the storage system includes a primary DC 1060 and a secondary DC 1070 which is a backup DC of the primary DC 1060. This configuration is referred to as a two data center (2DC) configuration. There may be multiple primary DCs for one secondary DC, or multiple secondary DCs for one primary DC.
  • a remote copy is performed between the primary DC 1060 and the secondary DC 1070 via the remote network 2600.
  • the remote copy herein means a technique of duplicating data by copying the data from a volume in a storage apparatus into a volume in another storage apparatus. Using the remote copy technique, if a volume has any trouble and cannot perform its operations, another volume can take them over using the duplicated data stored therein.
  • Two volumes as a copy source and a copy destination in a remote copy relationship are collectively called a copy pair.
  • the storage system of the present invention includes two DCs. However, the number of those units is not limited to this.
  • the secondary DC may have only the minimum amount of resources running to ensure that a backup of the primary DC data is kept safe, and in the event of failover resources are set in the secondary DC to continue the service of the applications that were in the primary DC.
  • the management computer 1020 includes an input device 2110 (e.g. a keyboard and a pointing device), an output device 2130 (e.g. a display device), a storage resource including a memory 2150, a management I/F for communicating via the management network 2400, and a CPU 2120 coupled thereto.
  • the memory 2150 stores programs and management tables.
  • the management tables include an Application requirements table 2151, a Primary DC configuration table 2152, an Available resources table 2153, and Configuration knowledge tables 2154.
  • the program is, for example, a management program 2155.
  • Applications are configured to be executed in at least one of the host computers 2701, 2702, 2801 and 2802. Specifically, before DR, applications are executed in at least one of the host computers 2701 and 2702. After the DR, applications will be executed in at least one of host computers 2801 and 2802. At least one of the host computers 2701, 2702, 2801 and 2802 is configured to perform I/O to at least one of the storage apparatuses 2704 and 2804 (write/read data to/from at least one of the storage apparatuses 2704 and 2804).
  • FIG. 3 shows a configuration example of the Application requirements table 2151.
  • the Application requirements table 2151 is a table which manages the requirements of each application in the primary DC 1060.
  • the Application requirements table 2151 has, as configuration information, a Target application 3001, which is information to identify an application in the primary DC 1060, a Requirement id 3002, which is information to identify a requirement in the primary DC 1060, a Requirement type 3003, which is information to identify the type of requirement, and a Requirement contents 3004, which is information to describe the managed requirement.
  • Each requirement in the primary DC 1060 is represented by a row in the Application requirements table 2151. Requirements are added, modified and removed, for example, by the administrator action, and by an automated process, for example running in the management computer 1020, that retrieves the SLA (Service Level Agreement) of the primary DC 1060, retrieves special SLAs for the event of disaster, and infers requirements by monitoring the configuration and dynamic metrics of the primary DC 1060.
  • the Requirement type 3003 contains one of a set of values predefined by the management computer 1020. Requirements of applications typically mean requirements of application programs executed in host computers.
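  • As a minimal sketch, the Application requirements table 2151 described above could be held as a list of typed rows. The class name, the helper `requirements_for`, the concrete set of requirement types and the sample rows are all assumptions made for illustration; they are not taken from FIG. 3.

```python
from dataclasses import dataclass

# Hypothetical set of predefined requirement types. The description only says
# that the type is one of a set of values predefined by the management computer.
REQUIREMENT_TYPES = {"performance", "reliability", "security"}


@dataclass(frozen=True)
class Requirement:
    target_application: str    # Target application 3001
    requirement_id: str        # Requirement id 3002
    requirement_type: str      # Requirement type 3003
    requirement_contents: str  # Requirement contents 3004

    def __post_init__(self):
        # Reject rows whose type is not one of the predefined values.
        if self.requirement_type not in REQUIREMENT_TYPES:
            raise ValueError(f"unknown requirement type: {self.requirement_type}")


def requirements_for(table, application):
    """Return every row of the table that targets the given application."""
    return [row for row in table if row.target_application == application]


# Illustrative rows only.
table = [
    Requirement("App01", "Req01", "performance", "response time under 10 ms"),
    Requirement("App01", "Req02", "reliability", "data redundancy"),
    Requirement("App02", "Req03", "security", "data encryption"),
]
```

  • In this sketch each requirement is one row, so an application with several requirements simply appears in several rows.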
  • FIG. 4 shows a configuration example of the Primary DC configuration table 2152.
  • the Primary DC configuration table 2152 is a table which manages the configuration and capabilities of the current resources in the primary DC 1060.
  • the Primary DC configuration table 2152 has, as configuration information, an Application id 4001, which is information to identify an application in the primary DC 1060, a Resource id 4002, which is information to identify a resource present in the primary DC 1060, a Resource type 4003, which is information to identify the type of resource, and a Resource SPECs (specifications or features) provided for the application 4004, which is information to describe the current capabilities of the managed resource being used by or provided to the application.
  • Each resource present in the primary DC 1060 is represented in one or more rows in the primary DC configuration table 2152. Resources are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that monitors the primary DC 1060.
  • the Resource id 4002 is an identifier internal to the DC, and may have no relation with the identifier used to represent a resource that is available.
  • the Resource type 4003 contains one of a set of values predefined by the management computer 1020, indicating a type of resource.
  • the Resource SPECs provided for the application 4004 contains only the capabilities of the managed resource being used or provided to the application, not the total capabilities. When a resource is used or provided to several applications, the resource appears in one row for each of the applications.
  • the resource with id "Server A1" of type server provides 1GB of cache and 4GHz of CPU to an application with id "App01".
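  • The row-per-application layout of the Primary DC configuration table 2152 can be sketched as follows. The class and function names and the SPEC keys (`cache_gb`, `cpu_ghz`, `capacity_tb`) are assumptions for illustration; only the "Server A1" / "App01" row comes from the description, and the other rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrimaryConfigRow:
    application_id: str   # Application id 4001
    resource_id: str      # Resource id 4002
    resource_type: str    # Resource type 4003
    specs_provided: dict  # Resource SPECs provided for the application 4004


def current_configuration(table, application_id):
    """Rows describing what the primary DC currently provides to one application."""
    return [row for row in table if row.application_id == application_id]


def applications_using(table, resource_id):
    """Applications that share one resource (one row per application)."""
    return sorted({row.application_id for row in table if row.resource_id == resource_id})


table = [
    # From the description: Server A1 provides 1 GB of cache and 4 GHz of CPU to App01.
    PrimaryConfigRow("App01", "Server A1", "server", {"cache_gb": 1, "cpu_ghz": 4}),
    # Hypothetical rows showing a resource shared by a second application.
    PrimaryConfigRow("App02", "Server A1", "server", {"cache_gb": 2, "cpu_ghz": 2}),
    PrimaryConfigRow("App02", "Storage B1", "storage", {"capacity_tb": 10}),
]
```

  • Because each row records only the share given to one application, the total capabilities of "Server A1" are not stored anywhere in this sketch, which mirrors the description.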
  • FIG. 5 shows a configuration example of the Available resources table 2153.
  • the Available resources table 2153 is a table which manages the resources that are available to be used in the secondary DC 1070 in the event of failover.
  • the Available resources table 2153 has, as configuration information, a Resource id 5001, which is information for identifying a resource, a Resource type 5002, which is information to identify the type of resource, a Resource SPECs 5003, which is information to describe the current capabilities of a resource and the capabilities that can be added to a resource, an Available units 5004, which is information to count the number of units of the managed resource that are available, a Cost per unit 5005, which is information related to the cost of acquiring and providing one unit of the managed resource to the secondary DC 1070, and an Origin 5006, which is information to describe where the managed resource can be acquired from.
  • Each available resource is represented by a row in the Available resources table 2153. Available resources are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that monitors the current resources in the secondary DC 1070, and receives or requests information from an online service of a resources shop or a cloud provider.
  • the Resource type 5002 contains one of a set of values predefined by the management computer, indicating a type of resource.
  • the Resource SPECs 5003 refer, for example, to SPECs and available functionality.
  • the Available units 5004 may be a value based on the budget and entered by the administrator.
  • the Origin 5006 may include information and instructions to receive or request further information about the available resources and update the information about the available resources.
  • for example, the resource R001 is a server with 1 GB of cache that can be combined with cache resources up to a total of 12 GB; five units are available at a cost per unit of 2.000$, and they can be acquired from the Cloud provider A.
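  • A row of the Available resources table 2153 might be modelled as below. This is a sketch under assumptions: the field `expandable_to` is a hypothetical way to hold "capabilities that can be added", the helper `remaining_units` is not named in the description, and the numeric cost assumes that "2.000$" means two thousand dollars.

```python
from dataclasses import dataclass, field


@dataclass
class AvailableResource:
    resource_id: str            # Resource id 5001
    resource_type: str          # Resource type 5002
    specs: dict                 # Resource SPECs 5003: current capabilities
    # Resource SPECs 5003: upper bound reachable by adding capabilities (assumed layout)
    expandable_to: dict = field(default_factory=dict)
    available_units: int = 0    # Available units 5004
    cost_per_unit: float = 0.0  # Cost per unit 5005
    origin: str = ""            # Origin 5006


def remaining_units(resource, selected_units):
    """Units still free after subtracting those already selected for a DR plan."""
    used = selected_units.get(resource.resource_id, 0)
    return max(resource.available_units - used, 0)


r001 = AvailableResource(
    resource_id="R001",
    resource_type="server",
    specs={"cache_gb": 1},
    expandable_to={"cache_gb": 12},
    available_units=5,
    cost_per_unit=2000.0,  # assumption: "2.000$" read as two thousand dollars
    origin="Cloud provider A",
)
```

  • For instance, `remaining_units(r001, {"R001": 2})` leaves three of the five units for later applications in the same plan.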
  • FIG. 6A, FIG. 6B and FIG. 6C show configuration examples of three configuration knowledge tables 2154A, 2154B and 2154C.
  • in the following, "the configuration knowledge table 2154" refers to any one of the configuration knowledge tables 2154A, 2154B and 2154C.
  • the configuration knowledge table 2154 is a table which contains the knowledge of an alternative configuration to fulfill a requirement, the conditions that need to be fulfilled to apply the configuration, the instructions to apply the configuration, and the instructions to compute the resources that the configuration needs. Since an alternative configuration may only be suitable when certain conditions are fulfilled, an alternative configuration may be suitable only as a temporary solution, for example until the primary DC 1060 is recovered from a disaster and failback is performed.
  • the configuration knowledge table 2154 has, as configuration information, a Configuration id 6001, which is information for identifying a configuration, a Target requirement type 6002, which is information to identify the type of requirement that the configuration is designed to fulfill, a Target requirement contents 6003, which is information to describe the requirement that the configuration is designed to fulfill, a Conditions for metrics on primary DC 6004, which is a set of any, one or more triplets of Resource, Metric and Condition, and is information to describe which condition, on which metric and on which type of resource, needs to be fulfilled so that the configuration is suitable to fulfill the requirement 6003, a Configuration instructions 6005, which is information to apply the configuration in a secondary DC 1070, a Relevant metrics on primary DC 6006, which is a set of any, one or more pairs of Resource and Metrics, and is information to describe what metrics on what resources should be acquired from a primary DC 1060 in order to compute the resources that the configuration needs, a Resources needed on secondary DC 6007, which is a set of any, one or more pairs of Resource and Demand
  • Each configuration is represented by one configuration knowledge table 2154.
  • Alternative configurations are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that retrieves or receives alternative configurations from an online service.
  • the Target requirement type 6002 and Target requirement contents 6003 may describe more than one requirement.
  • the configuration instructions may include a script to apply the configuration by a process, for example in the management computer 1020.
  • the Resource in 6004 and the Resource in 6007 contain one of a set of values predefined by the management computer, indicating a type of resource.
  • the Metric in 6004 and the Metric in 6006 contain one of a set of values predefined by the management computer, indicating a metric that can be acquired from the primary DC 1060, and can include details such as the trend of the metric, the average of the metric or the maximum value of the metric in an interval.
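  • One possible in-memory shape for a configuration knowledge table 2154, together with a check of its Conditions for metrics on primary DC 6004, is sketched below. The operator-plus-threshold encoding of a condition, the `require_all` switch, the metric name `avg_iops` and the sample values are assumptions; the description does not fix how conditions are encoded or combined.

```python
import operator
from dataclasses import dataclass, field

# Assumed encoding: a condition is a comparison of a metric against a threshold.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class Condition:
    resource: str     # type of resource the metric is measured on
    metric: str       # metric that can be acquired from the primary DC
    op: str           # one of the keys of OPS
    threshold: float


@dataclass
class ConfigurationKnowledge:
    configuration_id: str                                  # Configuration id 6001
    target_requirement_type: str                           # Target requirement type 6002
    target_requirement_contents: str                       # Target requirement contents 6003
    conditions: list = field(default_factory=list)         # Conditions for metrics on primary DC 6004
    instructions: str = ""                                 # Configuration instructions 6005
    relevant_metrics: list = field(default_factory=list)   # Relevant metrics on primary DC 6006
    resources_needed: list = field(default_factory=list)   # Resources needed on secondary DC 6007


def conditions_fulfilled(config, metrics, require_all=True):
    """Evaluate each (resource, metric, condition) triplet against primary DC metrics."""
    results = [
        OPS[c.op](metrics[(c.resource, c.metric)], c.threshold)
        for c in config.conditions
    ]
    if not results:  # a configuration without conditions is always applicable
        return True
    return all(results) if require_all else any(results)


# Hypothetical knowledge entry and metrics.
cfg = ConfigurationKnowledge(
    "C001", "performance", "response time under 10 ms",
    conditions=[Condition("storage", "avg_iops", "<", 5000)],
)
metrics = {("storage", "avg_iops"): 3200}
```

  • With these sample values `conditions_fulfilled(cfg, metrics)` is true, so the hypothetical configuration C001 would be a candidate for the requirement it targets.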
  • FIG. 7 is a flowchart for explaining processing to create a DR plan based on alternative configurations to be used in the case of a failover from the primary DC 1060 to the secondary DC 1070 in order to reconfigure the secondary DC 1070.
  • the processing is executed by the management computer 1020, for example, automatically at a regular time interval, when the DC administrator requests a DR plan creation (e.g. the creation of a new DR plan or the update of an existing DR plan) through the input device 2110, when an alert set in the step 7004 is received, or when there is an update in one or more of the tables 2151, 2152, 2153 and 2154 in the memory 2150.
  • the management program 2155 refers to the Target application 3001 in the Application requirements table 2151 and iterates through each application with at least one requirement.
  • the applications can have an associated priority so that the important applications are iterated before the non-important applications, so that, for example, the important applications have more available resources.
  • the management program 2155 checks whether the same configuration as the current configuration in the primary DC 1060 can be provided in the secondary DC 1070 for the application identified from the Target application 3001 being iterated, where "same configuration" refers to having the same number of resources, of the same type and with the same or better SPECs relevant for the requirements of the application. "Better SPECs" means, for example, higher I/O performance, larger capacity, higher reliability, and so on.
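  • The priority-ordered iteration over applications can be sketched in a few lines. The function name, the numeric priority scale (larger means more important) and the default priority of 0 are assumptions.

```python
def iteration_order(target_applications, priority):
    """Order applications so that higher-priority ones are iterated first.

    target_applications: Target application 3001 values, one per requirement row,
    so the same application may appear several times.
    priority: hypothetical mapping from application to a number (larger = more important).
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(target_applications))
    # sorted() is stable, so applications with equal priority keep table order.
    return sorted(unique, key=lambda app: priority.get(app, 0), reverse=True)
```

  • Iterating in this order lets important applications claim available units before less important ones reach the matching step.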
  • the management program 2155 retrieves the current configuration in the primary DC 1060 of the Target application 3001 being iterated by referring to the Primary DC configuration table 2152 and retrieving all the rows whose Application id 4001 matches with the Target application 3001 value.
  • the management program 2155 retrieves the available resources from the Available resources table 2153, reducing each Available units 5004 value by the units already selected for the DR plan being created. For each resource that forms the retrieved current configuration, the management program 2155 searches for a suitable available resource by matching the Resource type 4003 to the same Resource type 5002, and the Resource SPECs provided for the application 4004 to the same or better Resource SPECs 5003.
  • the suitable available resource can be combined with more available resource if possible in order to match the SPECs, for example, combine a server cache resource with a server resource. Additionally, the resource SPECs can be matched by referring only to the SPECs that are relevant for the requirements 3002 of the Target application 3001 being iterated.
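  • The "same configuration" check described above can be sketched as a greedy match. This is a simplified illustration, not the patented procedure: it assumes every SPEC is numeric with larger values being better, takes the first available resource that fits, and does not combine resources (e.g. a server with extra cache). All names and the dictionary layout are hypothetical.

```python
def meets_specs(offered, needed):
    """True when every needed SPEC is matched by the same or a better value."""
    return all(offered.get(name, 0) >= value for name, value in needed.items())


def find_same_configuration(current_rows, available, selected=None):
    """Try to reproduce the primary DC configuration with available resources.

    current_rows: list of (resource_type, needed_specs) for one application.
    available: list of dicts with keys "id", "type", "specs" and "units".
    selected: units already selected for the DR plan, keyed by resource id.
    Returns the chosen resource ids, or None when no same configuration exists.
    """
    selected = dict(selected or {})  # work on a copy; the caller's plan is untouched
    chosen = []
    for resource_type, needed in current_rows:
        match = next(
            (r for r in available
             if r["type"] == resource_type
             and r["units"] - selected.get(r["id"], 0) > 0
             and meets_specs(r["specs"], needed)),
            None,
        )
        if match is None:
            return None  # fall back to computing an alternative configuration
        selected[match["id"]] = selected.get(match["id"], 0) + 1
        chosen.append(match["id"])
    return chosen
```

  • A `None` result corresponds to the case where the flow moves on to alternative configurations.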
  • the management program 2155 gives more importance to the current configuration in the primary DC 1060 than to the Requirement contents 3004 in the Application requirements table 2151 because it assumes that the configuration in the primary DC 1060 is a configuration chosen by the administrator, well tested, and in use for a long time. Therefore, the current configuration in the primary DC 1060 is assumed to be guaranteed to achieve the Requirement contents 3004 of all the requirements of the Target application 3001 being iterated. However, when this assumption may not be true, the processing can skip the step 7002 and go to the step 7003.
  • the management program 2155 computes several combinations of alternative configurations and selects the most suitable to fulfill all the requirements of the application being iterated, where "alternative configuration" refers to a configuration where at least one of Resource type 5002 and Resource SPECs provided for the application 4004 of resources is different from the one of resources in the current configuration of the primary DC 1060 for the Target application 3001.
  • An alternative configuration may only be suitable when at least one condition in Conditions for metrics on primary DC 6004 is currently fulfilled in the Requirement contents 3004 corresponding to the Target application 3001. Since the conditions 6004 may change in the future, the alternative configurations may be suitable only as temporary solutions, for example until the primary DC 1060 is recovered from a disaster.
  • the management program 2155 executes the flow described in FIG. 8 passing the Target application 3001 that is being iterated.
  • the management program 2155 adds the selected configuration to the DR plan being created. In the first iteration of the step 7001, the configuration is added to a new and empty DR plan.
  • the management program 2155 adds to the DR plan the available resources identified from the Available resources table 2153 that form the configurations selected in the step 7002 or 7003. In addition, in the step 7004, the management program 2155 adds to the DR plan the instructions required to achieve the configuration.
  • the selected configuration is an alternative configuration (selected (defined) in the step 7003)
  • the processing acquires the instructions by referring to Configure instructions 6005 described on the Alternative configuration tables with respect to the alternative configuration selected (defined) in the step 7003.
  • the instructions can be computed by examining the current configuration in the primary DC 1060 and, for example, referring to configuration manuals existing in the primary DC 1060. Additional information such as the resource and logical volume of the secondary DC 1070 where the data of each application is being replicated can be acquired from the secondary DC 1070, in order to compute instructions to move the data to a suitable resource according to the selected configuration.
  • the management program 2155 computes the approximate time that performing the instructions will take, for example by referring to the Configure instructions 6005 described in the Alternative configuration tables with respect to the alternative configuration selected (defined) in the step 7003.
  • when the time is considered long, for example above a threshold defined by the administrator, a simple alternative configuration can be selected to provide service to the application until the selected configuration is properly formed.
  • the management program 2155 summarizes the created DR plan as follows: (A) The management program 2155 sums the total amount of required resources from the Available resources table 2153. (B) The management program 2155 computes the cost of the DR plan by referring to the computed total amount and the Cost per unit 5004 in the Available resources table 2153. (C) The management program 2155 computes the total time required to perform all the required instructions, acquired from Configure instructions 6005 described on the Alternative configuration tables with respect to the alternative configuration selected (defined) in the step 7003.
  • the management program 2155 summarizes the total limitations that would exist in the secondary DC 1070, computed in the step 7003, which include the limitations in the SPECs and number of available resources, computed in the steps 8004 and 8008, and the reduction or removal of requirements, computed in the step 8007.
  • the management program 2155 finishes the DR plan by adding the summary computed.
  • the management program 2155 stores the DR plan to a safe place, for example, to a storage area in the management computer 1020.
  • the DR plan can be also sent (shown) to the administrator, for example, to ask for confirmation.
  • the management program 2155 refers to the summary of the limitations computed in the step 7005.
  • the processing ends.
  • the processing goes to the step 7007 in order to send an alert to the administrator.
  • the DR plan can be updated often; therefore, rather than asking the administrator to check the DR plan each time, it is useful to compute the limitations and send an alert to the administrator only when there are limitations in the created DR plan.
  • the management program 2155 computes the limitations of the created DR plan and sends them in an alert to the administrator. As a result, the administrator can make decisions and act to prepare before a disaster occurs, for example by requesting more available resources. It is important to compute the limitations accurately, so that the administrator can decide, for example, how many resources, and with what SPECs, should be requested.
  • the management program 2155 further computes the limitations of a configuration by examining the trend and history of the metrics related to the Requirement contents 3004 of the requirement 3002 that the configuration from the configuration knowledge tables 2154 fulfills. With the trend and history information, the management program 2155 can decompose the requirement into time intervals and find what percentage of the time the requirement has what limitation.
  • the requirement is fulfilled 100% on weekdays and 90% on weekends.
  • the management program 2155 can reuse the trend and history information already retrieved in the step 8006.
  • the alert can, for example, be sent by email or displayed in the output device 2130 of the management computer 1020. The administrator may set filters on the limitations, for example by degree of limitation, in order to avoid receiving alerts that are not important. The alert can be sent to a different administrator based, for example, on the type of requirement that cannot be fulfilled because of a limitation.
  • the management program 2155 determines the sets of resources, metrics, and thresholds that need to be monitored to detect a need for a DR plan update, and sets alerts to trigger the DR plan update when a determined metric in a determined resource reaches a determined threshold.
  • This step is required because the metrics that the processing has used as evidence to accept an alternative configuration and reduce or remove a requirement may change often.
  • simply updating the DR plan at a regular time interval is not suitable. When the plan is updated often (short time interval), the process consumes computational resources (e.g. the CPU 2120) and may cause congestion in the Management network 2400; when it is not updated often (long time interval), the created DR plan may become outdated and no longer suitable to fulfill the requirements.
  • the management program 2155 performs the following process for each Target application 3001 that cannot have the same configuration in the secondary DC 1070 (returned No in the step 7002).
  • the management program 2155 selects all the alternative configurations (which can be identified from the Configuration knowledge tables 2154) that are suitable for at least one requirement 3002 of one Target application 3001.
  • the selection of the suitable configurations is done by matching the Requirement type 3003 to the Target requirement type 6002, and matching the Requirement contents 3004 to the Target requirement contents 6003, in the Application requirements table 2151 and each of the configuration knowledge tables 2154.
  • the management program 2155 checks whether the conditions included in the Conditions 6004 are fulfilled by referring to the Conditions 6004 and retrieving the information of the metrics that appear in the Conditions 6004, in the Resource ids 4002 that match the resources that appear in the Conditions 6004, and are associated to the Application id 4001 that matches the Target application 3001 that the step 7008 is considering. In the case when the Conditions 6004 are not fulfilled, the management program 2155 determines the thresholds, metrics and resources according to the conditions 6004 in order to detect when the Conditions 6004 become fulfilled.
  • the management program 2155 refers to the Relevant metrics 6006, retrieves the available resources in the table 2153, and retrieves from the primary DC 1060 the information of the Relevant metrics 6006 needed to compute the required amount and SPECs of resources, according to the demand of resources of each type 6007. Then, the management program 2155 determines the thresholds, metrics and resources according to the Relevant metrics 6006 in order to detect when the required amount and/or SPECs of available resources change.
  • the management program 2155 determines the resources, metrics and thresholds in order to detect when the evidence to reduce or remove a requirement in the step 8007 is not valid anymore.
  • the management program 2155 sets alerts to be sent to the management computer 1020 requesting a DR plan update.
  • An alert is set to be sent when a determined metric in a determined resource reaches a determined threshold.
  • An alert can be set, for example, in the management computer of the primary DC 1060 or in an agent within a resource of the primary DC 1060.
  • the management program 2155 can avoid or reduce the need to acquire metrics from the primary DC 1060 and to compute the requirement of available resources by reusing the metrics acquired and the computation already done in the step 7003.
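The alert setting of the step 7008 (a determined metric in a determined resource reaching a determined threshold) can be illustrated with a minimal Python sketch. The names `AlertRule` and `rules_requiring_update`, the metric names, and the threshold values are hypothetical and are not part of the described system; they only show one possible shape for the resource, metric, and threshold triples.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """One (resource, metric, threshold) triple to be monitored (hypothetical structure)."""
    resource: str              # e.g. "Storage A2"
    metric: str                # e.g. "snapshot_disk_usage_tb" (assumed metric name)
    threshold: float
    direction: str = "above"   # "above": trigger at or above; "below": trigger at or below

    def is_triggered(self, value: float) -> bool:
        compare = operator.ge if self.direction == "above" else operator.le
        return compare(value, self.threshold)


def rules_requiring_update(rules, samples):
    """Return the rules whose metric has reached its threshold.

    `samples` maps (resource, metric) to the latest observed value.
    Rules without a sample are skipped rather than treated as triggered.
    """
    triggered = []
    for rule in rules:
        value = samples.get((rule.resource, rule.metric))
        if value is not None and rule.is_triggered(value):
            triggered.append(rule)
    return triggered


# Usage with assumed values: only the storage rule reaches its threshold.
rules = [
    AlertRule("Storage A2", "snapshot_disk_usage_tb", 1.0),
    AlertRule("Server A1", "unique_io_before_cache_gb", 2.0),
]
samples = {
    ("Storage A2", "snapshot_disk_usage_tb"): 1.2,
    ("Server A1", "unique_io_before_cache_gb"): 1.5,
}
needs_update = rules_requiring_update(rules, samples)
```

In this sketch a non-empty result would stand for a request to update the DR plan; where the rules are evaluated (in the management computer or in an agent within a resource) is left open, as in the description.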
  • FIG. 8 is a flowchart for explaining the step 7003 to compute and select the most suitable alternative configuration among configurations identified from the tables 2154 that can be achieved with the available resources 2153 in order to fulfill a requirement 2151.
  • the step 7003 is a step executed by the management program 2155.
  • the management program 2155 receives as input a Target application 3001 from the Application requirements table 2151.
  • the management program 2155 refers to the Target application 3001 and the Requirement id 3002 in the Application requirements table 2151 and iterates through each Requirement id 3002 of the input Target application 3001.
  • the requirements can have an associated priority so that the important requirements are iterated before the non-important requirements, so that, for example, the important requirements have more available resources.
  • the management program 2155 selects all the configurations that can be identified from the tables 2154 and that are suitable for the Target Application 3001 and the Requirement id 3002 being iterated.
  • the management program 2155 executes the flow described in FIG. 9 passing the requirement id 3002 being iterated and the Target Application 3001 that is input of the step 7003.
  • the management program 2155 iterates through each configuration acquired in the step 8002.
  • the management program 2155 computes the amount of resources and the SPECs required to form the identified (acquired) configuration being iterated. It is important for the management program 2155 to compute the needed amount of resources and SPECs beforehand to understand whether the configuration is possible and compare several configurations in terms of the limitations to fulfill a requirement.
  • the management program 2155 matches the input Target application 3001 with the Application id 4001 in the primary DC configuration table 2152 to retrieve the resources that form the current configuration of the application 4001.
  • the management program 2155 acquires the information of the relevant metrics for each resource retrieved.
  • the relevant metrics of a resource 4002 are the metrics in the resource-metric pairs from the Relevant metrics 6006 where the resource of the Relevant metrics 6006 matches the retrieved resource type 4003.
  • the management program 2155 computes the best combination of available resources from the table 2153 that are needed in the secondary DC 1070 as instructed in the Resources 6007 in the table 2154 being iterated, by using the acquired information of the Relevant metrics 6006, the Contents 3004 of the Requirement id 3002 being iterated, and the Available resources table 2153. For the Available units 5004 value, the units already selected for the DR plan being created are taken into account.
  • when the combination of available resources from the table 2153 is not enough to achieve the needed resources as instructed in the Resources 6007, the possible combination that best fulfills the demand is computed.
  • the management program 2155 refers to the result of the step 8004.
  • the processing iterates again from the step 8003. Otherwise (No in the step 8005), the processing goes to the step 8006 in order to try to mitigate the need for resources.
  • the management program 2155 checks whether the requirement being iterated is actually useful or not for the input Target application 3001.
  • the management program 2155 selects resources from the primary DC configuration table 2152 by matching the Application id 4001 with the Target application 3001 of the input requirement.
  • the management program 2155 retrieves the near future trend of the metrics and activities of the selected resources and compares them with the Requirement contents 3004.
  • the processing goes to the step 8007. Otherwise (No in the step 8006), the processing iterates again in the step 8003.
  • the time that defines the near future trend is, for example, a value based on the time that the primary DC may take to recover after a disaster, and can be defined, for example, by the administrator.
  • the management program 2155 refers to the trend information retrieved in the step 8006 to reduce or remove the input requirement.
  • the requirement is reduced to match the actual trend. For example, when the requirement is 2G IOPS and the actual trend is 1.5G IOPS, the requirement is reduced to 1.5G IOPS.
  • the requirement is removed. For example, when the requirement is to have available a function and the actual trend is that the function has not been used for a long time, the requirement is removed.
  • the management program 2155 reduces or removes the input requirement for two main reasons: the administrator only wants a temporary configuration until the primary DC is recovered, and the requirement cannot be fulfilled with the currently available resources.
  • the management program 2155 redoes the processing done in the step 8004.
  • the management program 2155 selects the best combination of configurations from all the configurations returned in 8002 in each iteration of the step 8001.
  • the management program 2155 computes all possible combinations of configurations, where a combination of configurations is a set of configurations that contains one or more configurations identified from the tables 2154 for each requirement of the input Target Application, and all the contained configurations are compatible to each other, according to the Compatible with other configurations 6008 value of each configuration identified from the tables 2154.
  • a combination of configurations does not need to have a configuration for a requirement that has been removed in the step 8007.
  • the best combination of configurations is selected as the combination such that there are enough available resources to form all the configurations in the combination at the same time.
  • the best combination is further selected as the combination that can fulfill all the requirements of the input application without reducing or removing one or more of the requirements.
  • the best combination is further selected, for example, based on the cost of its resources, based on the origin of its resources, and based on a score assigned to each resource by the administrator.
  • the best combination is selected as the combination that can best fulfill the reduced requirement with the available resources. For example, if the requirement is 2G IOPS, the best combination is the combination that can achieve the highest IOPS.
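The selection of the step 8009 described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented procedure: it picks exactly one configuration per requirement (the description allows one or more), models the Compatible with other configurations 6008 value as a set of configuration ids, and ranks the feasible combinations first by the number of reduced requirements and then by cost. The names `Candidate` and `select_best` and all unit and cost values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Candidate:
    """A configuration acquired in the step 8002 for one requirement (hypothetical structure)."""
    config_id: str
    needs: tuple                 # ((resource_id, units), ...) as computed in the step 8004
    compatible_with: frozenset   # ids of configurations this one can coexist with
    reduced: bool = False        # True if the requirement was reduced in the step 8007


def select_best(candidates_per_requirement, available_units, cost_per_unit):
    """Return the cheapest mutually compatible combination that fits the available units."""
    best, best_key = None, None
    for combo in product(*candidates_per_requirement.values()):
        # Every pair in the combination must be compatible in both directions.
        if any(b.config_id not in a.compatible_with or a.config_id not in b.compatible_with
               for a, b in combinations(combo, 2)):
            continue
        # All configurations must be formable at the same time.
        demand = Counter()
        for candidate in combo:
            for resource_id, units in candidate.needs:
                demand[resource_id] += units
        if any(units > available_units.get(rid, 0) for rid, units in demand.items()):
            continue
        cost = sum(units * cost_per_unit[rid] for rid, units in demand.items())
        # Prefer fewer reduced requirements, then lower cost.
        key = (sum(c.reduced for c in combo), cost)
        if best_key is None or key < best_key:
            best, best_key = combo, key
    return best


# Usage with assumed values.
candidates = {
    "Rq001": [Candidate("S001", (("R002", 1),), frozenset({"S002", "S003"}))],
    "Rq002": [
        Candidate("S002", (("R005", 1), ("R002", 5)), frozenset({"S001"})),
        Candidate("S003", (("R004", 1),), frozenset({"S001"})),
    ],
}
available = {"R002": 5, "R004": 1, "R005": 1}
cost = {"R002": 10, "R004": 20, "R005": 30}
best = select_best(candidates, available, cost)
```

With these assumed values, the combination containing "S002" is rejected because it would need six units of "R002" in total, so the remaining feasible combination is returned. Other ranking criteria named in the description, such as the origin of resources or an administrator-assigned score, could be added to the sort key.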
  • FIG. 9 is a flowchart for explaining the step 8002 to acquire all configurations.
  • the step 8002 is a step executed within the step 7003, executed within the management program 2155.
  • the management program 2155 receives as input a Target application 3001 and a Requirement id 3002 from the Application requirements table 2151.
  • the management program 2155 filters the configurations identified from the tables 2154 by selecting only the configurations that are suitable for the requirement referred to by the input Requirement id 3002.
  • the management program 2155 acquires the Requirement type 3003 and the Requirement contents 3004 associated to the input Target application 3001 and Requirement id 3002 in the Application requirements table 2151.
  • the management program 2155 selects alternative configurations by matching the acquired Requirement type 3003 to the Target requirement type 6002 and the acquired Requirement contents 3004 to the Target requirement contents 6003, in each of the Configuration knowledge tables 2154.
  • the management program 2155 computes the suitability of the alternative configurations returned by the step 9001 according to the metrics from the input Target application 3001.
  • the suitability is computed as follows. For each returned configuration, the management program 2155 checks whether the conditions in 6004 are fulfilled by referring to 6004 and retrieving the information of the metrics that appear in 6004, in the resources 4003 that match the resources that appear in 6004, and are associated to the application 4001 that matches the input Target application 3001. In the case when the conditions 6004 are fulfilled, the management program 2155 considers the alternative configuration suitable. Otherwise, the management program 2155 considers the alternative configuration not suitable.
  • the management program 2155 acquires and returns the alternative configurations that were considered suitable according to the metrics in the step 9002.
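The flow of FIG. 9 (the steps 9001 to 9003) can be summarized with a short Python sketch. It rests on several assumptions: requirement contents are matched by simple equality, each condition is a predicate over a dictionary of metric values, and a configuration is treated as suitable when all of its conditions hold (so a configuration without conditions is always suitable). The names `KnowledgeEntry` and `acquire_configurations`, the metric names, and the 0.5 thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One configuration knowledge table (hypothetical, simplified structure)."""
    config_id: str
    target_type: str       # stands in for the Target requirement type 6002
    target_contents: str   # stands in for the Target requirement contents 6003
    conditions: tuple = () # predicates over metrics, standing in for the Conditions 6004


def acquire_configurations(req_type, req_contents, entries, metrics):
    """Return the configurations that match the requirement and whose conditions hold."""
    # Step 9001: keep configurations whose target type and contents match the requirement.
    matching = [e for e in entries
                if e.target_type == req_type and e.target_contents == req_contents]
    # Step 9002: keep configurations whose conditions are fulfilled by the current metrics.
    suitable = [e for e in matching if all(cond(metrics) for cond in e.conditions)]
    # Step 9003: return the suitable configurations.
    return suitable


# Usage with assumed metric names and values.
entries = [
    KnowledgeEntry("S001", "Function", "snapshot"),
    KnowledgeEntry("S002", "Performance", "IOPS",
                   (lambda m: m["read_ratio"] > 0.5,)),       # workload is mainly read
    KnowledgeEntry("S003", "Performance", "IOPS",
                   (lambda m: m["io_dispersion"] < 0.5,)),    # I/O access is not dispersed
]
metrics = {"read_ratio": 0.3, "io_dispersion": 0.2}
performance = acquire_configurations("Performance", "IOPS", entries, metrics)
function = acquire_configurations("Function", "snapshot", entries, metrics)
```

In a fuller implementation the metrics would be retrieved per resource of the primary DC configuration for the input application, rather than passed in as one flat dictionary.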
  • the management program 2155 is started by the administrator.
  • the processing goes to the step 7001.
  • the management program 2155 iterates starting with the first Target application 3001 "App01".
  • the processing goes to the step 7002.
  • the management program 2155 retrieves the current configuration of "App01", from the table 2152, which is formed by the host computer "Server A1" and the storage apparatus "Storage A2" corresponding to the Application id 4001 "App01".
  • the management program 2155 finds the available resource "R001", from the table 2153, that matches "Server A1" because the Resource SPECs 4004 of "Server A1" is included in the Resource SPECs 5003 of the Id 5001 "R001" and the Type 5002 "Server".
  • the management program 2155 does not find an available resource that matches "Storage A2" because the Resource SPECs 4004 of "Storage A2" is not included in any of the Resource SPECs 5003 of Type 5002 "Storage".
  • the "Storage A2" SPECs indicate a performance of 2G IOPS and a snapshot function, which are not found in an available resource of type "Storage". Therefore, the step 7002 returns No.
  • step 7003 is a step explained with the flow in the FIG. 8.
  • the processing goes to the step 8001 passing as input the application "App01".
  • the management program 2155 retrieves from the table 2151 the requirements "Rq001" and "Rq002" of "App01".
  • the management program 2155 iterates starting from "Rq001".
  • step 8002 which is a step explained with the flow in the FIG. 9.
  • the processing goes to the step 9001 passing as input the application "App01" and the requirement "Rq001".
  • the management program 2155 finds the configuration "S001", from the tables 2154, whose configuration (e.g. the Type 6002 and the Contents 6003) matches the requirement (e.g. the Type 3003 and the Contents 3004) of "Rq001".
  • the processing goes to the step 9002.
  • the management program 2155 refers to the Conditions 6004 of "S001". Since "S001" has no conditions, the management program 2155 considers "S001" as a suitable alternative configuration for "Rq001" of "App01".
  • the processing goes to the step 9003.
  • the management program 2155 returns the alternative configuration "S001".
  • the processing goes to the step 8003.
  • the management program 2155 iterates "S001".
  • the processing goes to the step 8004.
  • the management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S001", which is the disk usage trend for all the snapshots kept for the resource "storage".
  • the management program 2155 retrieves from the table 2152 the resource of the Resource type 4003 "storage" that forms the current configuration of the input application "App01", which is "Storage A2".
  • the management program 2155 retrieves the relevant metric from the storage apparatus "Storage A2" and finds that the disk usage trend for all the snapshots kept is 1 TB, and therefore concludes that, according to the available resources table 2153, the configuration "S001" for the "App01" requires one resource "R002".
  • the processing goes to the step 8005.
  • the management program 2155 checks the Available units 5004 and finds that there is one available resource "R002". Therefore, the management program 2155 returns Yes.
  • the processing goes to the step 8003.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 8001.
  • the management program 2155 iterates using the requirement "Rq002".
  • step 8002 which is a step explained with the flow in the FIG. 9.
  • the processing goes to the step 9001 passing as input the application "App01" and the requirement "Rq002".
  • the management program 2155 finds, from the tables 2154, the alternative configurations "S002" and "S003" whose configurations match the requirements of "Rq002".
  • the processing goes to the step 9002.
  • the management program 2155 refers to the Conditions 6004 of "S002" and "S003".
  • Since the Conditions of "S002" refer to a storage resource, the management program 2155 refers to the table 2152 and retrieves the "Storage A2" that forms the configuration of the input application "App01". The management program 2155 retrieves the metric "read / write rate" from the storage apparatus "Storage A2". Since the workload is not mainly read, the management program 2155 considers "S002" as a configuration not suitable for "Rq002" of "App01".
  • Since the Conditions of "S003" refer to a server resource, the management program 2155 refers to the table 2152 and retrieves the host computer "Server A1" that forms the configuration of the input application "App01". The management program 2155 retrieves the metric "degree of I/O access dispersion before cache" from the host computer "Server A1". Since the I/O access before cache is focused on a few values and not dispersed, the management program 2155 considers "S003" as a configuration suitable for "Rq002" of "App01".
  • the processing goes to the step 9003.
  • the management program 2155 returns the alternative configuration "S003".
  • the processing goes to the step 8003.
  • the management program 2155 iterates "S003".
  • the processing goes to the step 8004.
  • the management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S003", which is the amount of I/O unique access before cache for the resource server.
  • the management program 2155 retrieves from the table 2152 the resource of the Resource type 4003 "server" that forms the current configuration of the input application "App01", which is "Server A1".
  • the management program 2155 retrieves the relevant metric from the host computer "Server A1" and finds that the maximum amount of I/O unique access before cache is 2GB, and therefore concludes that, according to the table 2153, the configuration "S003" for the "App01" requires one resource "R004".
  • the processing goes to the step 8005.
  • the management program 2155 checks the Available units 5004 and finds that there is one available resource "R004". Therefore, the management program 2155 returns Yes.
  • the processing goes to the step 8003.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 8001.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 8009.
  • the management program 2155 checks that "S001" and "S003" are compatible and selects them as the best combination of alternative configurations to fulfill the requirements of "App01".
  • the processing goes to the step 7004.
  • the management program 2155 adds to a new and empty DR plan the instruction to acquire the resources "R002" and "R004", and the instructions to setup the alternative configurations.
  • the requirements of the application are few. However, an application usually would have requirements that ensure the use of a storage resource and a server resource.
  • the processing goes to the step 7001.
  • the management program 2155 iterates using the application "App02".
  • the processing goes to the step 7002.
  • the management program 2155 retrieves the current configuration of "App02", from the table 2152, which is formed by the host computer "Server A2" and the storage apparatus "Storage A2".
  • the management program 2155 finds, from the table 2153, the available resource "R001" combined with "R004" that matches "Server A2".
  • the management program 2155 does not find an available resource that matches "Storage A2".
  • the "Storage A2" SPECs indicate a performance of 1.5G IOPS, which is not found in an available resource of type "Storage". Therefore, the management program 2155 returns No.
  • step 7003 is a step explained with the flow in the FIG. 8.
  • the processing goes to the step 8001 passing as input the application "App02".
  • the management program 2155 retrieves from the table 2151 the requirement "Rq003" of "App02".
  • the management program 2155 iterates using "Rq003".
  • step 8002 which is a step explained with the flow in the FIG. 9.
  • the processing goes to the step 9001 passing as input the application "App02" and the requirement "Rq003".
  • the management program 2155 finds, from the tables 2154, the alternative configurations "S002" and "S003" that match "Rq003".
  • the processing goes to the step 9002.
  • the management program 2155 refers to the Conditions 6004 of "S002" and "S003".
  • the management program 2155 refers to the table 2152 and retrieves the storage "Storage A2" that forms the configuration of the input application "App02".
  • the management program 2155 retrieves the metric "read / write rate" from the storage apparatus "Storage A2". Since the workload is mainly read, the management program 2155 considers "S002" as a configuration suitable for "Rq003" of "App02".
  • the management program 2155 refers to the table 2152 and retrieves the host computer "Server A2" that forms the configuration of the input application "App02".
  • the management program 2155 retrieves the metric "degree of I/O access dispersion before cache" from the host computer "Server A2". Since the I/O access before cache is dispersed, the management program 2155 considers "S003" as a configuration not suitable for "Rq003" of "App02".
  • the processing goes to the step 9003.
  • the management program 2155 returns the alternative configuration "S002".
  • the processing goes to the step 8003.
  • the management program 2155 iterates "S002".
  • the processing goes to the step 8004.
  • the management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S002", which are the read IOPS and write IOPS for the resource storage.
  • the management program 2155 retrieves from the table 2152 the resource of type 4003 "storage" that forms the current configuration of the input application "App02", which is "Storage A2".
  • the processing goes to the step 8005.
  • the management program 2155 finds, from the table 2153, that there is one resource "R005" available, but only five resources "R002" available. Therefore, the management program 2155 returns No.
  • the processing goes to the step 8006.
  • the management program 2155 retrieves the current configuration of "App02", from the table 2152, which is formed by the host computer "Server A2" and the storage apparatus "Storage A2".
  • the management program 2155 retrieves from the host computer "Server A2" and the storage apparatus "Storage A2", respectively, the metrics of IOPS workload, according to the contents of "Rq003". Since the IOPS workload is 1.6G IOPS, the management program 2155 concludes that the requirement is useful (1.6G IOPS > 1.5G IOPS). Therefore, the management program 2155 returns No.
  • the processing goes to the step 8003.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 8001.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 8009.
  • the management program 2155 selects "S003" as the best combination of alternative configurations to fulfill the requirements of "App01". Since "S002" is not able to fulfill the requirements of "App02", "S002" is not selected as the best combination of alternative configurations. Further, since "S002" is not compatible with "S003" according to the Compatible with configurations 6008 of "S002" (FIG. 6B), "S002" cannot be selected as the best combination of alternative configurations together with "S003".
  • the processing goes to the step 7004.
  • the management program 2155 adds to the DR plan being created the instruction to acquire one resource "R005" and five resources "R002", and the instructions to setup the alternative configurations.
  • the requirements of the application are few. However, an application usually would have requirements that ensure the use of a storage resource and a server resource.
  • the processing goes to the step 7001.
  • the management program 2155 finishes the iteration.
  • the processing goes to the step 7005.
  • the management program 2155 summarizes the DR plan.
  • the processing goes to the step 7006.
  • the total number of "R002" is six (one "R002" for "App01" and five "R002" for "App02"), which is larger than the Available units 5004 value "5" of "R002". Since there are not enough resources for the selected configuration "S003", the management program 2155 concludes that the DR plan has limitations. Therefore, the management program 2155 returns Yes.
  • the processing goes to the step 7007.
  • the processing goes to the step 7008.
  • the management program 2155 sets the following alerts. For the application "App01" and the alternative configuration "S001", which is suitable, the management program 2155 sets an alert to detect when the disk usage trend of all snapshots kept is bigger than the disk capacity currently provided, i.e. the capacity of the available resource "R002". For the application "App01" and the alternative configuration "S002", which is suitable, the management program 2155 sets an alert to detect when the "App01" I/O workload becomes not mainly read. For the application "App01" and the alternative configuration "S003", which is not suitable, the management program 2155 sets an alert to detect when the "App01" I/O workload becomes focused and not dispersed.
  • the management program 2155 sets an alert to detect when the "App01" I/O workload becomes mainly read.
  • the management program 2155 sets an alert to detect when the amount of unique I/O workload in "App01" becomes bigger than the size of the selected available resource "R004".
  • 1020: Management computer, 1060: Primary DC, 1080: Secondary DC
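The resource tally of the steps 7005 and 7006 in the worked example above (six "R002" requested against five available units) can be reproduced with a small Python sketch. The function name `find_shortfalls` and the tuple layout of the requests are hypothetical; the unit counts are the ones stated in the example.

```python
from collections import Counter


def find_shortfalls(plan_requests, available_units):
    """Sum the units requested per resource across the DR plan and report any shortfall.

    `plan_requests` is a list of (application, resource_id, units) tuples (assumed layout).
    Returns a dict mapping resource_id to the number of missing units.
    """
    demand = Counter()
    for _application, resource_id, units in plan_requests:
        demand[resource_id] += units
    return {
        resource_id: needed - available_units.get(resource_id, 0)
        for resource_id, needed in demand.items()
        if needed > available_units.get(resource_id, 0)
    }


# Requests added to the DR plan in the step 7004 of the example.
requests = [
    ("App01", "R002", 1),
    ("App01", "R004", 1),
    ("App02", "R005", 1),
    ("App02", "R002", 5),
]
available = {"R002": 5, "R004": 1, "R005": 1}
shortfalls = find_shortfalls(requests, available)
has_limitations = bool(shortfalls)  # corresponds to Yes in the step 7006
```

A non-empty result corresponds to the DR plan having limitations, which leads to the alert of the step 7007.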

Abstract

A management system is configured to identify one or more requirements with respect to the primary DC, identify one or more suitable resources which are resources suitable to the identified requirements, from among available resources which are resources that can be used in the secondary DC, based on SPECs of the available resources and metrics (e.g. resources usage and resources workload) from the primary DC, and create a DR (Disaster Recovery) plan by mapping configurations with respect to the identified suitable resources to at least one of the identified requirements.

Description

MANAGEMENT SYSTEM AND MANAGEMENT METHOD FOR MANAGING COMPUTER SYSTEM
The present invention generally relates to reconfiguring resources in a data center.
Recently, there has been a trend toward using low-end DCs (Data Centers) as secondary data centers to support disaster recovery (DR) and to avoid leaving high-end resources unused until a disaster occurs. The secondary DC keeps the minimum resources needed to fulfill the replication data requirement, known as the Recovery Point Objective (RPO), and to allow quick recovery of the data center services, known as the Recovery Time Objective (RTO). On failover to a secondary data center, the service is continued by using, for example, resources that are available to purchase and resources that are available to rent from a cloud provider.
Although simple data replication to a low-end DC can avoid data loss (RPO) and allow resuming services soon (RTO), there is a need to reconfigure the secondary DC to fulfill performance, reliability and security requirements.
Patent Literature 1: US8468133
As a method to fulfill performance in the event of failover to a secondary data center, for example, Patent Literature 1 discloses a workload learning method as follows. Before a disaster occurs, the method keeps learning the I/O workload from the primary data center and applies the learned workload to the secondary data center. As a result, the data being replicated in the secondary data center is placed in the appropriate tier according to the workload. Then, in the event of failover, the data in the secondary data center is already in a suitable tier to fulfill the same performance requirement as the primary data center.
The related art disclosed by Patent Literature 1 is used to achieve similar performance in a secondary DC when the primary and the secondary DCs have the same or similar resources. However, the technology cannot be used when there is a lack or limitation of resources. For example, the primary DC may be using two tiers with fast and slow disks while the secondary DC has only one tier of slow disks.
In that situation, a method to compute an alternative configuration might be required. However, the related art does not include such a method, and no such method is otherwise known.
Therefore, the problem that remains is to compute at least one suitable configuration (e.g. the most suitable configuration) of resources (e.g. which resources, how many, and how they are combined) available in the secondary data center that would fulfill the performance, reliability and security requirements in the event of failover, and, if that is not fully possible, to compute the limitations precisely so that administrators can act beforehand. The solution to the remaining problem is not trivial because the suitable configuration of available resources changes often, for example with changes in the data center requirements, changes in the data center metrics such as resources usage and resources workload, and changes in the available resources in the secondary data center.
A management system is configured to identify one or more requirements with respect to the primary DC, identify one or more suitable resources which are resources suitable to the identified requirements, from among available resources which are resources that can be used in the secondary DC, based on SPECs of the available resources and metrics (e.g. resources usage and resources workload) from the primary DC, and create a DR (Disaster Recovery) plan by mapping configurations with respect to the identified suitable resources to at least one of the identified requirements.
Other features and advantages of the present invention will become more apparent from the following detailed description of the invention, when taken in conjunction with the accompanying exemplary drawings.
According to the present invention, in the event of disaster a primary DC can failover to a secondary DC where the features (specifications) of the available resources are limited, and an automated process or an administrator can reconfigure quickly the secondary DC by following an up-to-date disaster recovery plan created and checked beforehand with the suitable configuration of available resources to fulfill the data center requirements.
Fig. 1 is a diagram showing a brief description of the actions and data used in an embodiment of the present invention.
Fig. 2 is a diagram showing a schematic configuration of a storage system and a management computer coupled to the storage system.
Fig. 3 shows a configuration example of an Application requirements table.
Fig. 4 shows a configuration example of a Primary DC configuration table.
Fig. 5 shows a configuration example of an Available resources table.
Fig. 6A shows a first configuration example of a Configuration knowledge table.
Fig. 6B shows a second configuration example of a Configuration knowledge table.
Fig. 6C shows a third configuration example of a Configuration knowledge table.
Fig. 7 shows a flow chart for explaining processing to create or update a DR (Disaster Recovery) plan.
Fig. 8 shows a flow chart for explaining processing to compute and select the most suitable alternative configuration to fulfill the requirements of an application.
Fig. 9 shows a flow chart for explaining processing to acquire all suitable alternative configurations for an application.
An embodiment of the present invention will be described below.
Although the expression "aaa table" is used to describe information in the following description, the information may be expressed in a data structure other than a table. In order to denote independence from the data structure, the term "aaa table" can be replaced with the term "aaa information".
Although an ID (identifier) is used as information for identifying a target in the following description, the ID can be replaced with other kinds of identification information.
In the following description, the term "program" is occasionally used as the subject of a sentence describing a process. However, the program is executed by a processor (for example, a CPU (Central Processing Unit)) to perform a determined process using at least one of a storage unit (e.g. a memory) and an interface device (e.g. a communication port). Therefore, the subject of the process can be a processor. A part of the process may be performed by a hardware circuit, and the processor may comprise the hardware circuit. A program may be installed from a program source. The program source may be a program distribution server or a computer-readable storage medium, for example.
In the following description, a collection of one or more computers configured to manage logical volumes and display information for display may be referred to as a "management system". In the case where a management computer displays the information for display, the management computer may serve as the management system. In addition, a combination of the management computer and a display computer also may serve as the management system. In order to enhance the speed and the reliability of a management process, a plurality of computers may be used to achieve a process that is identical or similar to that performed by the management computer. In this case, the plurality of computers (which may include the display computer in the case where the display computer is used for display) may serve as the management system. In the embodiment, the management computer serves as the management system. The phrase "the management computer displays information" may denote that information is displayed on a display device possessed by the management computer, or may denote that information for display is transmitted to the display computer coupled to the management computer. In the latter case, the display computer displays information represented by the information for display on the display device possessed by the display computer.
<Overview of the Embodiment>
An overview of the embodiment is explained by referring to the diagram shown in FIG. 1.
The management method according to the embodiment is briefly described by the following 10 steps.
In the Step 1, the DC administrator 1010 requests that the management computer 1020 create a DR plan (e.g. create a new DR plan or update an existing plan) to support DR (Disaster Recovery) in the DC, where a DR plan is a plan to reconfigure the secondary DC 1070 in the event of a failover from the primary DC 1060 to the secondary DC 1070.
In the Step 2, the management computer 1020 acquires (identifies) the requirements of the DC (e.g. the requirements of applications executed in the primary DC 1060), for example performance, reliability and security requirements, from the Application Requirements Table 2151 describing the SLA (Service Level Agreement) and so on of the DC, if available, and the configuration and metrics in the primary DC 1060.
In the Step 3, the management computer 1020 acquires (identifies) the available resources that can be used in the secondary DC 1070 in the event of failover, from, for example, an Online server of a cloud provider 1030. The resources currently in the secondary DC 1070 are also acquired as available resources.
In the Step 4, the management computer 1020 acquires alternative configurations based on the configuration knowledge tables 2154 describing knowledge of configurations, to be used when the available resources are not enough to achieve the same configuration in the secondary DC 1070 as in the primary DC 1060.
In the Step 5, the management computer 1020 acquires the configuration and metrics from the primary DC 1060 in order to determine the suitability of the alternative configurations.
In the Step 6, the management computer 1020 creates the DR plan 1021 by mapping the requirements in the DC to configurations formed by available resources.
In the Step 7, the management computer 1020 computes the limitations of the created DR plan 1021 and sends (shows) the DR plan 1021 and the limitations to the DC administrator 1010.
In the Step 8, the DC administrator 1010 checks the DR plan and confirms the DR plan in the management computer 1020. The management computer 1020 saves the DR plan 1021 in a location (e.g. a storage area in the management computer 1020) that is safe and accessible in the event of disaster in the primary DC 1060.
In the Step 9, the management computer 1020 detects when the DR plan may have become outdated and performs a DR plan update. The process cycles again from the Step 2.
In the Step 10, in the event of disaster in the primary DC 1060, the management computer 1020 retrieves the DR plan 1021 and reconfigures the secondary DC 1070 according to the DR plan 1021.
The Steps 7 and 8 are optional. The Steps 2, 3, and 4 can be done by an independent process at any time, for example, in the management computer 1020.
<Configuration of DC with Support for DR>
FIG. 2 shows a general configuration of a storage system according to the embodiment of the invention.
In the storage system, the primary DC 1060 includes one (or multiple) storage apparatus 2704, and multiple (or one) host computers 2701 and 2702, which are connected to the storage apparatus 2704 via a data network 2703. The secondary DC 1070 includes one (or multiple) storage apparatus 2804 and multiple (or one) host computers 2801 and 2802, which are connected to the storage apparatus 2804 via a data network 2803. The host computers 2701, 2702, 2801 and 2802, and the storage apparatuses 2704 and 2804 are connected to a management computer 1020 via a management network 2400. The storage apparatuses 2704 and 2804 are connected to each other via a remote network 2600. At least one of the storage apparatuses 2704 and 2804 includes one or more storage devices (typically, non-volatile storage devices like HDDs (Hard Disk Drives) or SSDs (Solid State Drives)) and a controller for controlling reading/writing of data from/to the one or more storage devices. The one or more storage devices may be one or more RAID (Redundant Array of Independent (or Inexpensive) Disks) groups.
The data network 2703 herein is a storage area network (SAN) but may be an IP (Internet Protocol) network or any other data communication network. The management network 2400 herein is an IP network, but may be a SAN or any other data communication network.
The management computer 1020 herein is directly connected to the storage apparatuses 2704 and 2804 but may acquire necessary information via at least one of the host computers 2701, 2702, 2801 and 2802. The data networks 2703 and 2803 and the management network 2400 herein are separately provided but the data networks 2703 and 2803 may also serve as the management network 2400. Alternatively, the management computer 1020 and at least one of the host computers 2701, 2702, 2801 and 2802 may be one single computer unit. For convenience of description, the storage system shown in FIG. 2 includes two storage apparatuses 2704 and 2804, four host computers 2701, 2702, 2801 and 2802, and one management computer 1020. In the present invention, however, the number of those units is not limited to this.
In this embodiment, a set of host computers (2701 and 2702, or 2801 and 2802), a storage apparatus (2704 or 2804) and the data network (2703 or 2803) connecting them is herein referred to as a DC. A plurality of DCs is typically provided at geographically separated locations. This is because, if a DC at a certain location is damaged and cannot continue its operations, another DC at a separate location which is not damaged can take over them, which is also referred to as failover. In FIG. 2, the storage system includes a primary DC 1060 and a secondary DC 1070 which is a backup DC of the primary DC 1060. This configuration is referred to as a two data center (2DC) configuration. There may be multiple primary DCs for one secondary DC, or multiple secondary DCs for one primary DC.
In the 2DC configuration, a remote copy is performed between the primary DC 1060 and the secondary DC 1070 via the remote network 2600. The remote copy herein means a technique of duplicating data by copying the data from a volume in a storage apparatus into a volume in another storage apparatus. Using the remote copy technique, if a volume has any trouble and cannot perform its operations, another volume can take over them using the duplicated data stored therein. Two volumes as a copy source and a copy destination in a remote copy relationship are collectively called a copy pair. For convenience of description, the storage system of the present invention includes two DCs. However, the number of those units is not limited to this.
In the 2DC configuration, the secondary DC may have only the minimum amount of resources running to ensure that a backup of the primary DC data is kept safe, and in the event of failover resources are set in the secondary DC to continue the service of the applications that were in the primary DC.
The management computer 1020 includes an input device 2110 (e.g. a keyboard and a pointing device), an output device 2130 (e.g. a display device), a storage resource including a memory 2150, a management I/F for communicating via the management network 2400, and a CPU 2120 coupled thereto. The memory 2150 stores programs and management tables. The management tables include an Application requirements table 2151, a Primary DC configuration table 2152, an Available resources table 2153, and Configuration knowledge tables 2154. The program is, for example, a management program 2155.
Applications (application programs) are configured to be executed in at least one of the host computers 2701, 2702, 2801 and 2802. Specifically, before DR, applications are executed in at least one of the host computers 2701 and 2702. After the DR, applications will be executed in at least one of host computers 2801 and 2802. At least one of the host computers 2701, 2702, 2801 and 2802 is configured to perform I/O to at least one of the storage apparatuses 2704 and 2804 (write/read data to/from at least one of the storage apparatuses 2704 and 2804).
<Application requirements table 2151>
FIG. 3 shows a configuration example of the Application requirements table 2151.
The Application requirements table 2151 is a table which manages the requirements of each application in the primary DC 1060.
The Application requirements table 2151 has, as configuration information, a Target application 3001, which is information to identify an application in the primary DC 1060, a Requirement id 3002, which is information to identify a requirement in the primary DC 1060, a Requirement type 3003, which is information to identify the type of requirement, and a Requirement contents 3004, which is information to describe the managed requirement.
Each requirement in the primary DC 1060 is represented by a row in the Application requirements table 2151. Requirements are added, modified and removed, for example, by the administrator action, and by an automated process, for example running in the management computer 1020, that retrieves the SLA (Service Level Agreement) of the primary DC 1060, retrieves special SLAs for the event of disaster, and infers requirements by monitoring the configuration and dynamic metrics of the primary DC 1060.
The Requirement type 3003 contains one of a set of values predefined by the management computer 1020. Requirements of applications typically mean requirements of application programs executed in host computers.
It is seen from the Application requirements table 2151 in FIG. 3 that an application with id "App01" has a reliability requirement of id "Rq001" consisting in a need for snapshot functionality.
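As a rough illustration of how rows of the Application requirements table 2151 might be held in memory, the following Python sketch models one row per requirement and checks the Requirement type 3003 against a predefined set. The class and function names, the set of type values, and the wording of the contents are assumptions made for illustration; only the "App01"/"Rq001" reliability row is taken from the description of FIG. 3.

```python
from dataclasses import dataclass

# Assumed set of predefined Requirement type 3003 values (hypothetical).
REQUIREMENT_TYPES = {"Performance", "Reliability", "Security"}


@dataclass(frozen=True)
class RequirementRow:
    target_application: str    # Target application 3001
    requirement_id: str        # Requirement id 3002
    requirement_type: str      # Requirement type 3003
    requirement_contents: str  # Requirement contents 3004

    def __post_init__(self):
        # Reject types outside the predefined set.
        if self.requirement_type not in REQUIREMENT_TYPES:
            raise ValueError(f"unknown requirement type: {self.requirement_type}")


# One row per requirement; the contents wording is illustrative.
application_requirements = [
    RequirementRow("App01", "Rq001", "Reliability", "Snapshot functionality"),
]


def requirements_of(app_id, table):
    """Return every requirement row registered for one application."""
    return [row for row in table if row.target_application == app_id]
```

A frozen dataclass is used here only so that a row cannot be modified by accident after it has been validated; a mutable structure would serve equally well if requirements are edited in place.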
<Primary DC configuration table 2152>
FIG. 4 shows a configuration example of the Primary DC configuration table 2152.
The Primary DC configuration table 2152 is a table which manages the configuration and capabilities of the current resources in the primary DC 1060.
The Primary DC configuration table 2152 has, as configuration information, an Application id 4001, which is the information to identify an application in the primary DC 1060, a Resource id 4002, which is information to identify a resource present in the primary DC 1060, a Resource type 4003, which is information to identify the type of resource, and a Resource SPECs (specifications or features) provided for the application 4004, which is information to describe the current capabilities of the managed resource being used or provided to the application.
Each resource present in the primary DC 1060 is represented in one or more rows in the primary DC configuration table 2152. Resources are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that monitors the primary DC 1060.
The Resource id 4002 is an identifier internal to the DC, and may have no relation with the identifier used to represent a resource that is available. The Resource type 4003 contains one of a set of values predefined by the management computer 1020, indicating a type of resource. The Resource SPECs provided for the application 4004 contains only the capabilities of the managed resource being used or provided to the application, not the total capabilities. When a resource is used or provided to several applications, the resource appears in one row for each of the applications.
It is seen from the primary DC configuration table 2152 in FIG. 4 that the resource with id "Server A1" of type server provides 1GB of cache and 4GHz of CPU to an application with id "App01".
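The rule that a resource shared by several applications appears in one row per application can be sketched as follows. This is a minimal Python illustration, not the embodiment's data format: the field names and the dictionary keys `cache_gb` and `cpu_ghz` are assumptions, and the second row for "App02" is a hypothetical example added only to show a shared resource.

```python
from dataclasses import dataclass


@dataclass
class PrimaryConfigRow:
    application_id: str  # Application id 4001
    resource_id: str     # Resource id 4002
    resource_type: str   # Resource type 4003
    specs: dict          # Resource SPECs provided for the application 4004


primary_dc_configuration = [
    # Row described for FIG. 4: 1 GB of cache and 4 GHz of CPU for "App01".
    PrimaryConfigRow("App01", "Server A1", "Server", {"cache_gb": 1, "cpu_ghz": 4}),
    # Hypothetical row: the same server also provides capabilities to "App02".
    PrimaryConfigRow("App02", "Server A1", "Server", {"cache_gb": 2, "cpu_ghz": 2}),
]


def current_configuration(app_id, table):
    """Return the rows (resources and per-application SPECs) used by one application."""
    return [row for row in table if row.application_id == app_id]
```

Because each row carries only the share of the resource provided to that application, the total capability of "Server A1" is not stored in any single row.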
<Available resources table 2153>
FIG. 5 shows a configuration example of the Available resources table 2153.
The Available resources table 2153 is a table which manages the resources that are available to be used in the secondary DC 1070 in the event of failover.
The Available resources table 2153 has, as configuration information, a Resource id 5001, which is the information for identifying a resource, a Resource type 5002, which is information to identify the type of resource, a Resource SPECs 5003, which is information to describe the current capabilities of a resource and the capabilities that can be added to a resource, an Available units 5004, which is information to count the number of units of the managed resource that are available, a Cost per unit 5005, which is information related to the cost of acquiring and providing one unit of the managed resource to the secondary DC 1070, and an Origin 5006, which is information to describe where the managed resource can be acquired from.
Each available resource is represented by a row in the Available resources table 2153. Available resources are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that monitors the current resources in the secondary DC 1070, and receives or requests information from an online service of a resources shop or a cloud provider.
The Resource type 5002 contains one of a set of values predefined by the management computer, indicating a type of resource. The Resource SPECs 5003 refer, for example, to SPECs and available functionality. The Available units 5004 may be a value based on the budget and entered by the administrator. The Origin 5006 may include information and instructions to receive or request further information about the available resources and update the information about the available resources.
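The columns of the Available resources table 2153 can be sketched in Python as below, together with a helper that accounts for units already reserved by a DR plan under construction. The names are assumptions for illustration, the example row mirrors the "R001" server of FIG. 5, and the cost is written as the number 2000 on the assumption that this is the intended per-unit amount.

```python
from dataclasses import dataclass


@dataclass
class AvailableResource:
    resource_id: str      # Resource id 5001
    resource_type: str    # Resource type 5002
    specs: dict           # Resource SPECs 5003 (current and addable capabilities)
    available_units: int  # Available units 5004
    cost_per_unit: float  # Cost per unit 5005
    origin: str           # Origin 5006


# Example row modeled on "R001"; dictionary keys are hypothetical.
r001 = AvailableResource(
    "R001", "Server", {"cache_gb": 1, "max_cache_gb": 12}, 5, 2000.0, "Cloud provider A"
)


def remaining_units(resource, already_selected):
    """Units still free after subtracting those already selected for the DR plan."""
    taken = already_selected.get(resource.resource_id, 0)
    return max(resource.available_units - taken, 0)
```

For instance, `remaining_units(r001, {"R001": 2})` leaves three units for later applications in the same plan.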
It is seen from the Available resources table 2153 in FIG. 5 that a resource is available with id "R001", that it is a server with 1 GB of cache and the capability to be combined with cache resources up to a total of 12 GB, and that there are five units available at a cost per unit of 2.000$, which can be acquired from the Cloud provider A.
< Configuration knowledge table 2154>
FIG. 6A, FIG. 6B and FIG. 6C show configuration examples of three configuration knowledge tables 2154A, 2154B and 2154C. Hereinafter, "the configuration knowledge table 2154" is any one of configuration knowledge tables 2154.
The configuration knowledge table 2154 is a table which contains the knowledge of an alternative configuration to fulfill a requirement, the conditions that need to be fulfilled to apply the configuration, the instructions to apply the configuration, and the instructions to compute the resources that the configuration needs. Since an alternative configuration may only be suitable when certain conditions are fulfilled, an alternative configuration may be suitable only as a temporary solution, for example until the primary DC 1060 is recovered from a disaster and failback is performed.
The configuration knowledge table 2154 has, as configuration information, a Configuration id 6001, which is the information for identifying a configuration, a Target requirement type 6002, which is information to identify the type of requirement that the configuration is designed to fulfill, the Target Requirement contents 6003, which is information to describe the requirement that the configuration is designed to fulfill, a Conditions for metrics on primary DC 6004, which is a set of any, one or more triplets of Resource, Metric and Condition, and is information to describe what condition, in what metric and in what type of resource need to be fulfilled so that the configuration is suitable to fulfill the requirement 6003, a Configuration instructions 6005, which is information to apply the configuration in a secondary DC 1070, a Relevant metrics on primary DC 6006, which is a set of any, one or more pairs of Resource and Metrics, and is information to describe what metrics on what resources should be acquired from a primary DC 1060 in order to compute the resources that the configuration needs, a Resources needed on secondary DC 6007, which is a set of any, one or more pairs of Resource and Demand, which is information to describe for each type of resources the instructions to compute the amount and SPECs of resources needed, and a Compatible with configurations 6008, which contains any, one or more Configuration ids 6001 and indicates the configurations that are compatible with the described configurations.
Each configuration is represented by one configuration knowledge table 2154. Alternative configurations are added, modified and removed, for example, by the action of the administrator, and by an automated process, for example running in the management computer 1020, that retrieves or receives alternative configurations from an online service.
The Target requirement type 6002 and Target requirement contents 6003 may describe more than one requirement. The Configuration instructions 6005 may include a script to apply the configuration by a process, for example in the management computer 1020. The Resource in 6004 and the Resource in 6007 contain one of a set of values predefined by the management computer, indicating a type of resource. The Metric in 6004 and the Metric in 6006 contain one of a set of values predefined by the management computer, indicating a metric that can be acquired from the primary DC 1060, and can include details such as the trend of the metric, the average of the metric, or the maximum value of the metric in an interval.
It is seen from the configuration knowledge table 2154A in FIG. 6A that there is an alternative configuration with id "S001" that can fulfill the reliability requirement of having a snapshot function, that the configuration can be applied without conditions on the primary DC 1060 metrics, that the demand on the server and storage resources can be computed by acquiring the disk usage trend of all snapshots kept in the storage resources in the primary DC 1060, and that the configuration "S001" is compatible with the configurations "S002" and "S003".
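One possible way to evaluate the Conditions for metrics on primary DC 6004 and the Compatible with configurations 6008 is sketched below in Python. It assumes that each condition is a (Resource, Metric, predicate) triplet and that a configuration is suitable only when every triplet holds; the embodiment may evaluate conditions differently. The "S002" entry, its `read_ratio` metric and its threshold are hypothetical and are not taken from FIG. 6B.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationKnowledge:
    configuration_id: str             # Configuration id 6001
    target_requirement_type: str      # Target requirement type 6002
    target_requirement_contents: str  # Target requirement contents 6003
    conditions: list = field(default_factory=list)     # 6004: (resource, metric, predicate)
    compatible_with: set = field(default_factory=set)  # 6008: configuration ids


def is_suitable(config, primary_metrics):
    """True when every condition triplet holds for metrics acquired from the primary DC."""
    return all(
        (resource, metric) in primary_metrics
        and predicate(primary_metrics[(resource, metric)])
        for resource, metric, predicate in config.conditions
    )


def are_compatible(first, second):
    """True when `first` lists `second` among its compatible configurations."""
    return second.configuration_id in first.compatible_with


# "S001" as described for FIG. 6A: no conditions, compatible with "S002" and "S003".
s001 = ConfigurationKnowledge(
    "S001", "Reliability", "Snapshot function",
    conditions=[], compatible_with={"S002", "S003"},
)
# Hypothetical second configuration with one condition on an assumed metric.
s002 = ConfigurationKnowledge(
    "S002", "Reliability", "Snapshot function",
    conditions=[("Storage", "read_ratio", lambda ratio: ratio > 0.5)],
    compatible_with={"S001"},
)
```

With an empty condition list, `is_suitable` returns `True` for any metrics, which matches a configuration that can be applied without conditions.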
<Management program 2155>
FIG. 7 is a flowchart for explaining processing to create a DR plan based on alternative configurations to be used in the case of a failover from the primary DC 1060 to the secondary DC 1070 in order to reconfigure the secondary DC 1070.
The processing is executed by the management computer 1020, for example, automatically at a regular time interval, when the DC administrator requests a DR plan creation (e.g. the creation of a new DR plan or the update of an existing DR plan) through the input device 2110, when an alert set in the step 7004 is received, or when there is an update in one or more of the tables 2151, 2152, 2153 and 2154 in the memory 2150.
In the step 7001, the management program 2155 refers to the Target application 3001 in the Application requirements table 2151 and iterates through each application with at least one requirement. The applications can have an associated priority so that the important applications are iterated before the less important applications and, for example, have more available resources to choose from.
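The priority-ordered iteration of the step 7001 might look like the following Python sketch. The row format, the `priorities` mapping and the convention that a larger number means a more important application are assumptions for illustration.

```python
def applications_in_priority_order(requirement_rows, priorities):
    """Return each application with at least one requirement, most important first.

    `requirement_rows` is a list of dicts with a "target_application" key and
    `priorities` maps an application id to a number (higher means more important).
    Applications without a priority default to 0; ties are broken by id.
    """
    apps = {row["target_application"] for row in requirement_rows}
    return sorted(apps, key=lambda app: (-priorities.get(app, 0), app))
```

For example, with requirement rows for "App01" and "App02" and `priorities = {"App02": 10}`, "App02" is iterated first and can therefore reserve available units before "App01".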
In the step 7002, the management program 2155 checks whether the same configuration as the current configuration in the primary DC 1060 can be provided in the secondary DC 1070 for the application identified from Target application 3001 being iterated, where "same configuration" refers to having the same number of resources, of the same type and the same or better SPECs relevant for the requirements of the application. "Better SPECs" means, for example, higher I/O performance, larger capacity, higher reliability, and so on.
In the step 7002, the management program 2155 retrieves the current configuration in the primary DC 1060 of the Target application 3001 being iterated by referring to the Primary DC configuration table 2152 and retrieving all the rows whose Application id 4001 matches the Target application 3001 value. The management program 2155 retrieves the available resources from the Available resources table 2153, taking into account, for the Available units 5004 value, the units already selected for the DR plan being created. For each resource that forms the retrieved current configuration, the management program 2155 searches for a suitable available resource by matching the Resource type 4003 to the same Resource type 5002, and the Resource SPECs provided for the application 4004 to the same or better Resource SPECs 5003. The suitable available resource can be combined with further available resources, if possible, in order to match the SPECs, for example by combining a server cache resource with a server resource. Additionally, the resource SPECs can be matched by referring only to the SPECs that are relevant for the requirements 3002 of the Target application 3001 being iterated.
In the case when a combination of available resources 2153 allows having the same configuration as the configuration retrieved from the primary DC 1060 (Yes in the step 7002), the processing goes to the step 7004.
Otherwise (No in the step 7002), the processing goes to the step 7003.
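The check of the step 7002 can be approximated by the Python sketch below. It is a simplified greedy matcher under stated assumptions: every SPEC is numeric with larger meaning better, resources are not combined with each other, and the dictionary keys are hypothetical. A return value of `None` corresponds to "No in the step 7002".

```python
def find_same_configuration(primary_rows, available, selected_units):
    """Map each primary resource to an available resource of the same type
    with the same or better SPECs, or return None if any resource is unmatched.

    `selected_units` counts units already reserved by the DR plan being created.
    """
    used = dict(selected_units)  # copy so the caller's counts are not modified
    mapping = {}
    for row in primary_rows:
        for res in available:
            free = res["available_units"] - used.get(res["resource_id"], 0)
            same_type = res["resource_type"] == row["resource_type"]
            good_specs = all(
                res["specs"].get(name, 0) >= value
                for name, value in row["specs"].items()
            )
            if free > 0 and same_type and good_specs:
                used[res["resource_id"]] = used.get(res["resource_id"], 0) + 1
                mapping[row["resource_id"]] = res["resource_id"]
                break
        else:
            return None  # no suitable available resource for this primary resource
    return mapping
```

A greedy first-fit search is used only to keep the sketch short; it can miss a valid assignment that a more exhaustive search over combinations of available resources would find.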
In the step 7002, the management program 2155 gives more importance to the current configuration in the primary DC 1060 than to the Requirement contents 3004 in the Application requirements table 2151 because it assumes that the configuration in the primary DC 1060 is a configuration chosen by the administrator, well tested, and in use for a long time. Therefore, the current configuration in the primary DC 1060 is assumed to guarantee the Requirement contents 3004 of all the requirements of the Target application 3001 being iterated. However, when this assumption may not be true, the processing can skip the step 7002 and go to the step 7003.
In the step 7003, the management program 2155 computes several combinations of alternative configurations and selects the most suitable one to fulfill all the requirements of the application being iterated, where "alternative configuration" refers to a configuration where at least one of Resource type 5002 and Resource SPECs provided for the application 4004 of resources is different from that of the resources in the current configuration of the primary DC 1060 for the Target application 3001. An alternative configuration may only be suitable when at least one condition in Conditions for metrics on primary DC 6004 is currently fulfilled in the Requirement contents 3004 corresponding to the Target application 3001. Since the conditions 6004 may change in the future, the alternative configurations may be suitable only as temporary solutions, for example until the primary DC 1060 is recovered from a disaster.
In the step 7003, the management program 2155 executes the flow described in FIG. 8 passing the Target application 3001 that is being iterated.
In the step 7004, the management program 2155 adds the selected configuration to the DR plan being created. When it is the first iteration of the step 7001, the configuration is added to a new and empty DR plan. The management program 2155 adds to the DR plan the available resources identified from the Available resources table 2153 that form the configurations selected in the step 7002 or 7003. In addition, in the step 7004, the management program 2155 adds to the DR plan the instructions required to achieve the configuration. When the selected configuration is an alternative configuration (selected (defined) in the step 7003), the processing acquires the instructions by referring to the Configuration instructions 6005 described in the configuration knowledge table 2154 for the alternative configuration selected (defined) in the step 7003. Otherwise (selected in the step 7002), the instructions can be computed by examining the current configuration in the primary DC 1060 and, for example, referring to configuration manuals existing in the primary DC 1060. Additional information, such as the resource and logical volume of the secondary DC 1070 where the data of each application is being replicated, can be acquired from the secondary DC 1070 in order to compute instructions to move the data to a suitable resource according to the selected configuration.
Further, in the step 7004, the management program 2155 computes the approximate time that performing the instructions will take, for example by referring to the Configuration instructions 6005 described in the configuration knowledge table 2154 for the alternative configuration selected (defined) in the step 7003. When the time is considered long, for example above a threshold defined by the administrator, a simple alternative configuration can be selected to provide service to the application until the selected configuration is properly formed.
In the step 7005, the management program 2155 summarizes the created DR plan as follows:
(A) The management program 2155 sums the total amount of required resources from the Available resources table 2153.
(B) The management program 2155 computes the cost of the DR plan by referring to the computed total amount and the Cost per unit 5004 in the Available resources table 2153.
(C) The management program 2155 computes the total time required to perform all the required instructions, acquired from Configure instructions 6005 described on the Alternative configuration tables with respect to the alternative configuration selected (defined) in the step 7003.
(D) The management program 2155 summarizes the total limitations that would exist in the secondary DC 1070, computed in the step 7003, which include the limitations in the SPECs and number of available resources, computed in the steps 8004 and 8008, and the reduction or removal of requirements, computed in the step 8007.
In the step 7005, the management program 2155 finishes the DR plan by adding the summary computed. The management program 2155 stores the DR plan to a safe place, for example, to a storage area in the management computer 1020. The DR plan can be also sent (shown) to the administrator, for example, to ask for confirmation.
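The summary items (A) to (D) of the step 7005 can be illustrated with a minimal sketch. The `PlanEntry` structure, the field names, the cost values and the minute values are hypothetical and are not taken from the tables of the embodiment. The sketch also assumes that the instructions are performed one after another, so that their times simply add up.

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    resource_id: str      # Id 5001 in the Available resources table 2153
    units: int            # units selected for the DR plan
    cost_per_unit: float  # hypothetical cost value
    setup_minutes: int    # hypothetical time of the configure instructions


def summarize(entries, limitations):
    """Return the totals for resources (A), cost (B), time (C) and the limitations (D)."""
    totals = {}
    for entry in entries:
        totals[entry.resource_id] = totals.get(entry.resource_id, 0) + entry.units
    cost = sum(entry.units * entry.cost_per_unit for entry in entries)
    minutes = sum(entry.setup_minutes for entry in entries)  # sequential execution assumed
    return {
        "resources": totals,
        "cost": cost,
        "minutes": minutes,
        "limitations": list(limitations),
    }
```

A non-empty `limitations` list in the returned summary corresponds to the case where the step 7006 returns Yes.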
In the step 7006, the management program 2155 refers to the summary of the limitations computed in the step 7005. When there are no limitations (No in the step 7006), the processing ends. When there is at least one limitation (Yes in the step 7006), the processing goes to the step 7007 in order to send an alert to the administrator.
The DR plan can be updated often. Therefore, rather than asking the administrator to check the DR plan each time, it is useful to compute the limitations and send an alert to the administrator only when there are limitations in the created DR plan.
In the step 7007, the management program 2155 computes the limitations of the created DR plan and sends them in an alert to the administrator. As a result, the administrator can make decisions and act before a disaster occurs, for example by requesting more available resources. It is important to compute the limitations accurately, so that the administrator can decide, for example, how many resources and with what SPECs should be requested. In the step 7007, the management program 2155 further computes the limitations of a configuration by examining the trend and history of the metrics related to the Requirement contents 3004 of the requirement 3002 that the knowledge of configurations 2154 fulfills. With the trend and history information, the management program 2155 can decompose the requirement into time intervals and find what percentage of the time the requirement has what limitation. For example, the requirement is fulfilled 100% on weekdays and 90% on weekends. For efficiency, the management program 2155 can reuse the trend and history information already retrieved in the step 8006. The alert can, for example, be sent by email or displayed on the output device 2130 of the management computer 1020. The administrator may set filters on the limitations, for example by degree of limitation, in order to avoid receiving alerts that are not important. The alert can be sent to a different administrator based, for example, on the type of requirement that cannot be fulfilled because of a limitation.
In the step 7008, the management program 2155 determines the sets of resources, metrics, and thresholds that need to be monitored to detect a need for a DR plan update, and sets alerts to trigger the DR plan update when a determined metric in a determined resource reaches a determined threshold. This step is required because the metrics that the processing has used as evidence to accept an alternative configuration and reduce or remove a requirement may change often. However, simply updating the DR plan at a regular time interval is not suitable. When the DR plan is updated often (short time interval), the process consumes computational resources (e.g. the CPU 2120) and may cause congestion in the Management network 2400. When it is not updated often (long time interval), a situation may occur where a created DR plan is outdated and not suitable to fulfill the requirements.
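The decomposition of a requirement into time intervals can be sketched as follows. This is only one possible reading of the step 7007: it assumes that the history is available as labelled demand samples and that a sample counts as fulfilled when the demand does not exceed the capacity the DR plan can deliver. The function name, the labels and the sample values are hypothetical.

```python
from collections import defaultdict


def fulfilment_by_interval(samples, capacity):
    """samples: iterable of (interval_label, demand).

    Returns, per interval label, the percentage of samples whose demand
    is covered by the capacity the DR plan can provide.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for label, demand in samples:
        total[label] += 1
        if demand <= capacity:
            met[label] += 1
    return {label: 100.0 * met[label] / total[label] for label in total}
```

With two weekday samples below the capacity and nine out of ten weekend samples below it, the function reports 100% for weekdays and 90% for weekends, which has the same shape as the example given above.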
In the step 7008, the management program 2155 performs the following process for each Target application 3001 that cannot have the same configuration in the secondary DC 1070 (returned No in the step 7002). The management program 2155 selects all the alternative configurations (which can be identified from the Configuration knowledge tables 2154) that are suitable for at least one requirement 3002 of one Target application 3001. The selection of the suitable configurations is done by matching the Requirement type 3003 to the Target requirement type 6002, and matching the Requirement contents 3004 to the Target requirement contents 6003, in the Application requirements table 2151 and each of the configuration knowledge tables 2154. For each selected configuration, the management program 2155 checks whether the conditions included in the Conditions 6004 are fulfilled by referring to the Conditions 6004 and retrieving the information of the metrics that appear in the Conditions 6004, in the Resource ids 4002 that match the resources that appear in the Conditions 6004, and are associated to the Application id 4001 that matches the Target application 3001 that the step 7008 is considering. In the case when the Conditions 6004 are not fulfilled, the management program 2155 determines the thresholds, metrics and resources according to the conditions 6004 in order to detect when the Conditions 6004 become fulfilled. In the case when the Conditions 6004 are fulfilled, the management program 2155 refers to the relevant metrics 6006, retrieves the available resources in table 2153 and retrieves from the primary DC 1060 the information of the Relevant metrics 6006 needed to compute the required amount and SPECs of resources, according to the demand of resources of each type 6007. 
Then, the management program 2155 determines the thresholds, metrics and resources according to the relevant metrics 6006 in order to detect when the requirement of amount and/or SPEC of available resources change.
Further, when a requirement has been reduced or removed for one of the alternative configurations selected in the step 7003, the management program 2155 determines the resources, metrics and thresholds in order to detect when the evidence to reduce or remove a requirement in the step 8007 is not valid anymore.
Finally, in the step 7008, the management program 2155 sets alerts to be sent to the management computer 1020 requesting a DR plan update. An alert is set to be sent when a determined metric in a determined resource reaches a determined threshold. An alert can be set, for example, in the management computer of the primary DC 1060 or in an agent within a resource of the primary DC 1060.
In the step 7008, the management program 2155 can avoid or reduce the need to acquire metrics from the primary DC 1060 and to compute the requirement of available resources by reusing the metrics acquired and the computation already done in the step 7003.
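The alerts of the step 7008 can be pictured as (resource, metric, threshold) rules that are evaluated against current readings. The following is a minimal sketch; the `AlertRule` structure, the metric names and the threshold values are hypothetical, and the embodiment does not prescribe where or how the rules are evaluated.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (an assumption of this sketch).
_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class AlertRule:
    resource: str     # determined resource
    metric: str       # determined metric
    comparison: str   # one of the keys of _OPS
    threshold: float  # determined threshold


def triggered(rules, readings):
    """readings: {(resource, metric): value}. Return the rules whose threshold is reached."""
    hits = []
    for rule in rules:
        value = readings.get((rule.resource, rule.metric))
        if value is not None and _OPS[rule.comparison](value, rule.threshold):
            hits.append(rule)
    return hits
```

Any rule returned by `triggered` would correspond to a request for a DR plan update sent to the management computer 1020.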
<Most suitable alternative configuration computation and selection processing>
FIG. 8 is a flowchart for explaining the step 7003 to compute and select the most suitable alternative configuration among configurations identified from the tables 2154 that can be achieved with the available resources 2153 in order to fulfill a requirement 2151. The step 7003 is a step executed by the management program 2155. The management program 2155 receives as input a Target application 3001 from the Application requirements table 2151.
In the step 8001, the management program 2155 refers to the Target application 3001 and the Requirement id 3002 in the Application requirements table 2151 and iterates through each Requirement id 3002 of the input Target application 3001. The requirements can have an associated priority so that the important requirements are iterated before the non-important requirements, so that, for example, the important requirements have more available resources.
In the step 8002, the management program 2155 selects all the configurations that can be identified from the tables 2154 and that are suitable for the Target Application 3001 and the Requirement id 3002 being iterated. The management program 2155 executes the flow described in FIG. 9 passing the requirement id 3002 being iterated and the Target Application 3001 that is input of the step 7003.
In the step 8003, the management program 2155 iterates through each configuration acquired in the step 8002.
In the step 8004, the management program 2155 computes the amount of resources and the SPECs required to form the identified (acquired) configuration being iterated. It is important for the management program 2155 to compute the needed amount of resources and SPECs beforehand to understand whether the configuration is possible and to compare several configurations in terms of the limitations to fulfill a requirement. The management program 2155 matches the input Target application 3001 with the Application id 4001 in the primary DC configuration table 2152 to retrieve the resources that form the current configuration of the application 4001. The management program 2155 acquires the information of the relevant metrics for each resource retrieved. The relevant metrics of a resource 4002 are the metrics in the resource-metric pairs from the Relevant metrics 6006 where the resource of the Relevant metrics 6006 matches the retrieved resource type 4003.
In the step 8004, the management program 2155 computes the best combination of available resources from the table 2153 that are needed in the secondary DC 1070 as instructed in the Resources 6007 in the tables 2154 being iterated, by using the acquired information of the Relevant metrics 6006, the Contents 3004 of the Requirement id 3002 being iterated, and the Available resources table 2153, taking into account for the Available units 5004 value the units already selected for the DR plan being created. When the combination of available resources from the table 2153 is not enough to achieve the needed resources as instructed in the Resources 6007, the possible combination that best fulfills the demand is computed.
In the step 8005, the management program 2155 refers to the result of the step 8004. When there were enough available resources to fulfill the needed resources (Yes in the step 8005), the processing iterates again from the step 8003. Otherwise (No in the step 8005), the processing goes to the step 8006 in order to try to mitigate the need for resources.
In the step 8006, the management program 2155 checks whether the requirement being iterated is actually useful or not for the input Target application 3001. The management program 2155 selects resources from the primary DC configuration table 2152 by matching the Application id 4001 with the Target application 3001 of the input requirement. The management program 2155 retrieves the near future trend of the metrics and activities of the selected resources and compares them with the Requirement contents 3004. When the actual trend is lower than the Requirement contents 3004 (Yes in the step 8006), the processing goes to the step 8007. Otherwise (No in the step 8006), the processing iterates again from the step 8003. The time that defines the near future trend is, for example, a value based on the time that the primary DC may take to recover after a disaster, and can be defined, for example, by the administrator.
In the step 8007, the management program 2155 refers to the trend information retrieved in the step 8006 to reduce or remove the input requirement. There are two cases as follows. In the case when the actual trend is a fraction of the actual requirement, the requirement is reduced to match the actual trend. For example, when the requirement is 2G IOPS and the actual trend is 1.5G IOPS, the requirement is reduced to 1.5G IOPS. In the case when the actual trend makes the requirement useless, the requirement is removed. For example, when the requirement is to have available a function and the actual trend is that the function has not been used for a long time, the requirement is removed.
In the step 8007, the management program 2155 reduces or removes the input requirement for two main reasons: the administrator only wants a temporary configuration until the primary DC is recovered, and the requirement cannot be fulfilled with the currently available resources.
In the step 8008, the management program 2155 redoes the processing done in the step 8004.
In the step 8009, the management program 2155 selects the best combination of configurations from all the configurations returned in the step 8002 in each iteration of the step 8001. The management program 2155 computes all possible combinations of configurations, where a combination of configurations is a set of configurations that contains one or more configurations identified from the tables 2154 for each requirement of the input Target Application, and all the contained configurations are compatible with each other, according to the Compatible with other configurations 6008 value of each configuration identified from the tables 2154. A combination of configurations does not need to have a configuration for a requirement that has been removed in the step 8007.
The best combination of configurations is selected as the combination such that there are enough available resources to form all the configurations in the combination at the same time. When there are still several combinations to select from, the best combination is further selected as the combination that can fulfill all the requirements of the input application without reducing or removing one or more of the requirements. When there are still several combinations to select from, the best combination is further selected, for example, based on the cost of its resources, based on the origin of its resources, and based on a score given to each resource by the administrator.
When all the computed possible combinations have at least one missing resource, it means that one or more requirements have been reduced. In that situation, the best combination is selected as the combination that can best fulfill the reduced requirement with the available resources. For example, if the requirement is 2G IOPS, the best combination is the combination that can achieve the highest IOPS.
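The two cases of the step 8007 can be written as a small function. This is a minimal sketch that assumes the requirement and the trend are expressed in the same integer unit (here, millions of IOPS), and that a trend of zero stands for "the function has not been used". The function name and the use of `None` for a removed requirement are choices of the sketch.

```python
def mitigate(required, trend):
    """Return the adjusted requirement, or None when the requirement is removed."""
    if trend <= 0:
        return None      # the trend makes the requirement useless: remove it
    if trend < required:
        return trend     # the trend is a fraction of the requirement: reduce it
    return required      # the requirement is still useful: keep it


# Requirement of 2G IOPS with an actual trend of 1.5G IOPS (values in M IOPS).
reduced = mitigate(2000, 1500)
```

Here `reduced` is 1500, matching the reduction from 2G IOPS to 1.5G IOPS described for the step 8007.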
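The enumeration of compatible combinations in the step 8009 can be sketched as below. The sketch simplifies the description in one respect: it picks exactly one configuration per requirement, whereas a combination may contain more than one. The data layout (a dictionary of candidates and a set of compatible pairs) is hypothetical, and the example assumes that only "S001" and "S003" are marked as compatible.

```python
from itertools import product


def compatible_combinations(candidates, compatible):
    """candidates: {requirement_id: [configuration ids]}.
    compatible: set of frozenset pairs of configuration ids that can coexist.

    Returns one {requirement_id: configuration_id} mapping per combination
    in which every pair of distinct configurations is compatible.
    """
    requirements = sorted(candidates)
    result = []
    for combo in product(*(candidates[r] for r in requirements)):
        distinct = set(combo)
        ok = all(
            frozenset((a, b)) in compatible
            for a in distinct
            for b in distinct
            if a < b
        )
        if ok:
            result.append(dict(zip(requirements, combo)))
    return result


combos = compatible_combinations(
    {"Rq001": ["S001"], "Rq002": ["S002", "S003"]},
    {frozenset(("S001", "S003"))},
)
```

The surviving combinations would then be ranked by the criteria described above: enough available resources first, then no reduced or removed requirement, then cost, origin or administrator score.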
<All suitable alternative configurations acquisition processing>
FIG. 9 is a flowchart for explaining the step 8002 to acquire all configurations.
The step 8002 is a step executed within the step 7003, executed within the management program 2155. The management program 2155 receives as input a Target application 3001 and a Requirement id 3002 from the Application requirements table 2151.
In the step 9001, the management program 2155 filters the configurations identified from the tables 2154 by selecting only the configurations that are suitable for the requirement referred to by the input Requirement id 3002. The management program 2155 acquires the Requirement type 3003 and the Requirement contents 3004 associated with the input Target application 3001 and Requirement id 3002 in the Application requirements table 2151. The management program 2155 selects alternative configurations by matching the acquired Requirement type 3003 to the Target requirement type 6002 and the acquired Requirement contents 3004 to the Target requirement contents 6003, in each of the Configuration knowledge tables 2154.
In the step 9002, the management program 2155 computes the suitability of the alternative configurations returned by the step 9001 according to the metrics from the input Target application 3001. The suitability is computed as follows. For each returned configuration, the management program 2155 checks whether the conditions in 6004 are fulfilled by referring to 6004 and retrieving the information of the metrics that appear in 6004, in the resources 4003 that match the resources that appear in 6004, and are associated with the application 4001 that matches the input Target application 3001. In the case when the conditions 6004 are fulfilled, the management program 2155 considers the alternative configuration suitable. Otherwise, the management program 2155 considers the alternative configuration not suitable.
In the step 9003, the management program 2155 acquires and returns the alternative configurations that were considered suitable according to the metrics in the step 9002.
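The steps 9001 to 9003 amount to a filter by requirement followed by a filter by conditions. The following is a minimal sketch under several assumptions: the `Knowledge` structure is hypothetical, the requirement contents are matched by a simple label rather than by value, and the Conditions 6004 are modelled as predicates over a dictionary of metrics with made-up metric names and thresholds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Knowledge:
    config_id: str
    target_type: str      # Target requirement type 6002
    target_contents: str  # Target requirement contents 6003 (label only, an assumption)
    conditions: List[Callable[[Dict[str, float]], bool]] = field(default_factory=list)


def suitable_configurations(knowledge, req_type, req_contents, metrics):
    """Return the ids of the configurations that match the requirement and whose conditions hold."""
    # Step 9001: keep configurations whose target matches the requirement.
    matched = [
        k for k in knowledge
        if k.target_type == req_type and k.target_contents == req_contents
    ]
    # Steps 9002 and 9003: keep and return those whose conditions are all fulfilled.
    return [k.config_id for k in matched if all(cond(metrics) for cond in k.conditions)]


KNOWLEDGE = [
    Knowledge("S001", "function", "snapshot"),
    Knowledge("S002", "performance", "IOPS", [lambda m: m["read_ratio"] > 0.5]),
    Knowledge("S003", "performance", "IOPS", [lambda m: m["io_dispersion"] < 0.5]),
]
```

A configuration with an empty condition list is always considered suitable once it matches the requirement, because `all` of an empty sequence is true.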
<Concrete example of the processing>
A concrete example of the processing described in FIG. 7, FIG. 8 and FIG. 9 is described below, using the sample contents of the tables in FIG. 3, FIG. 4, FIG. 5, FIG. 6A, FIG. 6B and FIG. 6C. The computations are sufficient to show the points of the invention and are simplified for the sake of clarity.
The management program 2155 is started by the administrator.
The processing goes to the step 7001. The management program 2155 iterates starting with the first Target application 3001 "App01".
The processing goes to the step 7002. The management program 2155 retrieves the current configuration of "App01", from the table 2152, which is formed by the host computer "Server A1" and the storage apparatus "Storage A2" corresponding to the Application id 4001 "App01". The management program 2155 finds the available resource "R001", from the table 2153, that matches "Server A1" because the Resource SPECs 4004 of "Server A1" is included in the Resource SPECs 5003 of the Id 5001 "R001" and the Type 5002 "Server". The management program 2155 does not find an available resource that matches "Storage A2" because the Resource SPECs 4004 of "Storage A2" is not included in any of the Resource SPECs 5003 of Type 5002 "Storage". In particular, the "Storage A2" SPECs indicate a performance of 2G IOPS and a snapshot function, which are not found in an available resource of type "Storage". Therefore, the step 7002 returns No.
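The inclusion check used in the step 7002 can be sketched as a comparison of SPEC dictionaries. The representation is an assumption of this sketch: numeric SPECs are treated as minimum values, boolean SPECs as required functions, and the key names and the server values are hypothetical. The value of 0.2G IOPS per storage unit is inferred from the arithmetic used later in this example.

```python
def spec_included(required, offered):
    """True when every required SPEC is covered by the offered SPECs."""
    for key, need in required.items():
        have = offered.get(key)
        if have is None:
            return False
        if isinstance(need, bool):
            # A required function must be present in the offered resource.
            if need and not have:
                return False
        elif have < need:
            # A numeric SPEC must be at least as large as the required value.
            return False
    return True


# Performance values in millions of IOPS.
storage_a2 = {"iops_m": 2000, "snapshot": True}
storage_unit = {"iops_m": 200, "snapshot": False}
storage_matches = spec_included(storage_a2, storage_unit)
```

`storage_matches` is false, which corresponds to the step 7002 returning No for "Storage A2".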
The processing goes to the step 7003, which is a step explained with the flow in the FIG. 8.
The processing goes to the step 8001 passing as input the application "App01". The management program 2155 retrieves from the table 2151 the requirements "Rq001" and "Rq002" of "App01". The management program 2155 iterates starting from "Rq001".
The processing goes to the step 8002, which is a step explained with the flow in the FIG. 9.
The processing goes to the step 9001 passing as input the application "App01" and the requirement "Rq001". The management program 2155 finds the configuration "S001", from the tables 2154, whose target (e.g. the Type 6002 and the Contents 6003) matches the requirement (e.g. the Type 3003 and the Contents 3004) of "Rq001".
The processing goes to the step 9002. The management program 2155 refers to the Conditions 6004 of "S001". Since "S001" has no conditions, the management program 2155 considers "S001" as a suitable alternative configuration for "Rq001" of "App01".
The processing goes to the step 9003. The management program 2155 returns the alternative configuration "S001".
The processing goes to the step 8003. The management program 2155 iterates "S001".
The processing goes to the step 8004. The management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S001", which is the disk usage trend for all the snapshots kept for the resource "storage". The management program 2155 retrieves from the table 2152 the resource of Resource type 4003 "storage" that forms the current configuration of the input application "App01", which is "Storage A2". The management program 2155 retrieves the relevant metric from the storage apparatus "Storage A2" and finds that the disk usage trend for all the snapshots kept is 1 TB, and therefore concludes that, according to the available resources table 2153, the configuration "S001" for "App01" requires one resource "R002".
The processing goes to the step 8005. The management program 2155 checks the Available units 5004 and finds that there is one available resource "R002". Therefore, the management program 2155 returns Yes.
The processing goes to the step 8003. The management program 2155 finishes the iteration.
The processing goes to the step 8001. The management program 2155 iterates using the requirement "Rq002".
The processing goes to the step 8002, which is a step explained with the flow in the FIG. 9.
The processing goes to the step 9001 passing as input the application "App01" and the requirement "Rq002". The management program 2155 finds, from the tables 2154, the alternative configurations "S002" and "S003" whose configurations match the requirements of "Rq002".
The processing goes to the step 9002. The management program 2155 refers to the Conditions 6004 of "S002" and "S003".
Since the Conditions 6004 of "S002" refer to a storage resource, the management program 2155 refers to the table 2152 and retrieves the "Storage A2" that forms the configuration of the input application "App01". The management program 2155 retrieves the metric "read / write rate" from the storage apparatus "Storage A2". Since the workload is not mainly read, the management program 2155 considers "S002" as a configuration not suitable for "Rq002" of "App01".
Since the conditions of "S003" refer to a server resource, the management program 2155 refers to the table 2152 and retrieves the host computer "Server A1" that forms the configuration of the input application "App01". The management program 2155 retrieves the metric "degree of I/O access dispersion before cache" from the host computer "Server A1". Since the I/O access before cache is focused in a few values and not dispersed, the management program 2155 considers "S003" as a configuration suitable for "Rq002" of "App01".
The processing goes to the step 9003. The management program 2155 returns the alternative configuration "S003".
The processing goes to the step 8003. The management program 2155 iterates "S003".
The processing goes to the step 8004. The management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S003", which is the amount of I/O unique access before cache for the resource "server". The management program 2155 retrieves from the table 2152 the resource of Resource type 4003 "server" that forms the current configuration of the input application "App01", which is "Server A1". The management program 2155 retrieves the relevant metric from the host computer "Server A1" and finds that the maximum amount of I/O unique access before cache is 2 GB, and therefore concludes that, according to the table 2153, the configuration "S003" for "App01" requires one resource "R004".
The processing goes to the step 8005. The management program 2155 checks the Available units 5004 and finds that there is one available resource "R004". Therefore, the management program 2155 returns Yes.
The processing goes to the step 8003. The management program 2155 finishes the iteration.
The processing goes to the step 8001. The management program 2155 finishes the iteration.
The processing goes to the step 8009. The management program 2155 checks that "S001" and "S003" are compatible and selects them as the best combination of alternative configurations to fulfill the requirements of "App01".
The processing goes to the step 7004. The management program 2155 adds to a new and empty DR plan the instruction to acquire the resources "R002" and "R004", and the instructions to set up the alternative configurations. For the sake of the explanation, the requirements of the application are few. However, an application usually would have requirements that ensure the use of a storage resource and a server resource.
The processing goes to the step 7001. The management program 2155 iterates using the application "App02".
The processing goes to the step 7002. The management program 2155 retrieves the current configuration of "App02", from the table 2152, which is formed by the host computer "Server A2" and the storage apparatus "Storage A2". The management program 2155 finds, from the table 2153, the available resource "R001" combined with "R004" that matches "Server A2". The management program 2155 does not find an available resource that matches "Storage A2". In particular, the "Storage A2" SPECs indicate a performance of 1.5G IOPS, which is not found in an available resource of type "Storage". Therefore, the management program 2155 returns No.
The processing goes to the step 7003, which is a step explained with the flow in the FIG. 8.
The processing goes to the step 8001 passing as input the application "App02". The management program 2155 retrieves from the table 2151 the requirement "Rq003" of "App02". The management program 2155 iterates using "Rq003".
The processing goes to the step 8002, which is a step explained with the flow in the FIG. 9.
The processing goes to the step 9001 passing as input the application "App02" and the requirement "Rq003". The management program 2155 finds, from the tables 2154, the alternative configurations "S002" and "S003" that match "Rq003".
The processing goes to the step 9002. The management program 2155 refers to the Conditions 6004 of "S002" and "S003".
Since the Conditions 6004 of "S002" refer to a storage resource, the management program 2155 refers to the table 2152 and retrieves the storage "Storage A2" that forms the configuration of the input application "App02". The management program 2155 retrieves the metric "read / write rate" from the storage apparatus "Storage A2". Since the workload is mainly read, the management program 2155 considers "S002" as a configuration suitable for "Rq003" of "App02".
Since the Conditions 6004 of "S003" refer to a server resource, the management program 2155 refers to the table 2152 and retrieves the host computer "Server A2" that forms the configuration of the input application "App02". The management program 2155 retrieves the metric "degree of I/O access dispersion before cache" from the host computer "Server A2". Since the I/O access before cache is dispersed, the management program 2155 considers "S003" as a configuration not suitable for "Rq003" of "App02".
The processing goes to the step 9003. The management program 2155 returns the alternative configuration "S002".
The processing goes to the step 8003. The management program 2155 iterates "S002".
The processing goes to the step 8004. The management program 2155 retrieves the Relevant metrics 6006 of the alternative configuration being iterated "S002", which are the read IOPS and write IOPS for the resource "storage". The management program 2155 retrieves from the table 2152 the resource of type 4003 "storage" that forms the current configuration of the input application "App02", which is "Storage A2". The management program 2155 retrieves the relevant metrics from the storage apparatus "Storage A2" and finds that the read IOPS is 1.4G and the write IOPS is 0.1G, and therefore concludes that, according to the table 2153, the configuration "S002" for "App02" requires one resource "R002" for the write workload, seven (1.4G / 0.2G = 7) resources "R002" for the read workload, and one resource "R005" to balance the workload.
The processing goes to the step 8005. The management program 2155 finds, from the table 2153, that there is one resource "R005" available, but only five resources "R002" available. Therefore, the management program 2155 returns No.
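The unit count of the step 8004 and the check of the step 8005 in this example reduce to a ceiling division followed by a comparison. The sketch below uses integers in millions of IOPS to avoid floating-point rounding; the function name is hypothetical, and the per-unit performance of 0.2G IOPS for "R002" is taken from the arithmetic of the example.

```python
def units_needed(workload, per_unit):
    """Ceiling division on integers: units required to cover the workload."""
    return -(-workload // per_unit)


PER_UNIT_M_IOPS = 200   # 0.2G IOPS per "R002", in millions of IOPS
AVAILABLE_R002 = 5      # Available units 5004 of "R002"

read_units = units_needed(1400, PER_UNIT_M_IOPS)   # read workload of 1.4G IOPS
write_units = units_needed(100, PER_UNIT_M_IOPS)   # write workload of 0.1G IOPS
enough = read_units + write_units <= AVAILABLE_R002
```

`read_units` is 7 and `write_units` is 1, so eight units of "R002" are needed against five available, and `enough` is false, which corresponds to No in the step 8005.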
The processing goes to the step 8006. The management program 2155 retrieves the current configuration of "App02", from the table 2152, which is formed by the host computer "Server A2" and the storage apparatus "Storage A2". The management program 2155 respectively retrieves from the host computer "Server A2" and the storage apparatus "Storage A2" the metrics of IOPS workload, according to the contents of "Rq003". Since the IOPS workload is 1.6G IOPS, the management program 2155 concludes that the requirement is useful (1.6G IOPS > 1.5G IOPS). Therefore, the management program 2155 returns No.
The processing goes to the step 8003. The management program 2155 finishes the iteration.
The processing goes to the step 8001. The management program 2155 finishes the iteration.
The processing goes to the step 8009. The management program 2155 selects "S003" as the best combination of alternative configurations to fulfill the requirements of "App01". Since "S002" is not able to fulfill the requirements of "App02", "S002" is not selected as the best combination of alternative configurations. Further, since "S002" is not compatible with "S003" according to the Compatible with other configurations 6008 of "S002" (FIG. 6B), "S002" cannot be selected as the best combination of alternative configurations together with "S003".
The processing goes to the step 7004. The management program 2155 adds to the DR plan being created the instruction to acquire one resource "R005" and five resources "R002", and the instructions to set up the alternative configurations. For the sake of the explanation, the requirements of the application are few. However, an application usually would have requirements that ensure the use of a storage resource and a server resource.
The processing goes to the step 7001. The management program 2155 finishes the iteration.
The processing goes to the step 7005. The management program 2155 summarizes the DR plan.
The processing goes to the step 7006. The total number of "R002" is six (one "R002" for "App01" and five "R002" for "App02"), which is larger than the Available units 5004 "5" of "R002". Since there are not enough resources for the selected configuration "S003", the management program 2155 concludes that the DR plan has limitations. Therefore, the management program 2155 returns Yes.
The processing goes to the step 7007. The management program 2155 computes that the configuration "S003" can provide 1G IOPS (5 x 0.2G IOPS = 1G IOPS) with the five available resources "R002". Therefore, since the contents of the requirement "Rq003" state a need for 1.5G IOPS, the management program 2155 concludes that the DR plan has the limitation of fulfilling the requirement "Rq003" of the application "App02" only up to 1G IOPS (a shortfall of 33.3%). Finally, the management program 2155 sends an alert with the accurate limitation to the administrator.
The processing goes to the step 7008. The management program 2155 sets the following alerts. For the application "App01" and the alternative configuration "S001", which is suitable, the management program 2155 sets an alert to detect when the disk usage trend of all snapshots kept becomes bigger than the disk capacity currently provided, i.e. the capacity of the available resource "R002". For the application "App01" and the alternative configuration "S002", which is suitable, the management program 2155 sets an alert to detect when the "App01" I/O workload becomes not mainly read. For the application "App01" and the alternative configuration "S003", which is not suitable, the management program 2155 sets an alert to detect when the "App01" I/O workload becomes focused and not dispersed. For the application "App02" and the alternative configuration "S002", which is not suitable, the management program 2155 sets an alert to detect when the "App01" I/O workload becomes mainly read. For the application "App02" and the alternative configuration "S003", which is suitable, the management program 2155 sets an alert to detect when the amount of unique I/O workload in "App01" becomes bigger than the size of the selected available resource "R004".
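The limitation reported in the step 7007 for this example follows from two numbers: the performance achievable with the available units and the gap to the requirement. A minimal sketch, with values in millions of IOPS and a hypothetical function name:

```python
def limitation(required, units, per_unit):
    """Return (achievable performance, shortfall in percent of the requirement)."""
    achievable = min(required, units * per_unit)
    shortfall_pct = round(100.0 * (required - achievable) / required, 1)
    return achievable, shortfall_pct


# Requirement of 1.5G IOPS, five available units of 0.2G IOPS each.
achievable, shortfall = limitation(1500, 5, 200)
```

`achievable` is 1000 (1G IOPS) and `shortfall` is 33.3, so two thirds of the required performance can be provided and one third is missing.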
The processing ends.
The embodiments according to the present invention have been explained above. However, the embodiments of the present invention are not limited to those explanations, and those skilled in the art can ascertain the essential characteristics of the present invention and can make various modifications and variations to the present invention to adapt it to various usages and conditions without departing from the spirit and scope of the claims.
1020: Management computer
1060: Primary DC
1080: Secondary DC

Claims (15)

  1. A management system for managing a computer system including a primary DC (Data Center) and a secondary DC, the management system comprising:
    an interface device being coupled to the computer system; and
    a processor being coupled to the interface device,
    the processor being configured to
    identify one or more requirements with respect to the primary DC,
    identify one or more suitable resources which are resources suitable to the identified requirements, from among available resources which are resources that can be used in the secondary DC, based on SPECs of the available resources and metrics from the primary DC, and
    create a DR (Disaster Recovery) plan by mapping configurations with respect to the identified suitable resources to at least one of the identified requirements.
  2. The management system according to claim 1,
    wherein the processor is configured to reduce or remove a requirement which is equal to or larger than an actual trend identified based on the metric when there are not enough suitable resources.
  3. The management system according to claim 1, further comprising,
    a storage resource including a memory and being configured to store management information,
    wherein the primary DC includes resources including one or more first storage apparatuses and one or more first host computers being configured to execute applications and perform I/O (Input/Output) to the first storage apparatuses,
    wherein the management information includes at least one of first information denoting a requirement for each application, second information denoting, for each application, SPECs of resources provided to the application from the primary DC, third information with respect to the available resources, which includes information denoting the SPECs of the available resources, and fourth information including knowledge units of configurations,
    wherein the processor is configured to identify the suitable resources by referring to the management information.
  4. The management system according to claim 3,
    wherein the processor is configured to
    select an application,
    identify a requirement of the selected application, and
    find one or more knowledge units of configurations whose configurations fulfill one of the identified requirements in order to find at least one of the suitable resources.
  5. The management system according to claim 4,
    wherein the requirement denoted for each application by the first information includes requirement type and requirement detail,
    wherein each of knowledge units of configurations includes information denoting requirement type and requirement detail,
    wherein each of the found knowledge units of configurations is a knowledge unit of configurations whose requirement type and requirement detail are the same as the requirement type and requirement detail of the identified requirement.
  6. The management system according to claim 3,
    wherein each of one or more knowledge units of configurations includes information denoting conditions with respect to resource type and metric, and information denoting resource type and conditions with respect to relevant metric, and
    wherein the processor is configured to, for each of at least one of the found knowledge units of configurations,
    perform a first determination which is to determine whether or not a metric acquired from a resource of the primary DC according to information of the found knowledge denoting metric and resource type fulfills the condition as for the metric denoted by the found knowledge,
    when the result of the first determination is positive, perform a second determination which is to determine whether or not a relevant metric acquired from a resource of the primary DC according to information of the found knowledge denoting relevant metric and resource type fulfills the condition as for the relevant metric denoted by the found knowledge,
    when the result of the second determination is positive, consider an available resource according to the resource type as for the relevant metric, as a suitable alternative resource which is a suitable resource.
  7. The management system according to claim 6,
    wherein the processor is configured to reduce or remove a requirement which is equal to or larger than an actual trend identified based on the metric when there are not enough suitable resources.
  8. The management system according to claim 6,
    wherein each of knowledge units of configurations includes information denoting instruction to be configured in a DR plan,
    wherein the processor is configured to configure an instruction in the DR plan according to the information denoting instruction to be configured in a DR plan in the found knowledge unit.
  9. The management system according to claim 3,
    wherein each of knowledge units of configurations includes information denoting ID of a knowledge unit of configurations which is compatible with the configuration denoted by the knowledge unit,
    wherein the processor is configured to use a combination of configurations of multiple knowledge units which are compatible with each other, in order to create the DR plan.
  10. The management system according to claim 1,
    wherein at least one of the available resources is acquired from at least one of an online service that sells or rents resources and an online cloud service that provides resources.
  11. The management system according to claim 1,
    wherein a combination of alternative configurations of available resources is mapped together to more than one requirement.
  12. The management system according to claim 1,
    wherein the processor is configured to compute the conditions in the metrics from the primary DC that indicate the need to update the DR plan.
  13. The management system according to claim 1,
    wherein the processor is configured to compute limitations of the mapped configurations and send them to a computer of an administrator.
  14. The management system according to claim 1,
    wherein the processor is configured to store the DR plan in a safe location.
  15. A management method for managing a computer system including a primary DC (Data Center) and a secondary DC, the management method comprising:
    identifying one or more requirements with respect to the primary DC,
    identifying one or more suitable resources which are resources suitable to the identified requirements, from among available resources which are resources that can be used in the secondary DC, based on SPECs of the available resources and metrics from the primary DC, and
    creating a DR (Disaster Recovery) plan by mapping configurations with respect to the identified suitable resources to at least one of the identified requirements.
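
The two-stage determination recited in claim 6 can be sketched as follows. This is a non-authoritative illustration: the dictionary keys, metric names, thresholds, and resource IDs are all hypothetical, and a knowledge unit of configurations is reduced here to two metric conditions plus an alternative resource type.

```python
def find_suitable_resources(knowledge_unit, primary_metrics, available_resources):
    """Apply the first and second determinations, then pick resources by type.

    knowledge_unit: hypothetical dict with
        "metric": (metric_name, condition), "relevant_metric": (metric_name, condition),
        "alternative_resource_type": str
    primary_metrics: metrics acquired from a resource of the primary DC
    available_resources: list of dicts with "id" and "type" for the secondary DC
    """
    metric_name, metric_ok = knowledge_unit["metric"]
    if not metric_ok(primary_metrics[metric_name]):        # first determination
        return []
    relevant_name, relevant_ok = knowledge_unit["relevant_metric"]
    if not relevant_ok(primary_metrics[relevant_name]):    # second determination
        return []
    wanted_type = knowledge_unit["alternative_resource_type"]
    return [r["id"] for r in available_resources if r["type"] == wanted_type]

# Hypothetical knowledge unit: a mainly-read workload with a small unique working set.
unit = {
    "metric": ("read_ratio", lambda v: v >= 0.8),
    "relevant_metric": ("unique_io_gb", lambda v: v <= 200),
    "alternative_resource_type": "cache",
}
resources = [{"id": "X1", "type": "disk"}, {"id": "X2", "type": "cache"}]
print(find_suitable_resources(unit, {"read_ratio": 0.9, "unique_io_gb": 150}, resources))  # ['X2']
```

An empty result means that the knowledge unit does not apply to the observed workload, so no alternative resource is proposed from it.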
PCT/JP2014/006225 2014-12-15 2014-12-15 Management system and management method for managing computer system WO2016098138A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/006225 WO2016098138A1 (en) 2014-12-15 2014-12-15 Management system and management method for managing computer system

Publications (1)

Publication Number Publication Date
WO2016098138A1 true WO2016098138A1 (en) 2016-06-23

Family

ID=56126065

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/006225 WO2016098138A1 (en) 2014-12-15 2014-12-15 Management system and management method for managing computer system

Country Status (1)

Country Link
WO (1) WO2016098138A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14908348

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14908348

Country of ref document: EP

Kind code of ref document: A1