CN106254166B - Disaster recovery center-based cloud platform resource configuration method and system - Google Patents

Info

Publication number
CN106254166B
Authority
CN
China
Prior art keywords
disaster recovery
service system
virtual machine
data
recovery mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610874942.1A
Other languages
Chinese (zh)
Other versions
CN106254166A (en
Inventor
李兴锋
郝建明
张炼
宋泽锋
伍福生
简超
韩笑
潘星明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN201610874942.1A priority Critical patent/CN106254166B/en
Publication of CN106254166A publication Critical patent/CN106254166A/en
Application granted granted Critical
Publication of CN106254166B publication Critical patent/CN106254166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0833 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability for reduction of network energy consumption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a disaster recovery center-based cloud platform resource configuration method and system, and relates to the technical field of disaster recovery. The method comprises the following steps: acquiring load data of the servers corresponding to each service system deployed by a disaster recovery center on a cloud platform; collecting service operation data from a production environment; classifying the load data and the service operation data according to preset disaster recovery mode levels to obtain load condition data of each service system in the different disaster recovery modes; and configuring resources on the cloud platform according to a preset policy configuration table and the load condition data. Virtual machines and physical machines are migrated according to the different policies corresponding to the different disaster recovery modes, which saves resources and energy.

Description

Disaster recovery center-based cloud platform resource configuration method and system
Technical Field
The invention relates to the technical field of disaster recovery, in particular to resource scheduling for a disaster recovery center, and more specifically to a disaster recovery center-based cloud platform resource configuration method and system.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
As business grows, more and more service systems are deployed in the disaster recovery center. These include core production systems with dual-active or main-auxiliary architectures, as well as a large number of service systems in hot-standby, warm-standby, and cold-standby disaster recovery modes, all of which serve to keep the production system running safely and stably. Most of these service systems are gradually being migrated to a cloud platform. At present, each physical machine in a cloud platform physical machine cluster runs a relatively large number of virtual machines on average. During a service peak, one physical machine may become overloaded while other physical machines in the same cluster remain lightly loaded; that is, resource utilization across the physical machines easily becomes unbalanced, which reduces service operation efficiency and wastes resources.
At present, physical machines are generally monitored with Patrol. However, Patrol monitoring and the cloud platform are independent platforms with no common interface, so physical machine load cannot be combined with advanced cloud platform features such as live migration.
Therefore, how to develop a new scheme for configuring cloud platform resources for different disaster recovery modes is a technical problem to be solved in the field.
Disclosure of Invention
To overcome the technical problems in the prior art, the invention provides a disaster recovery center-based cloud platform resource configuration method and system. Load data is acquired from the servers corresponding to each service system deployed by the disaster recovery center on the cloud platform, service operation data is acquired from the production environment, both are classified according to preset disaster recovery mode levels, and resources are configured with reference to a preset policy configuration table. Virtual machines and physical machines are thereby migrated according to the different policies corresponding to the different disaster recovery modes, which saves resources and energy.
To achieve the above object, the invention provides a disaster recovery center-based cloud platform resource configuration method, which includes: acquiring load data of the servers corresponding to each service system deployed by a disaster recovery center on a cloud platform; collecting service operation data from a production environment; classifying the load data and the service operation data according to preset disaster recovery mode levels to obtain load condition data of each service system in the different disaster recovery modes; and configuring resources on the cloud platform according to a preset policy configuration table and the load condition data.
In a preferred embodiment of the present invention, the load data of the servers is collected by scripts, and the servers include virtual machines and physical machines.
In a preferred embodiment of the present invention, the load data includes CPU utilization, memory utilization, and storage capacity utilization, and the service operation data includes the daily average transaction volume and the daily peak transaction volume.
In a preferred embodiment of the present invention, the disaster recovery modes include a dual-active disaster recovery mode, a main-auxiliary disaster recovery mode, a warm-standby disaster recovery mode, and a cold-standby disaster recovery mode.
In a preferred embodiment of the present invention, when the service system is in the dual-active disaster recovery mode or the main-auxiliary disaster recovery mode, configuring resources on the cloud platform according to the policy configuration table and the load condition data includes: collecting prediction information from the production environment; determining an optimization policy for the service system according to the prediction information and the policy configuration table; and configuring resources for the service system according to the optimization policy and the load condition data.
In a preferred embodiment of the present invention, when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode, configuring resources on the cloud platform according to the policy configuration table and the load condition data includes: determining an optimization policy for the service system according to the policy configuration table; and configuring resources for the service system according to the optimization policy and the load condition data.
In a preferred embodiment of the present invention, configuring resources for the service system according to the optimization policy and the load condition data includes: acquiring setting information from the optimization policy;
when the resource configuration of the service system does not meet the setting information, reading the virtual machines corresponding to a physical machine from a preset database according to the name of the physical machine corresponding to the service system; acquiring the CPU utilization and the memory utilization of the virtual machines from the load condition data; determining a coefficient for each virtual machine according to its CPU utilization and memory utilization; selecting a virtual machine to be migrated according to the coefficients of the virtual machines and the optimization policy; selecting a physical machine to be migrated according to the virtual machine to be migrated; and migrating the physical machine to be migrated and the virtual machine to be migrated so that the service system after migration meets the setting information.
Another object of the invention is to provide a disaster recovery center-based cloud platform resource configuration system, which comprises: a load data acquisition device for acquiring load data of the servers corresponding to each service system deployed by the disaster recovery center on a cloud platform; an operation data acquisition device for acquiring service operation data from the production environment; a data classification device for classifying the load data and the service operation data according to preset disaster recovery mode levels to obtain load condition data of each service system in the different disaster recovery modes; and a resource configuration device for configuring resources on the cloud platform according to a preset policy configuration table and the load condition data.
In a preferred embodiment of the present invention, when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode, the resource configuration device includes: a first optimization policy determining module, configured to determine an optimization policy for the service system according to the policy configuration table; and a resource configuration module, configured to configure resources for the service system according to the optimization policy and the load condition data.
In a preferred embodiment of the present invention, when the service system is in the dual-active disaster recovery mode or the main-auxiliary disaster recovery mode, the resource configuration device further includes: a prediction information acquisition module, configured to acquire prediction information from the production environment;
and a second optimization policy determining module, configured to determine the optimization policy for the service system according to the prediction information and the policy configuration table.
In a preferred embodiment of the present invention, the resource configuration module includes: an obtaining unit, configured to obtain setting information from the optimization policy;
a reading unit, configured to, when the resource configuration of the service system does not meet the setting information, read the virtual machines corresponding to a physical machine from a preset database according to the name of the physical machine corresponding to the service system; a utilization obtaining unit, configured to obtain the CPU utilization and the memory utilization of the virtual machines from the load condition data; a coefficient determining unit, configured to determine a coefficient for each virtual machine according to its CPU utilization and memory utilization; a first determining unit, configured to select the virtual machine to be migrated according to the coefficients of the virtual machines and the optimization policy; a second determining unit, configured to select the physical machine to be migrated according to the virtual machine to be migrated; and a migration unit, configured to migrate the physical machine to be migrated and the virtual machine to be migrated so that the service system after migration meets the setting information.
The beneficial effects of the disaster recovery center-based cloud platform resource configuration method and system are as follows: load data is acquired from the servers corresponding to each service system deployed by the disaster recovery center on the cloud platform, service operation data is acquired from the production environment, both are classified according to preset disaster recovery mode levels, and resources are configured with reference to the preset policy configuration table, so that virtual machines and physical machines are migrated according to the different policies corresponding to the different disaster recovery modes, which saves resources and energy.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a disaster recovery center-based cloud platform resource allocation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a first embodiment of step S104 in Fig. 1;
Fig. 3 is a flowchart of a second embodiment of step S104 in Fig. 1;
Fig. 4 is a flowchart of step S202 in Fig. 2 and step S303 in Fig. 3;
Fig. 5 is a block diagram of a disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention;
Fig. 6 is a block diagram of a first embodiment of the resource configuration device in the disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention;
Fig. 7 is a block diagram of a second embodiment of the resource configuration device in the disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention;
Fig. 8 is a block diagram of the resource configuration module in the disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
The cloud platform management tool currently in use can monitor the distribution of virtual machines but does not yet provide monitoring of physical machine resource usage. Physical machines are monitored with the commercial product Patrol, but Patrol monitoring and the cloud resource management platform are independent platforms with no common interface, so physical machine load cannot be combined with advanced cloud platform features such as live migration.
At present, functions such as virtual machine migration in the cloud platform management tool target high-availability migration when a physical machine fails, which is the most basic high-availability feature of a general-purpose cloud computing platform. To achieve automatic resource optimization suited to the service characteristics of a disaster recovery center, functions such as data acquisition, data analysis, process control, and optimization decision-making need to be introduced into the cloud platform.
To address these technical problems, the invention provides a disaster recovery center-based cloud platform resource configuration method and system.
Fig. 1 is a flowchart of the disaster recovery center-based cloud platform resource configuration method according to the present invention. Referring to Fig. 1, the method includes:
S101: acquiring load data of the servers corresponding to each service system deployed by the disaster recovery center on the cloud platform.
In a specific embodiment, the load data of the servers can be collected by means such as scripts. The servers include virtual machines and physical machines, and the load data includes CPU utilization, memory utilization, and storage capacity utilization.
S102: collecting service operation data from a production environment. In the present invention, the production environment refers to the IT system environment that processes actual financial transaction information in a typical financial system. In a specific embodiment, the service operation data includes the daily average transaction volume and the daily peak transaction volume. That is, step S102 opens up the transaction monitoring interface between the disaster recovery center and the production center.
S103: classifying the load data and the service operation data according to preset disaster recovery mode levels to obtain load condition data of each service system in the different disaster recovery modes. In a specific embodiment, the preset disaster recovery mode levels include a dual-active disaster recovery mode, a main-auxiliary disaster recovery mode, a warm-standby disaster recovery mode, and a cold-standby disaster recovery mode. The disaster recovery mode levels are configurable. Classifying the load data and the service operation data yields load condition data organized by service system, which is used for decision-making in the subsequent steps.
S104: configuring resources on the cloud platform according to a preset policy configuration table and the load condition data.
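Steps S101 to S103 can be illustrated with a minimal Python sketch. The record types, field names, mode identifiers, and the `mode_of` mapping from service system to disaster recovery mode are assumptions made for illustration; the description does not specify data formats or how the mode of a service system is stored.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four disaster recovery mode levels named in the description.
# The identifiers themselves are illustrative.
DR_MODES = ("dual_active", "main_auxiliary", "warm_standby", "cold_standby")


@dataclass
class LoadRecord:
    system: str     # service system the server belongs to
    server: str     # name of a virtual or physical machine
    cpu: float      # CPU utilization, 0..1
    mem: float      # memory utilization, 0..1
    storage: float  # storage capacity utilization, 0..1


@dataclass
class BusinessRecord:
    system: str
    daily_avg_txn: float   # daily average transaction volume
    daily_peak_txn: float  # daily peak transaction volume


def classify(loads, business, mode_of):
    """Group load and business data by DR mode, then by service system."""
    grouped = {
        mode: defaultdict(lambda: {"loads": [], "business": None})
        for mode in DR_MODES
    }
    for rec in loads:
        grouped[mode_of[rec.system]][rec.system]["loads"].append(rec)
    for rec in business:
        grouped[mode_of[rec.system]][rec.system]["business"] = rec
    # Convert to plain dicts so later lookups do not create empty entries.
    return {mode: dict(systems) for mode, systems in grouped.items()}
```

The result is keyed first by disaster recovery mode and then by service system, which matches the description's statement that the load condition data is organized by service system for later decisions.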
Fig. 2 is a flowchart of a first embodiment of step S104. Referring to Fig. 2, in the first embodiment, when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode, step S104 includes:
S201: determining an optimization policy for the service system according to the policy configuration table;
S202: configuring resources for the service system according to the optimization policy and the load condition data.
In the first embodiment, the policy configuration table is shown in Table 1, which lists the optimization policies for three scenarios (working day, weekend, and holiday) when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode. Taking one such disaster recovery mode as an example, the optimization policy for a working day is "average of the historical peaks, with the peaks of the previous week as the reference"; that is, in this case the average of the historical peaks of the previous week is used as the reference.
TABLE 1
[Table image not reproduced: optimization policies for working days, weekends, and holidays in the warm-standby and cold-standby disaster recovery modes]
Fig. 3 is a flowchart of a second embodiment of step S104. Referring to Fig. 3, in the second embodiment, when the service system is in the dual-active disaster recovery mode or the main-auxiliary disaster recovery mode, step S104 includes:
S301: collecting prediction information from the production environment. The prediction information comes from a prediction platform in the production environment, which generally takes a specified future time period as input and outputs the predicted traffic peak for that period.
S302: determining an optimization policy for the service system according to the prediction information and the policy configuration table;
S303: configuring resources for the service system according to the optimization policy and the load condition data.
In the second embodiment, the policy configuration table is shown in Table 2, which lists the optimization policies for three scenarios (working day, weekend, and holiday) when the service system is in the dual-active disaster recovery mode or the main-auxiliary disaster recovery mode. Taking the dual-active disaster recovery mode as an example, the optimization policy for a working day is "sum of the historical peaks of the two centers, with the peak of the previous month as the reference"; that is, in this case the sum of the two centers' historical peaks over the previous month is used as the reference. The two centers here are the Beijing center and the Shanghai center.
TABLE 2
[Table image not reproduced: optimization policies for working days, weekends, and holidays in the dual-active and main-auxiliary disaster recovery modes]
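A policy configuration table of this kind could be represented as a lookup keyed by disaster recovery mode and day type. The sketch below is illustrative only: the key names, the `aggregate` and `window` fields, and both function names are assumptions. Only the two working-day policies quoted in the description are filled in, because the remaining cells of Tables 1 and 2 are not available in this text. How the prediction information from step S301 enters the decision is not specified in the description and is left out.

```python
from enum import Enum
from statistics import mean


class DayType(Enum):
    WORKDAY = "workday"
    WEEKEND = "weekend"
    HOLIDAY = "holiday"


# Only the working-day entries quoted in the description are recorded.
# All other cells are deliberately absent rather than guessed.
POLICY_TABLE = {
    ("warm_or_cold_standby", DayType.WORKDAY): {
        "aggregate": "mean_of_peaks",
        "window": "previous_week",
    },
    ("dual_active", DayType.WORKDAY): {
        "aggregate": "sum_of_center_peaks",
        "window": "previous_month",
    },
}


def lookup_policy(mode, day_type):
    """Return the recorded policy or raise KeyError if the cell is unknown."""
    policy = POLICY_TABLE.get((mode, day_type))
    if policy is None:
        raise KeyError(f"no policy recorded for {mode} on {day_type.value}")
    return policy


def reference_value(policy, peaks_by_center):
    """Compute the reference value from historical peaks.

    peaks_by_center maps a center name to the list of peaks observed
    within the policy window (selecting the window is left to the caller).
    """
    if policy["aggregate"] == "mean_of_peaks":
        return mean(p for peaks in peaks_by_center.values() for p in peaks)
    if policy["aggregate"] == "sum_of_center_peaks":
        return sum(max(peaks) for peaks in peaks_by_center.values())
    raise ValueError(f"unknown aggregate: {policy['aggregate']}")
```

With the figures used in the description's worked example for the dual-active working-day policy (1500 tps for the Shanghai center and 1000 tps for the Beijing center), `reference_value` returns 2500.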
Fig. 4 is a flowchart of steps S202 and S303. Referring to Fig. 4, configuring resources for the service system according to the optimization policy and the load condition data includes:
S401: acquiring setting information from the optimization policy. Taking the dual-active disaster recovery mode as an example, the optimization policy determined in step S302 for a working day is "sum of the historical peaks of the two centers, with the peak of the previous month as the reference"; that is, in this case the sum of the two centers' historical peaks over the previous month is used as the reference. For example, if the historical peak of the Shanghai center over the previous month is 1500 tps (i.e., 1500 transactions per second) and that of the Beijing center is 1000 tps, the setting information determined by the working-day optimization policy in the dual-active disaster recovery mode is 2500 tps.
S402: when the resource configuration of the service system does not meet the setting information, reading the virtual machines corresponding to a physical machine from a preset database according to the name of the physical machine corresponding to the service system; if the resource configuration meets the setting information, no resource adjustment is needed. The resource configuration referred to here is the CPU computing power and memory of the virtual machines.
S403: acquiring the CPU utilization and the memory utilization of the virtual machines from the load condition data;
S404: determining a coefficient for each virtual machine according to its CPU utilization and memory utilization. In a specific embodiment, the coefficient is calculated by dividing the virtual machine's CPU utilization by its memory utilization. This coefficient is used because high CPU utilization directly affects the load of the physical machine, while memory size affects the migration time. A large coefficient indicates high CPU utilization or low memory use, so migrating such a virtual machine has a pronounced effect on the physical machine's load.
S405: selecting the virtual machine to be migrated according to the coefficients of the virtual machines and the optimization policy;
S406: selecting the physical machine to be migrated according to the virtual machine to be migrated;
S407: migrating the physical machine to be migrated and the virtual machine to be migrated so that the resource configuration of the service system after migration meets the setting information.
In a specific embodiment, after the virtual machine to be migrated is selected, a physical machine suitable for the migration can be selected, and whether the resource configuration after migration would meet the setting information is then calculated. If it would not, the selection is made again; if it would, the migration is executed.
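The selection loop in steps S404 to S407 can be sketched as follows. Several points are assumptions rather than details from the description: the coefficient is taken as CPU utilization divided by memory utilization, which is the reading consistent with the statement that a large coefficient indicates high CPU utilization or low memory use; virtual machines are tried in descending coefficient order; target physical machines are tried from least to most loaded; and "meets the setting information" is approximated by a hypothetical per-host CPU limit `cpu_limit`. All class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cpu: float  # CPU demand as a fraction of one host's capacity (assumption)
    mem: float  # memory utilization fraction, must be greater than 0


@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)

    @property
    def cpu_load(self):
        return sum(vm.cpu for vm in self.vms)


def coefficient(vm):
    """Assumed S404 coefficient: CPU utilization divided by memory utilization."""
    return vm.cpu / vm.mem


def plan_migration(source, hosts, cpu_limit):
    """Pick a (vm, target) pair that leaves both hosts within cpu_limit.

    Returns None when no single migration satisfies the limit.
    """
    candidates = sorted(source.vms, key=coefficient, reverse=True)
    targets = sorted((h for h in hosts if h is not source),
                     key=lambda h: h.cpu_load)
    for vm in candidates:
        for target in targets:
            source_after = source.cpu_load - vm.cpu
            target_after = target.cpu_load + vm.cpu
            # Check the post-migration state before committing; if it fails,
            # fall through and try the next candidate pair.
            if source_after <= cpu_limit and target_after <= cpu_limit:
                return vm, target
    return None


def migrate(vm, source, target):
    """Apply a planned migration to the in-memory model."""
    source.vms.remove(vm)
    target.vms.append(vm)
```

For example, a host running virtual machines with (CPU, memory) of (0.5, 0.5), (0.4, 0.1), and (0.2, 0.4) has coefficients 1.0, 4.0, and 0.5, so the second virtual machine is tried first. A real implementation would call the cloud platform's live migration interface in place of `migrate`.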
The disaster recovery center-based cloud platform resource configuration method acquires load data of the servers corresponding to each service system deployed by the disaster recovery center on the cloud platform, acquires service operation data from the production environment, classifies both according to preset disaster recovery mode levels, and configures resources with reference to a preset policy configuration table. Different policies are thus triggered automatically to migrate virtual machines according to the different disaster recovery modes. When business is at a low ebb, the virtual machines are migrated onto a small number of physical machines and the other physical machines are shut down, which saves resources and energy.
It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Having described the method of an exemplary embodiment of the present invention, the cloud platform resource configuration system of an exemplary embodiment of the present invention is described next with reference to Fig. 5. For the implementation of the system, reference can be made to the implementation of the above method, and details already covered are not described again. The terms "module" and "unit", as used below, may be software and/or hardware that implements a predetermined function. While the modules described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 5 is a block diagram of a cloud platform resource allocation system based on a disaster recovery center according to an embodiment of the present invention, and please refer to fig. 5, where the system includes:
and the load data acquisition device 101 is used for acquiring load data of servers corresponding to each service system deployed by the disaster recovery center on the cloud platform.
In a specific embodiment, load data of a server can be collected through a script and the like, the server comprises a virtual machine and a physical machine, and the load data comprises a CPU utilization rate, a memory utilization rate and a storage capacity utilization rate.
And the operation data acquisition device 102 is used for acquiring service operation data from the production environment. In a specific embodiment, the service operation data includes a daily average value of the transaction amount and a daily peak value of the transaction amount. Namely, the operation data acquisition device 102 opens the transaction monitoring interface between the disaster recovery center and the production center.
And the data classification device 103 is configured to classify the load data and the service operation data according to a preset disaster tolerance mode level, so as to obtain load condition data of each service system in different disaster tolerance modes. In a specific embodiment, the preset disaster tolerance mode levels include a dual-active disaster tolerance mode, a main-auxiliary disaster tolerance mode, a warm-standby disaster tolerance mode, and a cold-standby disaster tolerance mode. The disaster recovery mode grade can be configured, load data and service operation data are classified, and load condition data with a service system as a dimensionality is formed for decision making in subsequent steps.
And the resource configuration device 104 is configured to perform resource configuration on the cloud platform according to a preset policy configuration table and load condition data.
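The classification performed by the data classification device 103 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class names `DRMode`, `LoadSample` and `OperationSample`, the `classify` function, and the `mode_of` mapping from service system to disaster recovery mode are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class DRMode(Enum):
    """The four preset disaster recovery mode levels named in the description."""
    DUAL_ACTIVE = "dual-active"
    PRIMARY_SECONDARY = "primary-secondary"
    WARM_STANDBY = "warm-standby"
    COLD_STANDBY = "cold-standby"


@dataclass
class LoadSample:
    system: str         # service system name (hypothetical field)
    server: str         # virtual or physical machine name
    cpu_pct: float      # CPU utilization, percent
    mem_pct: float      # memory utilization, percent
    storage_pct: float  # storage capacity utilization, percent


@dataclass
class OperationSample:
    system: str
    daily_avg: float    # daily average transaction volume
    daily_peak: float   # daily peak transaction volume


def classify(loads, operations, mode_of):
    """Group samples by disaster recovery mode, then by service system.

    mode_of is an assumed, configurable mapping {system name: DRMode}.
    """
    result = defaultdict(lambda: defaultdict(lambda: {"load": [], "operation": []}))
    for sample in loads:
        result[mode_of[sample.system]][sample.system]["load"].append(sample)
    for sample in operations:
        result[mode_of[sample.system]][sample.system]["operation"].append(sample)
    return result
```

Keeping the mode assignment in a separate mapping reflects the statement that the disaster recovery mode levels are configurable; the per-system grouping is what later decision steps would consume.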
Fig. 6 is a block diagram of a first embodiment of the resource configuration device 104 in a disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention. Referring to Fig. 6, in the first embodiment, when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode, the resource configuration device 104 includes:
a first optimization strategy determination module 201, configured to determine an optimization strategy of the service system according to a policy configuration table;
a resource configuration module 202, configured to perform resource configuration on the service system according to the optimization strategy and the load condition data.
In the first embodiment, the policy configuration table is shown in Table 1, which gives the optimization strategies corresponding to three scenarios (working day, weekend and holiday) when the service system is in the warm-standby disaster recovery mode or the cold-standby disaster recovery mode. Taking one of these disaster recovery modes as an example, the optimization strategy corresponding to a working day is "the average of the historical peaks, with the peak of the previous week as the reference"; that is, in this case the average of the historical peaks of the previous week is used as the reference.
Fig. 7 is a block diagram of a second embodiment of the resource configuration device in a disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention. Referring to Fig. 7, in the second embodiment, when the service system is in the dual-active disaster recovery mode or the primary-secondary disaster recovery mode, the resource configuration device 104 further includes:
a prediction information acquisition module 203, configured to collect prediction information from the production environment.
In the second embodiment, the policy configuration table is shown in Table 2, which gives the optimization strategies corresponding to the three scenarios (working day, weekend and holiday) when the service system is in the dual-active disaster recovery mode or the primary-secondary disaster recovery mode. Taking the dual-active disaster recovery mode as an example, the optimization strategy corresponding to a working day is "the sum of the historical peaks of the dual centers, with the peak of the previous month as the reference"; that is, in this case the sum of the previous month's historical peaks of the two centers is used as the reference. The dual centers here are the Beijing center and the Shanghai center.
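The two reference rules quoted from the policy configuration tables can be sketched in Python as follows. This is a hedged reading rather than the patent's own code: the function names are hypothetical, and interpreting "average of the historical peaks" as the mean of daily peaks within the look-back window is an assumption.

```python
from statistics import mean


def average_of_peaks(daily_peaks):
    """Reference for the 'average of the historical peaks' rule.

    daily_peaks holds the daily peak values inside the look-back window
    (the previous week in the working-day example); this reading is an assumption.
    """
    return mean(daily_peaks)


def dual_center_reference(peaks_by_center):
    """Reference for the 'sum of the historical peaks of the dual centers' rule.

    peaks_by_center maps a center name to its daily peaks inside the look-back
    window (the previous month in the working-day example). The highest value
    per center is taken as that center's historical peak, then the peaks are summed.
    """
    return sum(max(peaks) for peaks in peaks_by_center.values())
```

With a previous-month peak of 1500 tps for the Shanghai center and 1000 tps for the Beijing center, `dual_center_reference` returns 2500, which matches the setting information used in the description's dual-active working-day example.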
Fig. 8 is a block diagram of the resource configuration module 202 in a disaster recovery center-based cloud platform resource configuration system according to an embodiment of the present invention. Referring to Fig. 8, the resource configuration module 202 includes:
An obtaining unit 301, configured to obtain setting information from the optimization strategy. Taking the dual-active disaster recovery mode as an example, step S303 determines that the optimization strategy corresponding to a working day is "the sum of the historical peaks of the dual centers, with the peak of the previous month as the reference"; that is, in this case the sum of the previous month's historical peaks of the two centers is used as the reference. For example, if the previous month's historical peak of the Shanghai center is 1500 tps and that of the Beijing center is 1000 tps, the setting information determined by the working-day optimization strategy in the dual-active disaster recovery mode is 2500 tps.
A reading unit 302, configured to, when the resource configuration of the service system does not meet the setting information, read the virtual machines corresponding to a physical machine from a preset database according to the name of the physical machine corresponding to the service system. If the setting information is already met, the resources need no adjustment.
A utilization rate obtaining unit 303, configured to obtain the CPU utilization and the memory utilization of the virtual machine from the load condition data.
A coefficient determining unit 304, configured to determine a coefficient of the virtual machine according to the CPU utilization and the memory utilization. In a specific embodiment, the coefficient is the CPU utilization of the virtual machine divided by its memory utilization. The present invention adopts this coefficient because high CPU utilization directly affects the load of the physical machine, while the memory size affects the migration time. A large coefficient indicates high CPU utilization or little used memory, so migrating such a virtual machine has a pronounced effect on the load of the physical machine.
A first determining unit 305, configured to select a virtual machine to be migrated according to the coefficient of the virtual machine and the optimization strategy.
A second determining unit 306, configured to select a physical machine to be migrated according to the virtual machine to be migrated.
A migration unit 307, configured to migrate the physical machine to be migrated and the virtual machine to be migrated, so that the resource configuration of the service system after migration meets the setting information.
In a specific implementation, after the virtual machine to be migrated is selected, a physical machine suitable for the migration can be selected, and it is then calculated whether the resource configuration after migration would meet the setting information. If it would not, the selection is performed again; if it would, the migration operation is executed.
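The coefficient and the select, check and retry loop described for units 304 to 307 can be sketched as below. This is a minimal Python illustration under assumptions: the `VM` and `Host` classes and the `fits` callback are hypothetical, the callback stands in for the unspecified calculation of whether the post-migration configuration meets the setting information, and trying hosts from the least loaded upward is one plausible ordering rather than a rule stated for these units.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_pct: float  # CPU utilization of the virtual machine, percent
    mem_pct: float  # memory utilization of the virtual machine, percent


@dataclass
class Host:
    name: str
    load_pct: float    # current load of the physical machine, percent
    free_cores: float  # hypothetical capacity field used only by the example check


def coefficient(vm):
    """CPU utilization divided by memory utilization, as in the description."""
    if vm.mem_pct <= 0:
        # The description does not cover this case; rejecting it is an assumption.
        raise ValueError("memory utilization must be positive")
    return vm.cpu_pct / vm.mem_pct


def plan_migration(vms, hosts, fits):
    """Return the first (vm, host) pair accepted by `fits`, or None.

    Virtual machines are tried from the largest coefficient down; for each one,
    physical machines are tried from the least loaded up. A rejected pair simply
    causes the next candidate to be selected.
    """
    for vm in sorted(vms, key=coefficient, reverse=True):
        for host in sorted(hosts, key=lambda h: h.load_pct):
            if fits(vm, host):
                return vm, host
    return None
```

A caller would pass a `fits` predicate that encodes the setting information, for example `lambda vm, host: host.free_cores >= 4`, and would carry out the actual migration only when a pair is returned.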
Furthermore, although several unit modules of the disaster recovery center-based cloud platform resource configuration system are mentioned in the above detailed description, this partitioning is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied in a plurality of units.
The following specific embodiment takes the resource configuration process of the disaster recovery center cloud platform for a disaster recovery system A on October 1, 2016 as an example, and describes in detail how the disaster recovery center-based cloud platform resource configuration method and system of the present invention achieve load balancing of cloud platforms in different disaster recovery modes.
1. The load data acquisition device shows that system A in the production environment uses 1 virtual machine in total, VM1, configured as 8C16G, with a CPU utilization of 20% and a memory utilization of 20%; system A in the disaster recovery center uses 1 virtual machine in total, VM1, configured as 2C4G.
2. The operation data acquisition device shows that the current traffic of system A in the production environment is 50 tps, the average of the historical peaks for this calendar day is 80 tps, and the peak on October 1, 2015 was 100 tps.
3. Querying the policy configuration table shows that system A is in the warm-standby disaster recovery mode, and the policy is initialized.
4. The resource configuration device obtains the data from steps 1 and 2 and makes a target decision in combination with the policy configuration table. The decision target is to adjust the 2C4G configuration, at a current traffic of 50 tps, to a virtual machine configuration that can handle 100 tps. The target virtual machine configuration is calculated as (8C × 20% / 50 tps × 100 tps) / 70% ≈ 4.57C and (16G × 20% / 50 tps × 100 tps) / 70% ≈ 9.14G, where 70% is the resource tolerance of the disaster recovery environment and is a fixed parameter. The decision target is therefore to adjust the configuration of VM1 in the disaster recovery center from 2C4G to 5C10G.
5. The resource configuration device sends the decision target from step 4 and the relevant decision information from steps 1 and 2 (server load information and traffic load information of the production environment) to the disaster recovery system.
6. The disaster recovery system determines from the decision target that a resource adjustment is needed.
7. The ratio of CPU utilization to used memory size is calculated for all virtual machines on the physical machine and sorted in descending order; the loads of all physical machines are calculated and sorted in ascending order.
8. A suitable target physical machine is selected, and VM1 is migrated and its resources are adjusted.
The above describes the dynamic resource adjustment process of October 1, 2016. This example shows resources being expanded automatically ahead of a holiday; by the same system design, resources can be contracted automatically on ordinary working days, which improves the resource utilization of the disaster recovery center.
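The sizing arithmetic in step 4 of the example can be checked with a short Python sketch. The function name `target_size` and its parameter names are hypothetical; the formula itself (resources consumed in production, scaled to the target throughput, then divided by the 70% resource tolerance) and the input figures are taken from the worked example, while rounding up to whole units is inferred from the 5C10G result.

```python
import math


def target_size(prod_capacity, prod_utilization, current_tps, target_tps, tolerance):
    """Scale the resources consumed in production to the target throughput,
    then divide by the resource tolerance of the disaster recovery environment."""
    used = prod_capacity * prod_utilization
    return used / current_tps * target_tps / tolerance


# Production VM1 is 8C16G at 20% CPU and 20% memory utilization, serving 50 tps.
# The target is 100 tps with a fixed resource tolerance of 70%.
cores = target_size(8, 0.20, 50, 100, 0.70)
memory_gb = target_size(16, 0.20, 50, 100, 0.70)

print(round(cores, 2), round(memory_gb, 2))    # 4.57 9.14
print(math.ceil(cores), math.ceil(memory_gb))  # 5 10
```

Rounding each value up to the next whole unit gives the 5C10G target stated in the example.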
In summary, the disaster recovery center-based cloud platform resource configuration method and system provided by the invention can balance the load of the cloud computing platform across different disaster recovery modes. The scheme is flexible and can be set independently for different service systems, so applicability is not a concern. Manual operation steps can be omitted, which saves workload, and migration information and the like can be written to a log or stored in a database for convenient review. From the perspective of the disaster recovery data center, and on the premise that the disaster recovery indexes of all service systems in the whole disaster recovery center are guaranteed, the scheme makes effective use of the computing resources in the cloud platform resource pool, allocates resources automatically and logically, and reduces the workload of operators.
Taking the perspective of the whole disaster recovery data center, the core of the scheme is to combine the disaster recovery indexes of the services in the disaster recovery center with the analysis of production traffic and the decision calculation, so that the resource capacity of the cloud disaster recovery center is adjusted automatically at any time and the center as a whole always has either the real-time operating capacity of a production system or the takeover capacity of a disaster recovery system. This addresses how the cloud IT system in the disaster recovery center can better meet the operating requirements of each service system there (for example, how to meet the disaster recovery indexes of a dual-active, warm-standby or cold-standby system while maximizing resource utilization).
Improvements to a technology can be clearly distinguished as hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to a method flow). However, as technology advances, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a PLD by programming it, without requiring a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the hardware description languages above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory.
Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may be regarded as structures within the hardware component. Alternatively, the means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions.
For convenience of description, the above devices are described as being divided into various units by function, each described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied, in essence or in part, in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant points, reference may be made to the corresponding description of the method embodiment.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
While the present application has been described with examples, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and permutations without departing from the spirit of the application.

Claims (8)

1. A cloud platform resource allocation method based on a disaster recovery center is characterized by comprising the following steps:
acquiring load data of servers corresponding to each service system deployed by a disaster recovery center on a cloud platform;
collecting service operation data from a production environment;
classifying the load data and the service operation data according to preset disaster tolerance mode grades to obtain load condition data of each service system in different disaster tolerance modes; the disaster recovery mode comprises a double-active disaster recovery mode, a main and auxiliary disaster recovery mode, a warm standby disaster recovery mode and a cold standby disaster recovery mode;
performing resource configuration on the cloud platform according to a preset strategy configuration table and load condition data;
when the service system is in a warm standby disaster recovery mode or a cold standby disaster recovery mode, the resource configuration of the cloud platform according to the policy configuration table and the load condition data comprises:
determining an optimization strategy of the service system according to a strategy configuration table;
performing resource allocation on the service system according to the optimization strategy and the load condition data;
the resource configuration of the service system according to the optimization strategy and the load condition data comprises:
acquiring set information from the optimization strategy;
when the resource configuration of the service system does not meet the set information, reading a virtual machine corresponding to a physical machine from a preset database according to the name of the physical machine corresponding to the service system;
acquiring the CPU utilization rate and the memory utilization rate of the virtual machine from the load condition data;
determining the coefficient of the virtual machine according to the CPU utilization rate and the memory utilization rate;
selecting a virtual machine to be migrated according to the coefficient of the virtual machine and the optimization strategy;
selecting a physical machine to be migrated according to a virtual machine to be migrated;
and migrating the physical machine to be migrated and the virtual machine to be migrated so as to enable the resource configuration of the service system after migration to meet the set information.
2. The method of claim 1, wherein a script is used to collect load data of a server, wherein the server comprises a virtual machine and a physical machine.
3. The method of claim 2, wherein the load data comprises CPU usage, memory usage, and storage capacity usage, and the service operation data comprises daily average transaction amount and daily peak transaction amount.
4. The method according to claim 1, wherein when the service system is in a dual active disaster recovery mode or a primary and secondary disaster recovery mode, performing resource configuration on the cloud platform according to a policy configuration table and load condition data comprises:
collecting forecast information from the production environment;
determining an optimization strategy of the service system according to the prediction information and the strategy configuration table;
and carrying out resource allocation on the service system according to the optimization strategy and the load condition data.
5. A disaster recovery center-based cloud platform resource configuration system is characterized by comprising:
the load data acquisition device is used for acquiring load data of servers corresponding to each service system deployed by the disaster recovery center on the cloud platform;
the operation data acquisition device is used for acquiring service operation data from the production environment;
the data classification device is used for classifying the load data and the service operation data according to a preset disaster tolerance mode grade to obtain load condition data of each service system in different disaster tolerance modes; the disaster recovery mode comprises a double-active disaster recovery mode, a main and auxiliary disaster recovery mode, a warm standby disaster recovery mode and a cold standby disaster recovery mode;
the resource configuration device is used for performing resource configuration on the cloud platform according to a preset strategy configuration table and load condition data; when the service system is in a warm standby disaster recovery mode or a cold standby disaster recovery mode, the resource configuration device includes:
the first optimization strategy determination module is used for determining the optimization strategy of the service system according to a strategy configuration table;
the resource allocation module is used for performing resource allocation on the service system according to the optimization strategy and the load condition data;
the resource configuration module comprises:
the acquisition unit is used for acquiring setting information from the optimization strategy;
a reading unit, configured to, when the resource configuration of the service system does not meet the setting information, read a virtual machine corresponding to a physical machine from a preset database according to a name of the physical machine corresponding to the service system;
a utilization rate obtaining unit, configured to obtain a CPU utilization rate and a memory utilization rate of the virtual machine from the load condition data;
the coefficient determining unit is used for determining the coefficient of the virtual machine according to the CPU utilization rate divided by the memory utilization rate;
the first determining unit is used for selecting the virtual machine to be migrated according to the coefficient of the virtual machine and the optimization strategy;
the second determining unit is used for selecting the physical machine to be migrated according to the virtual machine to be migrated;
and the migration unit is used for migrating the physical machine to be migrated and the virtual machine to be migrated so as to enable the business system after migration to meet the set information.
6. The system of claim 5, wherein the load data collection device collects load data of a server using a script, and the server comprises a virtual machine and a physical machine.
7. The system of claim 6, wherein the load data comprises CPU usage, memory usage, and storage capacity usage, and the business operation data comprises daily average and daily peak transaction amounts.
8. The system according to claim 5, wherein when the service system is in the dual active disaster recovery mode or the primary and secondary disaster recovery mode, the resource configuration device further comprises:
the prediction information acquisition module is used for acquiring prediction information from the production environment;
and the second optimization strategy determination module is used for determining the optimization strategy of the service system according to the prediction information and the strategy configuration table.
CN201610874942.1A 2016-09-30 2016-09-30 Disaster recovery center-based cloud platform resource configuration method and system Active CN106254166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610874942.1A CN106254166B (en) 2016-09-30 2016-09-30 Disaster recovery center-based cloud platform resource configuration method and system

Publications (2)

Publication Number Publication Date
CN106254166A CN106254166A (en) 2016-12-21
CN106254166B true CN106254166B (en) 2020-06-23

Family

ID=57612481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610874942.1A Active CN106254166B (en) 2016-09-30 2016-09-30 Disaster recovery center-based cloud platform resource configuration method and system

Country Status (1)

Country Link
CN (1) CN106254166B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant