WO2018001061A1 - Method, device and terminal for reducing the switchover unavailable time of a disaster recovery center system


Info

Publication number
WO2018001061A1
Authority
WO
WIPO (PCT)
Prior art keywords
job
disaster recovery
rpo
time
risk value
Prior art date
Application number
PCT/CN2017/087534
Other languages
English (en)
French (fr)
Inventor
郝建明
张炼
马平清
王巍
韩智东
廉宜果
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2018001061A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L41/0681 Configuration of triggering conditions
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0836 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability to enhance reliability, e.g. reduce downtime
    • H04L41/0893 Assignment of logical groups to network elements
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • This application relates to the field of system disaster recovery, in particular to disaster recovery scheduling during switchover to a disaster recovery center system, and specifically to a method, device and terminal based on job scheduling in the disaster recovery center.
  • A disaster recovery center can continue to provide users with stable services when a disaster occurs, but it takes a certain amount of time from the occurrence of the disaster until business and data have been switched to the disaster recovery center. How far the system unavailable time caused by this switchover can be shortened is an important indicator of the disaster recovery capability of the center.
  • In the traditional disaster recovery mode, there is an interval between jobs, and this interval equals the time specified by the recovery point objective (RPO).
  • The RPO here refers to the point in time to which the system and data must be recovered after a disaster.
  • The traditional disaster recovery center has weak management capabilities for multi-user, multi-task disaster recovery services.
  • The present application provides a method, a device and a terminal for reducing the switchover unavailable time of a disaster recovery center system. By analyzing the system unavailable time when a data center disaster occurs, an RPO risk model and a disaster recovery service scheduling model are obtained. The scheduling model is used to calculate the overall risk value of each user job of the business system, and the result is compared with a threshold to schedule the jobs. This reduces the time during which system data is unavailable in a disaster and effectively improves the disaster recovery capability of the data center.
  • One objective of the present application is to provide a method for reducing the switchover unavailable time of a disaster recovery center system, the method comprising: obtaining the system unavailable time when a disaster occurs at the disaster recovery center; determining a disaster recovery service scheduling model according to the system unavailable time; obtaining the jobs of the users corresponding to the business systems of the disaster recovery center; determining an overall risk value of each job according to the disaster recovery service scheduling model; acquiring a preset threshold; and scheduling the job according to the threshold and the overall risk value.
  • Determining a disaster recovery service scheduling model according to the system unavailable time includes: analyzing the system unavailable time to obtain a recovery point objective (RPO) risk model; and determining the disaster recovery service scheduling model of a running job according to the RPO risk model.
  • the disaster recovery service scheduling model of the running job is:
  • $\beta$ is the overall risk value
  • $\alpha$ is the risk value
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • $T_m$ is the waiting time
  • $T_{n+1}$ is the duration of the (n+1)th job
  • is the remaining time of the job
  • $i_1$, $i_2$, $i_3$ are weighting coefficients
  • $i_1 + i_2 + i_3 = 1$.
  • Alternatively, determining a disaster recovery service scheduling model according to the system unavailable time includes: analyzing the system unavailable time to obtain a recovery point objective (RPO) risk model; determining an RPO historical risk value according to the RPO risk model; and determining the disaster recovery service scheduling model of a job that is not yet running according to the RPO historical risk value.
  • The disaster recovery service scheduling model of the non-running job is:
  • $\beta = \alpha_{average} \times i_1 + \lambda \times i_2$
  • $\beta$ is the overall risk value
  • $\lambda$ is the priority of the job
  • $\alpha_{average}$ is the average historical risk value of the job
  • $i_1$, $i_2$ are weighting coefficients, with $i_1 + i_2 = 1$
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs.
  • Scheduling the job according to the threshold and the overall risk value comprises: determining whether the overall risk value exceeds the threshold; if so, acquiring the system occupancy ratio and the network resource occupancy ratio of the job; obtaining preset intervention rules; selecting an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource occupancy ratio; and scheduling the job according to the intervention strategy.
  • One objective of the present application is to provide a device for reducing the switchover unavailable time of a disaster recovery center system, the device including: a system unavailable time acquisition module, configured to obtain the system unavailable time when a disaster occurs at the disaster recovery center;
  • a service scheduling model determining module, configured to determine a disaster recovery service scheduling model according to the system unavailable time;
  • a job obtaining module, configured to acquire the jobs of the users corresponding to the business systems of the disaster recovery center; an overall risk value determining module, configured to determine an overall risk value of each job according to the disaster recovery service scheduling model;
  • a threshold obtaining module, configured to acquire a preset threshold; and a scheduling module, configured to schedule the job according to the threshold and the overall risk value.
  • The service scheduling model determining module includes: an RPO risk model determining unit, configured to analyze the system unavailable time and obtain a recovery point objective (RPO) risk model; and a first service scheduling model determining unit, configured to determine, according to the RPO risk model, the disaster recovery service scheduling model of a running job.
  • The service scheduling model determining module further includes: a historical risk value determining unit, configured to determine an RPO historical risk value according to the RPO risk model; and a second service scheduling model determining unit, configured to determine, according to the RPO historical risk value, the disaster recovery service scheduling model of a job that is not yet running.
  • The scheduling module includes: a determining unit, configured to determine whether the overall risk value exceeds the threshold; a ratio obtaining unit, configured to obtain, when the determining unit's result is yes, the system occupancy ratio and the network resource occupancy ratio of the job; an intervention rule obtaining unit, configured to acquire preset intervention rules; an intervention strategy selecting unit, configured to select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource occupancy ratio; and a scheduling unit, configured to schedule the job according to the intervention strategy.
  • One objective of the present application is to provide a storage device storing a plurality of instructions adapted to be loaded and executed by a processor to: obtain the system unavailable time when a disaster occurs at the disaster recovery center; determine a disaster recovery service scheduling model according to the system unavailable time; obtain the jobs of the users corresponding to the business systems of the disaster recovery center; determine an overall risk value of each job according to the disaster recovery service scheduling model; acquire a preset threshold; and schedule the job according to the threshold and the overall risk value.
  • One objective of the present application is to provide a terminal comprising a processor adapted to execute instructions and a storage device storing a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to: obtain the system unavailable time when a disaster occurs at the disaster recovery center; determine a disaster recovery service scheduling model according to the system unavailable time; obtain the jobs of the users corresponding to the business systems of the disaster recovery center; determine an overall risk value of each job according to the disaster recovery service scheduling model; acquire a preset threshold; and schedule the job according to the threshold and the overall risk value.
  • The beneficial effect of the present application is that it provides a method, a device and a terminal for reducing the switchover unavailable time of a disaster recovery center system. By analyzing the system unavailable time when a data center disaster occurs, an RPO risk model and a disaster recovery service scheduling model are obtained.
  • The service scheduling model is used to calculate the overall risk value of each user job of the business system, and the result is compared with a threshold to schedule the jobs, thereby reducing the time during which system data is unavailable in a disaster and effectively improving the disaster recovery capability of the data center.
  • FIG. 1 is a flowchart of a method for reducing the switchover unavailable time of a disaster recovery center system according to an embodiment of the present application;
  • FIG. 2 is a flowchart of Embodiment 1 of step S101 in FIG. 1;
  • FIG. 3 is a flowchart of Embodiment 2 of step S101 in FIG. 1;
  • FIG. 4 is a flowchart of step S106 in FIG. 1;
  • FIG. 5 is a structural block diagram of a device for reducing the switchover unavailable time of a disaster recovery center system according to an embodiment of the present application;
  • FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling model determining module in the device;
  • FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling model determining module in the device;
  • FIG. 8 is a structural block diagram of the scheduling module in the device;
  • FIG. 9 is an example of the timing of jobs;
  • FIG. 10 is a schematic diagram of a disaster occurring during the nth job;
  • FIG. 11 is a schematic diagram of a disaster occurring during the (n+1)th job;
  • FIG. 12 is a schematic diagram of a disaster occurring during the waiting time;
  • FIG. 13 is a schematic diagram of the RPO time being less than the job duration;
  • FIG. 14 is a schematic diagram of the RPO time being greater than the duration of the nth job and less than the completion time of the (n+1)th job;
  • FIG. 15 is a schematic diagram of the RPO time being greater than the completion time of the (n+1)th job;
  • FIG. 16 is a schematic diagram of the traditional job scheduling method;
  • FIG. 17 is a schematic diagram of actual business RPO risk values;
  • FIG. 18 is a schematic diagram of disaster recovery service scheduling in a specific embodiment.
  • Recovery point objective (RPO): the point in time to which the system and data must be restored after a disaster.
  • Recovery time objective (RTO): the time required for an information system or business function to go from suspension to recovery.
  • FIG. 1 is a flowchart of a method for reducing the switchover unavailable time of a disaster recovery center system according to the present application. As shown in FIG. 1, the method includes:
  • FIG. 2 is a flowchart of the first embodiment of step S101
  • FIG. 3 is a flowchart of the second embodiment of step S101.
  • S103: Acquire the jobs of the users corresponding to the business systems of the disaster recovery center.
  • the disaster recovery center corresponds to multiple service systems, and each service system corresponds to multiple users, and each user corresponds to multiple jobs.
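  • As a rough illustration of the hierarchy just described, the following Python sketch models a disaster recovery center that serves several business systems, each with several users, each with several jobs. All class and field names are hypothetical and chosen only for illustration; the application does not prescribe a data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class BackupJob:
    job_id: str  # hypothetical identifier for one disaster recovery job


@dataclass
class User:
    name: str
    jobs: List[BackupJob] = field(default_factory=list)


@dataclass
class BusinessSystem:
    name: str
    users: List[User] = field(default_factory=list)


@dataclass
class DisasterRecoveryCenter:
    systems: List[BusinessSystem] = field(default_factory=list)

    def all_jobs(self) -> Iterator[BackupJob]:
        """Yield every job of every user of every business system."""
        for system in self.systems:
            for user in system.users:
                yield from user.jobs


# Example: one business system, two users, three jobs in total.
center = DisasterRecoveryCenter(systems=[
    BusinessSystem("payments", users=[
        User("unit-a", jobs=[BackupJob("a-1"), BackupJob("a-2")]),
        User("unit-b", jobs=[BackupJob("b-1")]),
    ]),
])
job_ids = [job.job_id for job in center.all_jobs()]
```

  • Flattening the hierarchy with `all_jobs` is one convenient way to hand every job to a risk calculation in turn.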
  • S104 Determine an overall risk value of the job according to the disaster recovery service scheduling model.
  • S106 Schedule the job according to the threshold and the overall risk value.
  • The method provided by the present application reduces the system switchover unavailable time by scheduling, when a disaster occurs, the running jobs and the not-yet-running jobs of the users of the business system.
  • FIG. 2 is a flowchart of Embodiment 1 of step S101. As shown in FIG. 2, the step specifically includes the following in the first embodiment:
  • According to the definitions in the national standard, the main quantitative criteria for measuring the recovery capability of a disaster recovery center are the recovery point objective and the recovery time objective.
  • The recovery point objective (RPO) and the recovery time objective (RTO) are quantitative indicators of the recovery capability level of the disaster recovery center, and can serve as the target standard against which the center makes service commitments to its users.
  • RPO: the point in time to which the system and data must be restored after a disaster.
  • RTO: the time required for an information system or business function to go from standstill to recovery after a disaster.
  • the disaster recovery center is responsible for data replication and storage, providing a recovery drill environment; each access unit is responsible for system and data recovery.
  • the service commitment of the disaster recovery center is based on RPO. In order to meet the service commitment, the disaster recovery center must ensure that the disaster recovery data stored by the user meets the RPO requirements.
  • This application is mainly for copying and backing up data to the disaster recovery center in an asynchronous manner.
  • This covers disaster recovery systems of Level 1 to Level 5, that is, systems with $T_{RPO} > 0$.
  • Where $T_{RPO}$ must equal 0, synchronous mode is used and no service scheduling is involved, so that case is not discussed.
  • The disaster recovery center of the present application uses a wide area network to transmit the data of each unit to the disaster recovery center for storage. Because bandwidth is small and data volumes are large, each job takes a long time, and job duration becomes the main factor affecting the RPO. Conventional wisdom held that as long as each job duration satisfies $T \le T_{RPO}$, the recovery point objective is guaranteed and the service commitment is met. In-depth study, however, shows that this assumption is problematic.
  • Since $T_{RPO}$ is a time indicator and time only moves forward, the jobs can be placed on a timeline.
  • Disaster recovery work is a repeating process consisting of one job, the waiting time between jobs, and the next job, as shown in FIG. 9.
  • The nth job starts at $t_0$: the disaster recovery system transmits the disaster recovery data generated by the user as of time $t_0$ to the disaster recovery center and completes the job at time $t_1$, after a duration $T_1$. After a waiting time $T_2$, the next job, the (n+1)th, starts at time $t_2$, transmits the disaster recovery data generated by the user as of $t_2$ to the disaster recovery center, and completes at time $t_3$, after a duration $T_3$. This process repeats continuously.
  • This situation can also be considered as a disaster in the middle of the n+1th job process, as shown in Figure 11.
  • Figure 11 shows that the disaster occurred during the n+1th operation.
  • the time point that can be recovered is the starting point of the nth job, namely: t 0 .
  • Figure 12 shows that the disaster occurred during the waiting time.
  • The relationship between the time of the disaster and the RPO is as follows: when a disaster occurs, the point in time to which data can be recovered must lie within the RPO-specified time counted back from the disaster.
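  • The timeline reasoning above can be sketched in a few lines of Python. The sketch assumes, as the text describes for FIG. 11, that data can only be recovered to the start time of the most recently completed job. The `Job` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    start: float  # time at which the job began copying data (e.g. t0, t2)
    end: float    # time at which the job completed (e.g. t1, t3)


def recoverable_point(jobs: Sequence[Job], disaster_time: float) -> Optional[float]:
    """Start time of the latest job that completed before the disaster, if any."""
    completed = [job for job in jobs if job.end <= disaster_time]
    if not completed:
        return None
    return max(completed, key=lambda job: job.end).start


def meets_rpo(jobs: Sequence[Job], disaster_time: float, t_rpo: float) -> bool:
    """True if the recoverable point lies within t_rpo before the disaster."""
    point = recoverable_point(jobs, disaster_time)
    return point is not None and disaster_time - point <= t_rpo


# Job n runs from t0=0 to t1=40; after waiting 10, job n+1 runs from t2=50 to t3=90.
timeline = [Job(0, 40), Job(50, 90)]
during_next_job = recoverable_point(timeline, 70)   # disaster inside job n+1
during_waiting = recoverable_point(timeline, 45)    # disaster inside the waiting time
after_next_job = recoverable_point(timeline, 95)    # disaster after job n+1 completed
```

  • In this example a disaster at time 70 can only be recovered to time 0, so an RPO of 60 would be missed even though each individual job lasts only 40.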
  • Figure 13 shows that the RPO time is less than the operating time.
  • The duration of each job must be no more than $T_{RPO}$, that is: $T_n \le T_{RPO}$.
  • Figure 14 shows the RPO time is greater than the nth job time and less than the completion time of the n+1th job.
  • The RPO time is greater than the completion time of the (n+1)th job, that is, $(T_1 + T_2 + T_3) \le T_{RPO}$, as shown in FIG. 15.
  • Figure 15 shows the RPO time is greater than the completion time of the n+1th job.
  • the data recovery point target can be guaranteed to meet the RPO requirements:
  • T n ⁇ T RPO , (T 1 + T 2 + T 3 ) ⁇ T RPO formula can be expressed as:
  • T 2 is the waiting time that can be used for dispensing, so: T 2 ⁇ 0.
  • T RPO ⁇ (T 1 +T 2 +T 3 ) ⁇ 2T RPO formula can be expressed as:
  • T 2 is the waiting time that can be used for dispensing
  • is the risk value
  • T RPO is the RPO time
  • T n is the nth job duration
  • T m is the waiting time
  • T n+1 is the n+1th job duration.
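  • The risk formula itself does not survive in this text, so the following Python sketch rests on an assumption: that the risk value is the ratio of the combined time (nth job, waiting time, (n+1)th job) to the RPO time. That reading is consistent with the two conditions stated above (combined time at most $T_{RPO}$, and combined time between $T_{RPO}$ and $2T_{RPO}$), but the function names and the tier labels are hypothetical.

```python
def rpo_risk_value(t_n: float, t_m: float, t_n1: float, t_rpo: float) -> float:
    """Assumed risk value: (T_n + T_m + T_{n+1}) / T_RPO."""
    if t_rpo <= 0:
        raise ValueError("T_RPO must be positive for asynchronous replication")
    if min(t_n, t_m, t_n1) < 0:
        raise ValueError("durations and waiting time must be non-negative")
    return (t_n + t_m + t_n1) / t_rpo


def risk_tier(alpha: float) -> str:
    """Map the assumed risk value onto three illustrative tiers."""
    if alpha <= 1:
        return "safe"       # (T_1 + T_2 + T_3) <= T_RPO
    if alpha <= 2:
        return "warning"    # T_RPO < (T_1 + T_2 + T_3) <= 2 * T_RPO
    return "risk"           # beyond 2 * T_RPO


tier_short_jobs = risk_tier(rpo_risk_value(30, 10, 30, 100))
tier_long_jobs = risk_tier(rpo_risk_value(60, 20, 60, 100))
tier_overrun = risk_tier(rpo_risk_value(100, 50, 100, 100))
```

  • Expressing the risk as a ratio keeps it unit-free, so jobs with different RPO commitments can be compared on the same scale.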
  • S202 Determine a disaster recovery service scheduling model of the running job according to the RPO risk model.
  • From the RPO risk model, the maximum job duration that is guaranteed to meet the RPO requirement without risk can be derived:
  • the durations of successive jobs are similar, that is, $T_1 \approx T_3$;
  • $T_2$, as the waiting time, can be adjusted, that is, $T_2 \to 0$; hence $(T_1 + T_2 + T_3) \le T_{RPO}$ becomes $(T_1 + 0 + T_1) \le T_{RPO}$.
  • Inference 1: as long as the duration of each job is kept within $T_{RPO}/2$, the disaster recovery service is safe.
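  • Written out step by step, and assuming as stated that successive jobs take about the same time and that the waiting time can be reduced to zero, the bound in Inference 1 follows directly from the no-risk condition:

```latex
\begin{aligned}
T_1 + T_2 + T_3 &\le T_{RPO} \\
T_1 + 0 + T_1 &\le T_{RPO} \qquad (T_2 \to 0,\ T_3 \approx T_1) \\
2\,T_1 &\le T_{RPO} \\
T_1 &\le \frac{T_{RPO}}{2}
\end{aligned}
```

  • For example, with an RPO time of 4 hours, each job would have to finish within 2 hours to stay inside this bound.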
  • Inference 2: when a job is at risk, the time $T_3$ should be kept as short as possible; $T_3$ must be less than
  • The traditional method of operation is to start a job at fixed intervals, and the interval is often equal to the time specified by the RPO, as shown in FIG. 16.
  • In that case the system cannot meet the RPO target when a disaster occurs during the next job, that is, during $T_3$.
  • The RPO historical risk value $\alpha_{average}$ is mainly used to measure the average risk of each unit from historical data
  • $\alpha_{average}$ is the average historical risk value of the job
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs.
  • For a running job, the next job start time $t_2$ in the RPO risk model is replaced with the current time. The period between the completion time $t_1$ of the previous job and the current time $t_2$ is then $T_2$, the time elapsed since the previous job completed.
  • The period from the current time $t_2$ to the completion time $t_3$ of the current job is $T_3$, the time the current job still needs to complete.
  • Because $T_1$ and $T_2$ lie before the current time, both are known values.
  • $T_3$ is a predicted value and is the key value used to predict RPO risk.
  • Since the network bandwidth of the WAN (user egress bandwidth and disaster recovery center ingress bandwidth) is the main bottleneck of the disaster recovery service, the main factors affecting the completion time of a job are the data size g (Mb) and the bandwidth m (Mb/s).
  • g denotes how much data still has to be transferred, counting from the current time, to complete the job;
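  • A minimal sketch of how the predicted remaining time could be estimated from these two quantities is shown below. It assumes the simplest possible relation, remaining data divided by available bandwidth, which the text suggests but does not state as a formula; the function name is hypothetical.

```python
def estimated_remaining_time(remaining_data_mb: float, bandwidth_mb_per_s: float) -> float:
    """Estimate T_3 in seconds as g / m (assumed relation, not taken from the source)."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth m must be positive")
    if remaining_data_mb < 0:
        raise ValueError("remaining data g must be non-negative")
    return remaining_data_mb / bandwidth_mb_per_s


# 1200 Mb still to transfer over a 10 Mb/s link.
t3_seconds = estimated_remaining_time(1200.0, 10.0)
```

  • Because the bandwidth actually available to a job changes as other jobs start and stop, such an estimate would need to be refreshed periodically rather than computed once.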
  • the RPO risk value formula for the business scheduling model is:
  • a warning is given, indicating that the current job can be completed within the RPO but there is no guarantee that the next job can be. The resource utilization of the current and next jobs must then be monitored; the larger the value, the greater the risk, and intervention is applied when necessary;
  • the disaster recovery job being executed is intervened in, and the disaster recovery service scheduling model is:
  • $\beta$ is the overall risk value
  • $\alpha$ is the risk value
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • $T_m$ is the waiting time
  • $T_{n+1}$ is the duration of the (n+1)th job
  • is the remaining time of the job
  • $i_1$, $i_2$, $i_3$ are weighting coefficients
  • $i_1 + i_2 + i_3 = 1$.
  • the weighting factor can be adjusted according to the actual situation.
  • FIG. 3 is a flowchart of Embodiment 2 of step S101. As shown in FIG. 3, the step specifically includes the following in the second embodiment:
  • S301: Analyze the system unavailable time to obtain a recovery point objective (RPO) risk model. This step is similar to step S201 and is not described again here.
  • S302 Determine an RPO historical risk value according to the RPO risk model.
  • The RPO historical risk value $\alpha_{average}$ is mainly used to measure the average risk of each unit from historical data;
  • $\alpha_{average}$ is the average historical risk value of the job
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs.
  • S303: Determine the disaster recovery service scheduling model of a job that is not yet running according to the RPO historical risk value, in order to intervene in disaster recovery jobs that are about to be executed.
  • The disaster recovery service scheduling model is:
  • $\beta = \alpha_{average} \times i_1 + \lambda \times i_2$
  • $\beta$ is the overall risk value
  • $\lambda$ is the priority of the job
  • $\alpha_{average}$ is the average historical risk value of the job
  • $i_1$, $i_2$ are weighting coefficients, with $i_1 + i_2 = 1$
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs; the weighting coefficients can be adjusted according to the actual situation.
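  • The weighted sum for a not-yet-running job can be sketched as follows. Two points are assumptions rather than statements from the text: the average historical risk is taken to be the arithmetic mean of the risk values of the k historical jobs, and the priority is taken to be a number on a scale comparable to the risk value. The function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def overall_risk_pending(
    historical_risks: Sequence[float],
    priority: float,
    i1: float,
    i2: float,
) -> float:
    """beta = alpha_average * i1 + lambda * i2, with i1 + i2 = 1."""
    if not historical_risks:
        raise ValueError("at least one historical risk value is required")
    if abs(i1 + i2 - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    alpha_average = fmean(historical_risks)  # assumed: mean over the k historical jobs
    return alpha_average * i1 + priority * i2


# Three historical jobs with risk values 0.8, 1.2 and 1.0; priority 0.5; weights 0.6 / 0.4.
beta = overall_risk_pending([0.8, 1.2, 1.0], priority=0.5, i1=0.6, i2=0.4)
```

  • Raising `i2` makes the schedule follow job priority more closely, while raising `i1` makes it follow each unit's track record.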
  • information such as the duration of each job, the waiting time, the priority, and the remaining time corresponding to the job in step S103 can be obtained, thereby determining the overall risk value.
  • step S106 includes:
  • S401 Determine whether the overall risk value exceeds the threshold.
  • the threshold can be preset and changed according to the specific implementation scenario.
  • Intervention rules include pause, delay, termination, speed limit, and the like.
  • The main form of intervention is to intervene in other jobs and yield their resources to the job with the higher RPO risk value so that it can complete.
  • Once the RPO risk value is under control, the intervened jobs are resumed.
  • S404 Select an intervention strategy from the intervention rule according to the system occupancy ratio and the network resource ratio.
  • the corresponding overall risk value ⁇ can be obtained according to the disaster recovery service scheduling model.
  • Let the obtained system occupancy ratio be C and the network resource occupancy ratio be D.
  • the intervention strategy is selected from the intervention rules.
  • the relationship between the value of the system C and the network resource ratio D and the intervention rule can be determined according to different actual conditions. For example, for a system with a large overall resource, even if the occupancy rate reaches 80% or more, there are still sufficient resources available.
  • a pause intervention strategy is selected.
  • When D is 60%-70%, a delay intervention strategy is selected.
  • When C is 70% or more and D is 70%-80%, the termination intervention strategy is selected.
  • When C is any value and D is 80% or more, a speed limit intervention strategy is selected.
  • S405 Schedule the job according to the intervention strategy.
  • the scheduling manner corresponding to the intervention strategy is as shown in Table 2.
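  • Steps S401 to S405 can be sketched as a threshold check followed by a rule lookup. The rule boundaries below are read from the example ranges in the text under one possible interpretation (each condition belongs to the strategy named after it, with half-open intervals); the condition for the pause strategy is not recoverable from this text, so the sketch leaves it out. All names are hypothetical, and the text itself notes that the mapping should be tuned to the actual system.

```python
from typing import Optional


def select_intervention(c: float, d: float) -> Optional[str]:
    """Pick a strategy from system occupancy c and network occupancy d (fractions 0..1)."""
    if d >= 0.80:
        return "speed_limit"            # any C, D at 80% or more
    if c >= 0.70 and 0.70 <= d < 0.80:
        return "terminate"              # C at 70% or more, D in 70%-80%
    if 0.60 <= d < 0.70:
        return "delay"                  # D in 60%-70%
    return None                         # pause rule and other cases: not specified here


def schedule(beta: float, threshold: float, c: float, d: float) -> Optional[str]:
    """Intervene only when the overall risk value exceeds the preset threshold."""
    if beta <= threshold:
        return None
    return select_intervention(c, d)


decision_saturated_network = schedule(beta=1.5, threshold=1.0, c=0.50, d=0.85)
decision_below_threshold = schedule(beta=0.5, threshold=1.0, c=0.90, d=0.90)
```

  • As the text explains, the selected strategy would be applied to other jobs so that their resources are freed for the job with the higher RPO risk value.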
  • In summary, the present application provides a method for reducing the switchover unavailable time of a disaster recovery center system, and on that basis develops a disaster recovery service scheduling solution.
  • The solution implements disaster recovery service management for multiple users and multiple tasks, provides effective service to each access user according to the promised RPO requirements, and ensures that disaster recovery scheduling tasks are carried out.
  • FIG. 5 is a structural block diagram of a device for reducing the switchover unavailable time of a disaster recovery center system according to an embodiment of the present application. As shown in FIG. 5, the device includes:
  • the system unavailable time acquisition module 101, configured to obtain the system unavailable time when a disaster occurs at the disaster recovery center.
  • the service scheduling model determining module 102, configured to determine a disaster recovery service scheduling model according to the system unavailable time.
  • FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling model determining module
  • FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling model determining module.
  • the job obtaining module 103, configured to acquire the jobs of the users corresponding to the business systems of the disaster recovery center.
  • FIG. 18 is a schematic diagram of disaster recovery service scheduling in a specific embodiment.
  • the disaster recovery center corresponds to multiple service systems, and each service system corresponds to multiple users, and each user corresponds to multiple jobs.
  • the overall risk value determining module 104 is configured to determine an overall risk value of the job according to the disaster recovery service scheduling model.
  • the threshold obtaining module 105 is configured to acquire a preset threshold
  • the scheduling module 106 is configured to schedule the job according to the threshold and the overall risk value.
  • FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling model determining module in the device. As shown in FIG. 6, in Embodiment 1 the service scheduling model determining module specifically includes:
  • the RPO risk model determining unit 201, configured to analyze the system unavailable time and obtain a recovery point objective (RPO) risk model.
  • the RPO risk model can be derived as:
  • $\alpha$ is the risk value
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • $T_m$ is the waiting time
  • $T_{n+1}$ is the duration of the (n+1)th job.
  • the first service scheduling model determining unit 202, configured to determine the disaster recovery service scheduling model of a running job according to the RPO risk model.
  • The disaster recovery job being executed is intervened in, and the disaster recovery service scheduling model is:
  • $\beta$ is the overall risk value
  • $\alpha$ is the risk value
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • $T_m$ is the waiting time
  • $T_{n+1}$ is the duration of the (n+1)th job
  • is the remaining time of the job
  • $i_1$, $i_2$, $i_3$ are weighting coefficients
  • $i_1 + i_2 + i_3 = 1$.
  • the weighting factor can be adjusted according to the actual situation.
  • FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling model determining module in the device. As shown in FIG. 7, in Embodiment 2 the service scheduling model determining module specifically includes:
  • the historical risk value determining unit 203 is configured to determine an RPO historical risk value according to the RPO risk model.
  • The RPO historical risk value $\alpha_{average}$ is mainly used to measure the average risk of each unit from historical data;
  • $\alpha_{average}$ is the average historical risk value of the job
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs.
  • the second service scheduling model determining unit 204, configured to determine the disaster recovery service scheduling model of a job that is not yet running according to the RPO historical risk value, in order to intervene in disaster recovery jobs that are about to be executed.
  • The disaster recovery service scheduling model is:
  • $\beta = \alpha_{average} \times i_1 + \lambda \times i_2$
  • $\beta$ is the overall risk value
  • $\lambda$ is the priority of the job
  • $\alpha_{average}$ is the average historical risk value of the job
  • $i_1$, $i_2$ are weighting coefficients, with $i_1 + i_2 = 1$
  • $T_{RPO}$ is the RPO time
  • $T_n$ is the duration of the nth job
  • k is the number of historical jobs; the weighting coefficients can be adjusted according to the actual situation.
  • FIG. 8 is a structural block diagram of the scheduling module in the device for reducing the switchover unavailable time of a disaster recovery center system according to an embodiment of the present application. As shown in FIG. 8, the scheduling module includes:
  • the determining unit 401 is configured to determine whether the overall risk value exceeds the threshold.
  • the threshold can be preset and changed according to the specific implementation scenario.
  • the ratio obtaining unit 402 is configured to acquire, when the determination is yes, the system occupancy ratio of the job and the network resource proportion.
  • the intervention rule obtaining unit 403 is configured to acquire a preset intervention rule.
  • the intervention rules include operations such as pause, delay, termination, and rate limiting.
  • the main approach is to intervene in other jobs so that their resources are yielded to the job with the higher RPO risk value, allowing it to complete. Once the RPO risk value is under control, the jobs that were intervened in are resumed.
  • the intervention strategy selecting unit 404 is configured to select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio.
  • for example, for job B of user A, the corresponding overall risk value β can be calculated according to the disaster recovery service scheduling model.
  • when β exceeds the threshold, the system occupancy ratio C and the network resource ratio D of the job are obtained, and the intervention strategy is selected from the intervention rules according to C and D.
  • the mapping from the values of C and D to the intervention rules can be determined according to the actual situation. For example, a system with large overall resources may still have ample resources available even when its occupancy exceeds 80%, whereas a system with small overall resources may already be unable to support business needs at 50% occupancy.
  • for example, when C is 50%-60% and D is 50%-60%, the pause strategy is selected.
  • when C is 60%-70% and D is 60%-70%, the delay strategy is selected.
  • when C is 70% or more and D is 70%-80%, the termination strategy is selected.
  • when C is any value and D is 80% or more, the rate-limiting strategy is selected.
  • the scheduling unit 405 is configured to schedule the job according to the intervention policy.
  • the scheduling manner corresponding to the intervention strategy is as shown in Table 2.
  • the above is the device for reducing the system switchover unavailability time of a disaster recovery center provided by the present application. Based on the RPO risk model and the disaster recovery service scheduling model derived from the research, a disaster recovery service scheduling solution was developed. The solution implements multi-user, multi-task disaster recovery service management and serves each connected user according to the committed RPO requirements, ensuring that disaster recovery scheduling tasks are carried out.
  • the device marks every job of every organization in list and diagram views.
  • the risk value of each job of each user is calculated according to the service scheduling model and marked with a solid line, a dashed line, or a line with a cross: a solid line indicates normal, a dashed line indicates an early warning, and a cross indicates intervention.
  • job scheduling is carried out according to the service scheduling model.
  • the service scheduling model also improves the overall performance of the disaster recovery system: by sensibly scheduling multiple tasks of multiple users, it makes effective use of limited system resources.
  • the service scheduling model effectively safeguards the operation of the production system and plays a decisive role in its effective disaster recovery.
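  • an embodiment of the present application further provides a storage device that stores a plurality of instructions, the instructions being adapted to be loaded by a processor to execute:
  • obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
  • determining a disaster recovery service scheduling model according to the system unavailability time;
  • obtaining a job of a user corresponding to a business system of the disaster recovery center;
  • determining the overall risk value of the job according to the disaster recovery service scheduling model;
  • obtaining a preset threshold;
  • scheduling the job according to the threshold and the overall risk value.
  • an embodiment of the present application further provides a terminal, including:
  • a processor adapted to implement instructions; and
  • a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute:
  • obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
  • determining a disaster recovery service scheduling model according to the system unavailability time;
  • obtaining a job of a user corresponding to a business system of the disaster recovery center;
  • determining the overall risk value of the job according to the disaster recovery service scheduling model;
  • obtaining a preset threshold;
  • scheduling the job according to the threshold and the overall risk value.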
  • in summary, the method and device for reducing the system switchover unavailability time of a disaster recovery center proposed by the present application derive the RPO risk model and the disaster recovery service scheduling model by analyzing the system unavailability time when a data center disaster occurs. On that basis, the overall risk value of a user's job in a business system is calculated and compared with the threshold to drive service scheduling, which reduces the time that system data is unavailable when a disaster occurs and improves the disaster recovery capability of the data center.
  • those skilled in the art should understand that embodiments of the present application may be provided as a method, a device, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • the present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. Each flow and/or block, and any combination of flows and/or blocks, can be implemented by computer program instructions. These instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device so that a series of operational steps is performed on it to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

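The present application provides a method, a device, and a terminal for reducing the system switchover unavailability time of a disaster recovery center. The method includes: obtaining the system unavailability time when a disaster occurs at the disaster recovery center; determining a disaster recovery service scheduling model according to the system unavailability time; obtaining a job of a user corresponding to a business system of the disaster recovery center; determining the overall risk value of the job according to the disaster recovery service scheduling model; obtaining a preset threshold; and scheduling the job according to the threshold and the overall risk value. By analyzing the system unavailability time when a data center disaster occurs, an RPO risk model and a disaster recovery service scheduling model are derived; on that basis, the overall risk value of a user's job in a business system is calculated and compared with the threshold to drive service scheduling, thereby reducing the time that system data is unavailable when a disaster occurs and improving the disaster recovery capability of the data center.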

Description

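Method, Device, and Terminal for Reducing System Switchover Unavailability Time of a Disaster Recovery Center
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. CN201610485697.5, filed on June 28, 2016 and entitled "Method and Device for Reducing System Switchover Unavailability Time of a Disaster Recovery Center", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of system disaster tolerance, in particular to disaster recovery scheduling during the system switchover of a disaster recovery center, and specifically to a method, device, and terminal for scheduling jobs based on a disaster recovery center.
Background
With the development of information technology and the bank card industry, expectations for the operational stability of data centers keep rising. At the same time, because business and technology are continually updated and system software and hardware fail, guaranteeing the continuous, stable operation of a data center system is almost an impossible task. A disaster recovery center can continue to provide stable service to users when a disaster occurs, but it takes a certain amount of time from the onset of the disaster until business and data are switched over to the disaster recovery center. How far the system unavailability time caused by this switchover can be shortened is an important indicator of a disaster recovery center's capability.
In traditional disaster recovery, there is an interval between successive jobs, and that interval equals the time specified by the recovery point objective (RPO). Here, RPO refers to the requirement on the point in time to which the system and data must be restored after a disaster. Traditional disaster recovery centers are weak at managing disaster recovery services for multiple users and multiple tasks.
Therefore, how to develop a new scheme that reduces the time during which systems and data are unavailable when a disaster occurs, and thereby improves the disaster recovery capability of data centers, is a technical problem that urgently needs to be solved in this field.
Summary of the Invention
To overcome the above technical problems in the prior art, the present application provides a method, device, and terminal for reducing the system switchover unavailability time of a disaster recovery center. By analyzing the system unavailability time when a data center disaster occurs, an RPO risk model and a disaster recovery service scheduling model are derived; on that basis, the overall risk value of a user's job in a business system is calculated and compared with a threshold to drive service scheduling, thereby reducing the time that system data is unavailable when a disaster occurs and improving the disaster recovery capability of the data center.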
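One objective of the present application is to provide a method for reducing the system switchover unavailability time of a disaster recovery center, the method including: obtaining the system unavailability time when a disaster occurs at the disaster recovery center; determining a disaster recovery service scheduling model according to the system unavailability time; obtaining a job of a user corresponding to a business system of the disaster recovery center; determining the overall risk value of the job according to the disaster recovery service scheduling model; obtaining a preset threshold; and scheduling the job according to the threshold and the overall risk value.
In a preferred embodiment of the present application, determining the disaster recovery service scheduling model according to the system unavailability time includes: analyzing the system unavailability time to obtain a recovery point objective (RPO) risk model; and determining, according to the RPO risk model, a disaster recovery service scheduling model for running jobs.
In a preferred embodiment of the present application, the disaster recovery service scheduling model for running jobs is:
β = α × i_1 + λ × i_2 + δ × i_3
Figure PCTCN2017087534-appb-000001
where β is the overall risk value, α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, T_{n+1} is the duration of the (n+1)-th job, λ is the priority of the job, δ is the remaining time of the job, and i_1, i_2, i_3 are weighting coefficients with i_1 + i_2 + i_3 = 1.
In a preferred embodiment of the present application, determining the disaster recovery service scheduling model according to the system unavailability time includes: analyzing the system unavailability time to obtain the RPO risk model; determining an RPO historical risk value according to the RPO risk model; and determining, according to the RPO historical risk value, a disaster recovery service scheduling model for jobs that have not yet run.
In a preferred embodiment of the present application, the disaster recovery service scheduling model for jobs that have not yet run is:
β = α_avg × i_1 + λ × i_2
Figure PCTCN2017087534-appb-000002
where β is the overall risk value, λ is the priority of the job, α_avg is the historical risk value of the job, i_1 and i_2 are weighting coefficients with i_1 + i_2 = 1, T_RPO is the RPO time, T_n is the duration of the n-th job, and k is the number of historical jobs.
In a preferred embodiment of the present application, scheduling the job according to the threshold and the overall risk value includes: determining whether the overall risk value exceeds the threshold; if so, obtaining the system occupancy ratio and the network resource ratio of the job; obtaining preset intervention rules; selecting an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio; and scheduling the job according to the intervention strategy.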
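One objective of the present invention is to provide a device for reducing the system switchover unavailability time of a disaster recovery center, the device including: a system unavailability time obtaining module, configured to obtain the system unavailability time when a disaster occurs at the disaster recovery center; a service scheduling module determining module, configured to determine a disaster recovery service scheduling model according to the system unavailability time; a job obtaining module, configured to obtain a job of a user corresponding to a business system of the disaster recovery center; an overall risk value determining module, configured to determine the overall risk value of the job according to the disaster recovery service scheduling model; a threshold obtaining module, configured to obtain a preset threshold; and a scheduling module, configured to schedule the job according to the threshold and the overall risk value.
In a preferred embodiment of the present invention, the service scheduling module determining module includes: an RPO risk model determining unit, configured to analyze the system unavailability time to obtain the RPO risk model; and a first service scheduling model determining unit, configured to determine, according to the RPO risk model, the disaster recovery service scheduling model for running jobs.
In a preferred embodiment of the present invention, the service scheduling module determining module further includes: a historical risk value determining unit, configured to determine the RPO historical risk value according to the RPO risk model; and a second service scheduling model determining unit, configured to determine, according to the RPO historical risk value, the disaster recovery service scheduling model for jobs that have not yet run.
In a preferred embodiment of the present invention, the scheduling module includes: a determining unit, configured to determine whether the overall risk value exceeds the threshold; a ratio obtaining unit, configured to obtain the system occupancy ratio and the network resource ratio of the job when the determining unit's determination is affirmative; an intervention rule obtaining unit, configured to obtain preset intervention rules; an intervention strategy selecting unit, configured to select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio; and a scheduling unit, configured to schedule the job according to the intervention strategy.
One objective of the present application is to provide a storage device that stores a plurality of instructions, the instructions being adapted to be loaded by a processor to execute: obtaining the system unavailability time when a disaster occurs at the disaster recovery center; determining a disaster recovery service scheduling model according to the system unavailability time; obtaining a job of a user corresponding to a business system of the disaster recovery center; determining the overall risk value of the job according to the disaster recovery service scheduling model; obtaining a preset threshold; and scheduling the job according to the threshold and the overall risk value.
One objective of the present application is to provide a terminal that includes a processor adapted to implement instructions and a storage device, the storage device storing a plurality of instructions adapted to be loaded by the processor to execute: obtaining the system unavailability time when a disaster occurs at the disaster recovery center; determining a disaster recovery service scheduling model according to the system unavailability time; obtaining a job of a user corresponding to a business system of the disaster recovery center; determining the overall risk value of the job according to the disaster recovery service scheduling model; obtaining a preset threshold; and scheduling the job according to the threshold and the overall risk value.
The beneficial effect of the present application is that it provides a method, device, and terminal for reducing the system switchover unavailability time of a disaster recovery center. By analyzing the system unavailability time when a data center disaster occurs, an RPO risk model and a disaster recovery service scheduling model are derived; on that basis, the overall risk value of a user's job in a business system is calculated and compared with the threshold to drive service scheduling, thereby reducing the time that system data is unavailable when a disaster occurs and improving the disaster recovery capability of the data center.
To make the above and other objectives, features, and advantages of the present application easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.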
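Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for reducing the system switchover unavailability time of a disaster recovery center according to an embodiment of the present application;
FIG. 2 is a flowchart of Embodiment 1 of step S101 in FIG. 1;
FIG. 3 is a flowchart of Embodiment 2 of step S101 in FIG. 1;
FIG. 4 is a flowchart of step S106 in FIG. 1;
FIG. 5 is a structural block diagram of a device for reducing the system switchover unavailability time of a disaster recovery center according to an embodiment of the present application;
FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling module determining module in the device according to an embodiment of the present application;
FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling module determining module in the device according to an embodiment of the present application;
FIG. 8 is a structural block diagram of the scheduling module in the device according to an embodiment of the present application;
FIG. 9 is an example timing diagram of jobs;
FIG. 10 is a schematic diagram of a disaster occurring during the n-th job;
FIG. 11 is a schematic diagram of a disaster occurring during the (n+1)-th job;
FIG. 12 is a schematic diagram of a disaster occurring during the waiting time;
FIG. 13 is a schematic diagram of the RPO time being shorter than the job duration;
FIG. 14 is a schematic diagram of the RPO time being longer than the duration of the n-th job and shorter than the completion time of the (n+1)-th job;
FIG. 15 is a schematic diagram of the RPO time being longer than the completion time of the (n+1)-th job;
FIG. 16 is a schematic diagram of the traditional job method;
FIG. 17 is a schematic diagram of the RPO risk value in actual operation;
FIG. 18 is a schematic diagram of disaster recovery service scheduling in a specific embodiment.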
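Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
Key technical terms used in the present application include:
Recovery point objective (RPO): the requirement on the point in time to which the system and data must be restored after a disaster.
Recovery time objective (RTO): the requirement on the time within which an information system or business function must be restored after it stops.
FIG. 1 is a detailed flowchart of the method for reducing the system switchover unavailability time of a disaster recovery center proposed by the present application. As shown in FIG. 1, the method includes:
S101: Obtain the system unavailability time when a disaster occurs at the disaster recovery center.
S102: Determine a disaster recovery service scheduling model according to the system unavailability time. FIG. 2 is a flowchart of Embodiment 1 of step S101, and FIG. 3 is a flowchart of Embodiment 2 of step S101.
S103: Obtain a job of a user corresponding to a business system of the disaster recovery center. For example, in the financial data field, data center H is located in city A and standby data center K is located in city B. When a disaster occurs at data center H, the data center must be switched from H to K; during the switchover, a user of a business system of the disaster recovery center has jobs that are running or about to run. FIG. 18 is a schematic diagram of disaster recovery service scheduling in a specific embodiment. As shown in FIG. 18, the disaster recovery center corresponds to multiple business systems, each business system to multiple users, and each user to multiple jobs.
S104: Determine the overall risk value of the job according to the disaster recovery service scheduling model.
S105: Obtain a preset threshold.
S106: Schedule the job according to the threshold and the overall risk value.
The above is the method for reducing the system switchover unavailability time of a disaster recovery center provided by the present application. When a disaster occurs, it schedules the running or about-to-run jobs of the users of the business systems, thereby reducing the system switchover unavailability time.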
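FIG. 2 is a flowchart of Embodiment 1 of step S101. As shown in FIG. 2, in Embodiment 1 this step specifically includes:
S201: Analyze the system unavailability time to obtain the recovery point objective (RPO) risk model.
In a specific embodiment, the main quantitative measures of a disaster recovery center's recovery capability are the recovery point objective and the recovery time objective. According to the definitions in the national standard:
The recovery point objective (RPO) and the recovery time objective (RTO) are quantitative indicators of the recovery capability level of a disaster recovery center, and can serve as the target standards for the service commitments the center makes to its users.
RPO: the requirement on the point in time to which the system and data must be restored after a disaster.
RTO: the requirement on the time within which an information system or business function must be restored after it stops following a disaster.
The relationship between RTO/RPO and the disaster recovery capability level is shown in Table 1.
Table 1
Figure PCTCN2017087534-appb-000003
According to the division of responsibilities, the disaster recovery center is responsible for replicating and storing data and for providing a recovery drill environment, while each connected organization is responsible for restoring its own systems and data. The center's service commitment is expressed primarily in terms of RPO; to honor it, the center must ensure that the disaster recovery data stored by users satisfies the RPO requirement.
The present application mainly addresses the case where data is replicated and backed up to the disaster recovery center asynchronously. It studies disaster recovery systems at recovery capability levels 1 through 5, that is, systems with T_RPO ≠ 0. At recovery capability level 6, T_RPO must equal 0 and replication is synchronous, so no service scheduling is involved and that case is not discussed further.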
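The disaster recovery center of the present application uses a wide area network to transfer each organization's data to the center and store it there. Because bandwidth is limited and data volumes are large, each job takes a long time, and job duration becomes the main factor affecting RPO. The common understanding at the time was that as long as each job's duration satisfies T ≤ T_RPO, the recovery point objective is guaranteed and the service commitment is met. Closer study shows that this understanding is flawed.
Because T_RPO is a time metric and time only moves forward, jobs can be placed on a time axis. Disaster recovery work is a process consisting of one job, the waiting time between jobs, and the next job, as shown in FIG. 9.
The n-th job starts at t0: the disaster recovery system transfers the disaster recovery data produced by the user at time t0 to the disaster recovery center and, after a duration T1, completes the job at time t1. After a waiting time T2, the next job, the (n+1)-th, starts at time t2 and transfers the data produced by the user at time t2 to the center, completing at time t3 after a duration T3. This process repeats continuously.
With this picture of disaster recovery work, consider how the time at which a disaster occurs affects the data recovery point:
Case 1: the disaster occurs during the n-th job, as shown in FIG. 10.
In FIG. 10, the disaster occurs during the n-th job.
1) The n-th job fails;
2) The data at time t0 cannot be recovered;
3) The recoverable point in time is the start of the (n-1)-th job.
This case is equivalent to the disaster occurring during the (n+1)-th job, as shown in FIG. 11.
In FIG. 11, the disaster occurs during the (n+1)-th job.
1) The (n+1)-th job fails;
2) The data at time t2 cannot be recovered;
3) The recoverable point in time is the start of the n-th job, i.e., t0.
Case 2: the disaster occurs during the waiting time, as shown in FIG. 12.
In FIG. 12, the disaster occurs during the waiting time.
1) The n-th job succeeds;
2) The data at time t0 can be recovered;
3) The recoverable point in time is the start of the n-th job, i.e., t0.
This analysis shows that until the next job completes, only the data from the start of the current job can be recovered. Conclusion: until the (n+1)-th job completes, the data recovery point is the start of the n-th job, t0.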
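The conclusion above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the `Job` tuple, the function name `recovery_point`, and the sample timeline are hypothetical and do not come from the application. A job that is still running when the disaster strikes is treated as failed, as in the case analysis.

```python
from typing import List, Optional, Tuple

Job = Tuple[float, float]  # (start_time, end_time) of one replication job


def recovery_point(jobs: List[Job], disaster_time: float) -> Optional[float]:
    """Return the start time of the most recent job that finished before the disaster.

    A job still running when the disaster strikes is treated as failed,
    so the data it was transferring cannot serve as a recovery point.
    """
    completed_starts = [start for start, end in jobs if end <= disaster_time]
    return max(completed_starts) if completed_starts else None


# Hypothetical timeline: job n runs from 0 to 4, waiting time 4 to 6, job n+1 runs from 6 to 9.
timeline = [(0.0, 4.0), (6.0, 9.0)]
during_wait = recovery_point(timeline, 5.0)   # disaster in the waiting time
during_next = recovery_point(timeline, 7.5)   # disaster during job n+1
after_next = recovery_point(timeline, 9.5)    # job n+1 has completed
during_first = recovery_point(timeline, 2.0)  # job n itself is still running
```

In this timeline the recovery point stays at the start of job n (time 0) until job n+1 completes, and only then moves to the start of job n+1.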
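The relationship between the time of a disaster and RPO is this: looking back from the moment of the disaster, a point in time from which data can be recovered must exist within the period specified by the RPO.
Case 1: the RPO time is shorter than one job, that is, the duration of the n-th job satisfies T1 > T_RPO, as shown in FIG. 13.
In FIG. 13, the RPO time is shorter than the job duration.
When T1 > T_RPO, a single job takes longer than T_RPO, and the disaster recovery center clearly cannot meet its service commitment. If the disaster occurs in the risk segment shown in the figure, the data of the n-th job cannot be recovered and disaster recovery fails. For disaster recovery services to operate normally, every job's duration must not exceed T_RPO, i.e., T_n ≤ T_RPO.
Case 2: the RPO time is longer than the duration of the n-th job but shorter than the completion time of the (n+1)-th job, that is, T1 ≤ T_RPO < (T1+T2+T3), as shown in FIG. 14.
In FIG. 14, the RPO time is longer than the duration of the n-th job and shorter than the completion time of the (n+1)-th job.
With T1 ≤ T_RPO < (T1+T2+T3): if the disaster occurs in the interval between t2 and the T_RPO mark (the point t0 + T_RPO on the time axis), the data at time t0 can be recovered. If it occurs in the interval between the T_RPO mark and t3, no usable recovery point exists within the RPO-specified period before the disaster. By the earlier conclusion, until the (n+1)-th job completes the recovery point is t0, the start of the n-th job, so one must look back a period T′_RPO to reach t0, the first usable recovery point. Because T′_RPO > T_RPO, recovery cannot meet the service commitment. Although the n-th job's data was fully transferred, disaster recovery is at risk.
Case 3: the RPO time is longer than the completion time of the (n+1)-th job, that is, (T1+T2+T3) ≤ T_RPO, as shown in FIG. 15.
In FIG. 15, the RPO time is longer than the completion time of the (n+1)-th job.
With (T1+T2+T3) ≤ T_RPO: if the disaster occurs between t1 and t3, the data of the n-th job, i.e., the data at time t0, can be recovered; if it occurs between t3 and the T_RPO mark, the data of the (n+1)-th job, i.e., the data at time t2, can be recovered.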
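The three cases can be summarized as a small classification routine. This is a sketch only: the function name and the returned labels are hypothetical, and checking both job durations against T_RPO follows the requirement stated in Case 1 that every job's duration must not exceed T_RPO.

```python
def classify_rpo_case(t1: float, t2: float, t3: float, t_rpo: float) -> str:
    """Classify a job / wait / job sequence against the RPO time.

    t1: duration of the n-th job, t2: waiting time,
    t3: duration of the (n+1)-th job, t_rpo: the RPO time (same unit).
    """
    if t1 > t_rpo or t3 > t_rpo:
        # Case 1: a single job already takes longer than the RPO time.
        return "unachievable"
    if t1 + t2 + t3 <= t_rpo:
        # Case 3: the next job completes within the RPO time.
        return "safe"
    # Case 2: each job fits, but the next job completes after the RPO time.
    return "at_risk"


case_1 = classify_rpo_case(5.0, 1.0, 2.0, 4.0)
case_2 = classify_rpo_case(3.0, 1.0, 3.0, 6.0)
case_3 = classify_rpo_case(2.0, 1.0, 2.0, 6.0)
```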
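From the above analysis, the following conclusions can be drawn:
(1) When (T1+T2+T3) ≤ T_RPO, the disaster recovery system meets the RPO requirement without risk.
The data recovery point is guaranteed to meet the RPO requirement when the following two conditions hold simultaneously:
1) T_n ≤ T_RPO
2) (T1+T2+T3) ≤ T_RPO
Because (T1+T2+T3) ≤ T_RPO necessarily implies T_n ≤ T_RPO, the condition (T1+T2+T3) ≤ T_RPO can be expressed as:
Figure PCTCN2017087534-appb-000004
(2) When T_RPO < (T1+T2+T3) ≤ 2T_RPO, the disaster recovery system can meet the RPO requirement, but with risk.
When T_n ≤ T_RPO and T1 ≤ T_RPO < (T1+T2+T3), the data of the n-th job has been fully transferred. If the disaster occurs before the T_RPO mark, the data at time t0 can be recovered; if it occurs between the T_RPO mark and t3, recovery fails. Meeting the T_RPO requirement therefore carries some risk.
Because T_n ≤ T_RPO, we have T1 ≤ T_RPO and T3 ≤ T_RPO.
Because T2 is waiting time that can be adjusted, T2 → 0.
Therefore (T1+T2+T3) ≤ 2T_RPO, while (T1+T2+T3) > T_RPO.
So the condition T_RPO < (T1+T2+T3) ≤ 2T_RPO can be expressed as:
Figure PCTCN2017087534-appb-000005
(3) When any single job exceeds T_RPO, the disaster recovery system cannot meet the T_RPO requirement.
When T_n > T_RPO, the disaster recovery center cannot meet its service commitment. Because T_n > T_RPO:
T1 > T_RPO, T3 > T_RPO
Because T2 is waiting time that can be adjusted, T2 → 0.
Therefore (T1+T2+T3) > 2T_RPO
which can be expressed as:
Figure PCTCN2017087534-appb-000006
(4) Summary of the RPO risk value formulas
Combining the formulas above gives the RPO risk model:
Figure PCTCN2017087534-appb-000007
When α ≤ 0, the RPO requirement can be met without any risk;
When 0 < α ≤ 1, the RPO requirement can be met with some risk; the larger α is and the closer it is to 1, the greater the risk, and vice versa. When α > 1, the RPO requirement cannot be met. α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, and T_{n+1} is the duration of the (n+1)-th job.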
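The formula images for the risk model are not reproduced in this text. One form that is consistent with the three conditions derived above and with the symbol definitions is sketched below; it is a reconstruction offered as an assumption, not a quotation of the original figure. Here T_n, T_m, and T_{n+1} play the roles of T1, T2, and T3 in the derivation.

```latex
\alpha = \frac{T_n + T_m + T_{n+1} - T_{RPO}}{T_{RPO}},
\qquad
\begin{cases}
\alpha \le 0 & \Longleftrightarrow\; T_n + T_m + T_{n+1} \le T_{RPO} \\[2pt]
0 < \alpha \le 1 & \Longleftrightarrow\; T_{RPO} < T_n + T_m + T_{n+1} \le 2\,T_{RPO} \\[2pt]
\alpha > 1 & \Longleftrightarrow\; T_n + T_m + T_{n+1} > 2\,T_{RPO}
\end{cases}
```

For instance, with T_RPO = 60 minutes, T_n = 25, T_m = 10, and T_{n+1} = 40 minutes, this form gives α = (75 − 60)/60 = 0.25, which falls in the "achievable with risk" band.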
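S202: Determine, according to the RPO risk model, the disaster recovery service scheduling model for running jobs.
In a specific embodiment, the RPO risk model shows the maximum duration to which each job must be limited to guarantee that the RPO requirement is met without risk:
When α ≤ 0, i.e., (T1+T2+T3) ≤ T_RPO, the RPO requirement is met without risk.
Assume that successive jobs take similar time, i.e., T1 ≈ T3.
To find the maximum duration, the waiting time T2 can be adjusted, i.e., T2 → 0. Because (T1+T2+T3) ≤ T_RPO, it follows that (T1+0+T1) ≤ T_RPO,
that is:
Figure PCTCN2017087534-appb-000008
Corollary 1: if the duration of every job is kept within
Figure PCTCN2017087534-appb-000009
disaster recovery services are safe.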
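The formula images in this step are not reproduced in this text. Under the two assumptions stated above (T2 → 0 and T3 ≈ T1), the bound in Corollary 1 follows in two lines; the derivation below is a sketch of that reasoning rather than a copy of the original figures.

```latex
\begin{aligned}
T_1 + T_2 + T_3 &\le T_{RPO} \\
T_1 + 0 + T_1 &\le T_{RPO} \qquad (T_2 \to 0,\; T_3 \approx T_1) \\
T_1 &\le \tfrac{1}{2}\,T_{RPO}
\end{aligned}
```

As a worked example, an RPO time of 60 minutes would mean keeping every job to 30 minutes or less.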
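How to schedule when a job is at risk:
When 0 < α ≤ 1, the job is at risk of overrunning. To avoid the risk, either the disaster must occur after the (n+1)-th job, or the (n+1)-th job must complete within the interval (T_RPO − t2). Because disasters cannot be controlled, the practical approach is to have the (n+1)-th job complete within the interval (T_RPO − t2), which lowers the risk of failing the service commitment. That is: T3 ≤ T_RPO − T1 − T2.
Since:
Figure PCTCN2017087534-appb-000010
it follows that:
Figure PCTCN2017087534-appb-000011
Corollary 2: when a job is at risk, T3 should be made as short as possible; T3 must be less than
Figure PCTCN2017087534-appb-000012
1) Risk that the traditional job method poses to achieving RPO
The traditional method starts a job at fixed intervals, and the interval is usually equal to the RPO-specified time, as shown in FIG. 16.
Because the interval between the start points of two consecutive jobs is T_RPO, the system cannot achieve the RPO target if a disaster occurs during the next job, that is, during T3.
Let T3 = n.
Then the probability of failing to achieve RPO is:
Figure PCTCN2017087534-appb-000013
Because T_RPO is constant, P decreases as n decreases and increases as n increases. Since n ≤ T_RPO, the formula gives P ≤ 50%.
Corollary 3: the traditional job method puts RPO at risk; the risk is generally no higher than 50%.
With a dedicated high-bandwidth backup network, data transfer speed is no longer a bottleneck and every job is short. According to the risk probability formula
Figure PCTCN2017087534-appb-000014
the probability of the risk occurring is usually very low.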
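In practice, the RPO risk value α is mainly used to estimate each organization's average risk value from historical data;
Figure PCTCN2017087534-appb-000015
α_avg is the historical risk value of the job, T_n is the duration of the n-th job, and k is the number of historical jobs.
In actual operation, estimating the RPO risk value of the current or next job requires adjusting the values used in the formula, as shown in FIG. 17.
The start time t2 of the next job in the RPO risk model is replaced by the current time. The period from t1, when the previous job completed, to the current time t2 is T2, the time elapsed since the previous job completed; the period from the current time t2 to t3, when the current job will complete, is T3, the time the current job still needs.
Because T1 and T2 lie before the current time, both are known values.
T3, however, is a predicted value and is the key quantity for predicting RPO risk. For projects positioned as data-level disaster recovery, wide area network bandwidth (the user's egress bandwidth and the disaster recovery center's ingress bandwidth) is the main bottleneck, so the main factors affecting job completion time are the data volume g (Mb) and the bandwidth m (Mb/s).
g is the amount of data that still has to be transferred, from now, to complete the job;
m is the current network bandwidth:
Figure PCTCN2017087534-appb-000016
The RPO risk value formula of the service scheduling model is:
Figure PCTCN2017087534-appb-000017
When α ≤ 0, no warning is needed;
When 0 < α ≤ 1, a warning is raised: the current job can complete within the RPO, but the next job is not guaranteed to. The resource usage of the current and next jobs should be watched; as α grows the risk grows, and intervention is applied when necessary;
When α > 1, the current job cannot complete within the RPO and the system may have a problem. Fault detection should be used to troubleshoot; once the fault is cleared, disaster recovery services must be scheduled through intervention.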
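The prediction step and the three alert bands can be sketched as follows. Two points are assumptions rather than quotations, because the formula images are not reproduced in this text: the remaining time is taken as T3 = g / m, and α is taken as (T1 + T2 + T3 − T_RPO) / T_RPO, a form consistent with the bands described above. The function names and the sample figures are hypothetical; all times are in seconds.

```python
def predicted_alpha(t1: float, t2: float, remaining_mb: float,
                    bandwidth_mb_per_s: float, t_rpo: float) -> float:
    """Estimate the RPO risk value of the current job.

    t1: duration of the previous job (known)
    t2: time elapsed since the previous job completed (known)
    remaining_mb: data volume g still to transfer, in Mb
    bandwidth_mb_per_s: current bandwidth m, in Mb/s
    t_rpo: the RPO time
    """
    if bandwidth_mb_per_s <= 0 or t_rpo <= 0:
        raise ValueError("bandwidth and RPO time must be positive")
    t3 = remaining_mb / bandwidth_mb_per_s  # assumed form: T3 = g / m
    return (t1 + t2 + t3 - t_rpo) / t_rpo   # assumed form of the risk value


def alert_level(alpha: float) -> str:
    """Map a risk value to the three bands described in the text."""
    if alpha <= 0:
        return "normal"      # no warning needed
    if alpha <= 1:
        return "warning"     # watch resources, intervene if alpha keeps growing
    return "intervene"       # troubleshoot, then schedule through intervention


# Hypothetical job: previous job took 20 min, 5 min elapsed since, RPO time 60 min.
low = predicted_alpha(1200, 300, 6000, 10, 3600)    # 10 min of transfer left
mid = predicted_alpha(1200, 300, 30000, 10, 3600)   # 50 min of transfer left
high = predicted_alpha(1200, 300, 60000, 10, 3600)  # 100 min of transfer left
levels = [alert_level(a) for a in (low, mid, high)]
```

Because T1 and T2 are already known, the estimate changes only through g and m, which is why freeing bandwidth for an at-risk job lowers its risk value.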
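When the RPO risk model is turned into the disaster recovery service scheduling model in practice, other factors must be considered. Because the disaster recovery center serves multiple users and each user organization launches multiple jobs at once, scheduling must take the priority order of customers' jobs into account; for jobs already executing, the remaining completion time must be considered as well.
In this embodiment, to intervene in disaster recovery transactions that are executing, the disaster recovery service scheduling model is:
β = α × i_1 + λ × i_2 + δ × i_3
where β is the overall risk value, α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, T_{n+1} is the duration of the (n+1)-th job, λ is the priority of the job, δ is the remaining time of the job, and i_1, i_2, i_3 are weighting coefficients with i_1 + i_2 + i_3 = 1. The weighting coefficients can be adjusted according to the actual situation.
FIG. 3 is a flowchart of Embodiment 2 of step S101. As shown in FIG. 3, in Embodiment 2 this step specifically includes:
S301: Analyze the system unavailability time to obtain the RPO risk model. This step is similar to step S201 and is not repeated here.
S302: Determine the RPO historical risk value according to the RPO risk model. In practice, the RPO risk value α is mainly used to estimate each organization's average risk value from historical data;
Figure PCTCN2017087534-appb-000018
α_avg is the historical risk value of the job, T_n is the duration of the n-th job, and k is the number of historical jobs.
S303: Determine, according to the RPO historical risk value, the disaster recovery service scheduling model for jobs that have not yet run. To intervene in disaster recovery transactions that are about to execute, the disaster recovery service scheduling model is:
β = α_avg × i_1 + λ × i_2
where β is the overall risk value, λ is the priority of the job, α_avg is the historical risk value of the job, i_1 and i_2 are weighting coefficients with i_1 + i_2 = 1, T_RPO is the RPO time, T_n is the duration of the n-th job, and k is the number of historical jobs. The weighting coefficients can be adjusted according to the actual situation.
Once the disaster recovery service scheduling model is determined, the per-job duration, waiting time, priority, remaining time, and other information of the job obtained in step S103 can be collected, and the overall risk value determined from them.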
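The two weighted-sum models can be sketched in a few lines. Several choices here are assumptions that the text does not specify and are labeled as such: the priority λ and remaining time δ are taken as already scaled to a range comparable with α, the default weights are arbitrary, and α_avg is computed as the plain mean of per-job risk values because the original averaging formula is not reproduced in this text. The function names are hypothetical.

```python
from typing import Sequence


def _check_weights(weights: Sequence[float]) -> None:
    # The weighting coefficients must sum to 1 in both models.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")


def beta_running(alpha: float, priority: float, remaining: float,
                 weights: Sequence[float] = (0.5, 0.3, 0.2)) -> float:
    """Overall risk of a running job: beta = alpha*i1 + lambda*i2 + delta*i3."""
    _check_weights(weights)
    i1, i2, i3 = weights
    return alpha * i1 + priority * i2 + remaining * i3


def beta_pending(historical_alphas: Sequence[float], priority: float,
                 weights: Sequence[float] = (0.6, 0.4)) -> float:
    """Overall risk of a job that has not run yet: beta = alpha_avg*i1 + lambda*i2."""
    _check_weights(weights)
    if not historical_alphas:
        raise ValueError("at least one historical risk value is required")
    # Assumption: alpha_avg is the arithmetic mean of the k historical risk values.
    alpha_avg = sum(historical_alphas) / len(historical_alphas)
    i1, i2 = weights
    return alpha_avg * i1 + priority * i2


running = beta_running(alpha=0.25, priority=0.8, remaining=0.5)
pending = beta_pending([0.1, 0.3, 0.2], priority=0.5)
```

Keeping the weights as a parameter reflects the statement that the weighting coefficients can be adjusted according to the actual situation.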
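FIG. 4 is a flowchart of step S106. As shown in FIG. 4, step S106 includes:
S401: Determine whether the overall risk value exceeds the threshold. The threshold can be preset and changed to suit the specific implementation scenario.
S402: If so, obtain the system occupancy ratio and the network resource ratio of the job.
S403: Obtain the preset intervention rules. In a specific embodiment, the intervention rules include operations such as pause, delay, termination, and rate limiting. The main approach is to intervene in other jobs so that their resources are yielded to the job with the higher RPO risk value, allowing it to complete. Once the RPO risk value is under control, the jobs that were intervened in are resumed.
S404: Select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio. For example, for job B of user A, the corresponding overall risk value β can be calculated according to the disaster recovery service scheduling model. When β exceeds the threshold, the system occupancy ratio C and the network resource ratio D are obtained, and the intervention strategy is selected from the intervention rules according to C and D. The mapping from the values of C and D to the intervention rules can be determined according to the actual situation. For example, a system with large overall resources may still have ample resources available even when its occupancy exceeds 80%, whereas a system with small overall resources may already be unable to support business needs at 50% occupancy. For example, when C is 50%-60% and D is 50%-60%, the pause strategy is selected. When C is 60%-70% and D is 60%-70%, the delay strategy is selected. When C is 70% or more and D is 70%-80%, the termination strategy is selected. When C is any value and D is 80% or more, the rate-limiting strategy is selected.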
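The example mapping in S404 can be written as a small lookup. This is a sketch of the sample thresholds only, which the text says should be tuned to the actual system. The function name, the strategy labels, and the treatment of range boundaries (half-open intervals, since the quoted ranges share endpoints) are assumptions; combinations not covered by the example return `None`.

```python
from typing import Optional


def select_intervention(system_ratio: float, network_ratio: float) -> Optional[str]:
    """Pick an intervention strategy from the system share C and network share D.

    Ratios are fractions in [0, 1]. Returns None when the example rules
    do not cover the combination.
    """
    c, d = system_ratio, network_ratio
    if d >= 0.80:
        return "rate_limit"   # C may be any value
    if c >= 0.70 and 0.70 <= d < 0.80:
        return "terminate"
    if 0.60 <= c < 0.70 and 0.60 <= d < 0.70:
        return "delay"
    if 0.50 <= c < 0.60 and 0.50 <= d < 0.60:
        return "pause"
    return None


examples = [
    select_intervention(0.55, 0.55),
    select_intervention(0.65, 0.65),
    select_intervention(0.90, 0.75),
    select_intervention(0.10, 0.85),
    select_intervention(0.55, 0.65),
]
```

A real deployment would likely load these ranges from configuration so that a large system and a small one can use different thresholds.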
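S405: Schedule the job according to the intervention strategy. In a specific embodiment, the scheduling action corresponding to each intervention strategy is shown in Table 2.
Table 2
Name          Description
Pause         Pause a job that is currently executing, freeing compute and network resources
Delay         Postpone a job that is about to execute, freeing compute and network resources
Terminate     Terminate a job that is in progress, freeing compute and network resources
Rate limit    Limit the rate of a job that is in progress or about to start, freeing network resources
In other words, as the overall risk value β grows, the risk grows with it and the scheduling priority rises. When the risk value exceeds the set threshold, the job must be intervened in.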
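The relationship between β, scheduling priority, and the threshold can be sketched as a ranking step. The `JobRisk` record, the function name, and the sample values are hypothetical; the sketch only orders jobs by β and flags those above the threshold, and leaves the choice of intervention strategy to the rules described in S404.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobRisk:
    user: str
    job: str
    beta: float  # overall risk value from the scheduling model


def plan_schedule(jobs: List[JobRisk], threshold: float) -> List[Tuple[JobRisk, bool]]:
    """Order jobs by descending overall risk and flag those above the threshold.

    The boolean is True when beta exceeds the threshold, meaning
    intervention is required for that job.
    """
    ranked = sorted(jobs, key=lambda j: j.beta, reverse=True)
    return [(j, j.beta > threshold) for j in ranked]


sample = [
    JobRisk("A", "B", 0.40),
    JobRisk("A", "C", 0.90),
    JobRisk("D", "E", 0.70),
]
plan = plan_schedule(sample, threshold=0.60)
order = [entry.job for entry, _ in plan]
flags = [needs_intervention for _, needs_intervention in plan]
```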
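The above is the method for reducing the system switchover unavailability time of a disaster recovery center provided by the present application. Based on the RPO risk model and the disaster recovery service scheduling model derived from the research, a disaster recovery service scheduling solution was developed. The solution implements multi-user, multi-task disaster recovery service management and serves each connected user according to the committed RPO requirements, ensuring that disaster recovery scheduling tasks are carried out.
FIG. 5 is a structural block diagram of a device for reducing the system switchover unavailability time of a disaster recovery center according to an embodiment of the present application. As shown in FIG. 5, the device includes:
a system unavailability time obtaining module 101, configured to obtain the system unavailability time when a disaster occurs at the disaster recovery center.
a service scheduling module determining module 102, configured to determine a disaster recovery service scheduling model according to the system unavailability time. FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling module determining module, and FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling module determining module.
a job obtaining module 103, configured to obtain a job of a user corresponding to a business system of the disaster recovery center. FIG. 18 is a schematic diagram of disaster recovery service scheduling in a specific embodiment. For example, in the financial data field, data center H is located in city A and standby data center K is located in city B. When a disaster occurs at data center H, the data center must be switched from H to K; during the switchover, a user of a business system of the disaster recovery center has jobs that are running or about to run. As shown in FIG. 18, the disaster recovery center corresponds to multiple business systems, each business system to multiple users, and each user to multiple jobs.
an overall risk value determining module 104, configured to determine the overall risk value of the job according to the disaster recovery service scheduling model.
a threshold obtaining module 105, configured to obtain a preset threshold;
a scheduling module 106, configured to schedule the job according to the threshold and the overall risk value.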
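FIG. 6 is a structural block diagram of Embodiment 1 of the service scheduling module determining module in the device according to an embodiment of the present application. As shown in FIG. 6, in Embodiment 1 the module specifically includes:
an RPO risk model determining unit 201, configured to analyze the system unavailability time to obtain the RPO risk model.
In a specific embodiment, the RPO risk model is:
Figure PCTCN2017087534-appb-000019
When α ≤ 0, the RPO requirement can be met without any risk;
When 0 < α ≤ 1, the RPO requirement can be met with some risk; the larger α is and the closer it is to 1, the greater the risk, and vice versa. When α > 1, the RPO requirement cannot be met. α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, and T_{n+1} is the duration of the (n+1)-th job.
a first service scheduling model determining unit 202, configured to determine, according to the RPO risk model, the disaster recovery service scheduling model for running jobs.
In a specific embodiment, to intervene in disaster recovery transactions that are executing, the disaster recovery service scheduling model is:
β = α × i_1 + λ × i_2 + δ × i_3
where β is the overall risk value, α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, T_{n+1} is the duration of the (n+1)-th job, λ is the priority of the job, δ is the remaining time of the job, and i_1, i_2, i_3 are weighting coefficients with i_1 + i_2 + i_3 = 1. The weighting coefficients can be adjusted according to the actual situation.
FIG. 7 is a structural block diagram of Embodiment 2 of the service scheduling module determining module in the device according to an embodiment of the present application. As shown in FIG. 7, in Embodiment 2 the module specifically includes:
a historical risk value determining unit 203, configured to determine the RPO historical risk value according to the RPO risk model. In practice, the RPO risk value α is mainly used to estimate each organization's average risk value from historical data;
Figure PCTCN2017087534-appb-000020
α_avg is the historical risk value of the job, T_n is the duration of the n-th job, and k is the number of historical jobs.
a second service scheduling model determining unit 204, configured to determine, according to the RPO historical risk value, the disaster recovery service scheduling model for jobs that have not yet run. To intervene in disaster recovery transactions that are about to execute, the disaster recovery service scheduling model is:
β = α_avg × i_1 + λ × i_2
where β is the overall risk value, λ is the priority of the job, α_avg is the historical risk value of the job, i_1 and i_2 are weighting coefficients with i_1 + i_2 = 1, T_RPO is the RPO time, T_n is the duration of the n-th job, and k is the number of historical jobs. The weighting coefficients can be adjusted according to the actual situation.
FIG. 8 is a structural block diagram of the scheduling module in the device according to an embodiment of the present application. As shown in FIG. 8, the scheduling module includes:
a determining unit 401, configured to determine whether the overall risk value exceeds the threshold. The threshold can be preset and changed to suit the specific implementation scenario.
a ratio obtaining unit 402, configured to obtain the system occupancy ratio and the network resource ratio of the job when the determination is affirmative.
an intervention rule obtaining unit 403, configured to obtain the preset intervention rules. In a specific embodiment, the intervention rules include operations such as pause, delay, termination, and rate limiting. The main approach is to intervene in other jobs so that their resources are yielded to the job with the higher RPO risk value, allowing it to complete. Once the RPO risk value is under control, the jobs that were intervened in are resumed.
an intervention strategy selecting unit 404, configured to select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio. For job B of user A, the corresponding overall risk value β can be calculated according to the disaster recovery service scheduling model. When β exceeds the threshold, the system occupancy ratio C and the network resource ratio D are obtained, and the intervention strategy is selected from the intervention rules according to C and D. The mapping from the values of C and D to the intervention rules can be determined according to the actual situation. For example, a system with large overall resources may still have ample resources available even when its occupancy exceeds 80%, whereas a system with small overall resources may already be unable to support business needs at 50% occupancy. For example, when C is 50%-60% and D is 50%-60%, the pause strategy is selected. When C is 60%-70% and D is 60%-70%, the delay strategy is selected. When C is 70% or more and D is 70%-80%, the termination strategy is selected. When C is any value and D is 80% or more, the rate-limiting strategy is selected.
a scheduling unit 405, configured to schedule the job according to the intervention strategy. In a specific embodiment, the scheduling action corresponding to each intervention strategy is shown in Table 2.
In other words, as the overall risk value β grows, the risk grows with it and the scheduling priority rises. When the risk value exceeds the set threshold, the job must be intervened in.
The above is the device for reducing the system switchover unavailability time of a disaster recovery center provided by the present application. Based on the RPO risk model and the disaster recovery service scheduling model derived from the research, a disaster recovery service scheduling solution was developed. The solution implements multi-user, multi-task disaster recovery service management and serves each connected user according to the committed RPO requirements, ensuring that disaster recovery scheduling tasks are carried out.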
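The technical solution of the present application is described in detail below with reference to a specific embodiment. Because the service scheduling model is well defined and the scheduling method is clear, the device for reducing the system switchover unavailability time of a disaster recovery center developed on the basis of the model was quickly put into use. Under practical constraints such as the bandwidth limits of the government extranet, service scheduling achieved the disaster recovery point objectives for multiple users, multiple systems, and multiple tasks while keeping users' normal production business running, as shown in FIG. 18.
The device marks every job of every organization in list and diagram views. The risk value of each job of each user is calculated according to the service scheduling model and marked with a solid line, a dashed line, or a line with a cross: a solid line indicates normal, a dashed line indicates an early warning, and a cross indicates intervention. Job scheduling is carried out according to the service scheduling model. The service scheduling model also improves the overall performance of the disaster recovery system: by sensibly scheduling multiple tasks of multiple users, it makes effective use of limited system resources. The service scheduling model effectively safeguards the operation of the production system and plays a decisive role in its effective disaster recovery.
An embodiment of the present application further provides a storage device that stores a plurality of instructions, the instructions being adapted to be loaded by a processor to execute:
obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
determining a disaster recovery service scheduling model according to the system unavailability time;
obtaining a job of a user corresponding to a business system of the disaster recovery center;
determining the overall risk value of the job according to the disaster recovery service scheduling model;
obtaining a preset threshold;
scheduling the job according to the threshold and the overall risk value.
An embodiment of the present application further provides a terminal, including:
a processor adapted to implement instructions; and
a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute:
obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
determining a disaster recovery service scheduling model according to the system unavailability time;
obtaining a job of a user corresponding to a business system of the disaster recovery center;
determining the overall risk value of the job according to the disaster recovery service scheduling model;
obtaining a preset threshold;
scheduling the job according to the threshold and the overall risk value.
In summary, the method and device for reducing the system switchover unavailability time of a disaster recovery center proposed by the present application derive the RPO risk model and the disaster recovery service scheduling model by analyzing the system unavailability time when a data center disaster occurs. On that basis, the overall risk value of a user's job in a business system is calculated and compared with the threshold to drive service scheduling, which reduces the time that system data is unavailable when a disaster occurs and improves the disaster recovery capability of the data center.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a device, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. Each flow and/or block, and any combination of flows and/or blocks, can be implemented by computer program instructions. These instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device so that a series of operational steps is performed on it to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those skilled in the art will also appreciate that whether the functions listed in the embodiments of the present application are implemented in hardware or software depends on the particular application and the design requirements of the overall system. For each particular application, those skilled in the art may implement the described functions in various ways, but such implementations should not be understood as going beyond the scope of protection of the embodiments of the present application.
Specific embodiments have been used herein to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.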

Claims (14)

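  1. A method for reducing the system switchover unavailability time of a disaster recovery center, wherein the method comprises:
    obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
    determining a disaster recovery service scheduling model according to the system unavailability time;
    obtaining a job of a user corresponding to a business system of the disaster recovery center;
    determining the overall risk value of the job according to the disaster recovery service scheduling model;
    obtaining a preset threshold;
    scheduling the job according to the threshold and the overall risk value.
  2. The method according to claim 1, wherein determining the disaster recovery service scheduling model according to the system unavailability time comprises:
    analyzing the system unavailability time to obtain a recovery point objective (RPO) risk model;
    determining, according to the RPO risk model, a disaster recovery service scheduling model for running jobs.
  3. The method according to claim 2, wherein the disaster recovery service scheduling model for running jobs is:
    β = α × i_1 + λ × i_2 + δ × i_3
    Figure PCTCN2017087534-appb-100001
    where β is the overall risk value, α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, T_{n+1} is the duration of the (n+1)-th job, λ is the priority of the job, δ is the remaining time of the job, and i_1, i_2, i_3 are weighting coefficients with i_1 + i_2 + i_3 = 1.
  4. The method according to claim 1, wherein determining the disaster recovery service scheduling model according to the system unavailability time comprises:
    analyzing the system unavailability time to obtain a recovery point objective (RPO) risk model;
    determining an RPO historical risk value according to the RPO risk model;
    determining, according to the RPO historical risk value, a disaster recovery service scheduling model for jobs that are about to run.
  5. The method according to claim 4, wherein the disaster recovery service scheduling model for jobs that are about to run is:
    β = α_avg × i_1 + λ × i_2
    Figure PCTCN2017087534-appb-100002
    where β is the overall risk value, λ is the priority of the job, α_avg is the historical risk value of the job, i_1 and i_2 are weighting coefficients with i_1 + i_2 = 1, T_RPO is the RPO time, T_n is the duration of the n-th job, and k is the number of historical jobs.
  6. The method according to claim 3 or 5, wherein scheduling the job according to the threshold and the overall risk value comprises:
    determining whether the overall risk value exceeds the threshold;
    if so, obtaining the system occupancy ratio and the network resource ratio of the job;
    obtaining preset intervention rules;
    selecting an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio;
    scheduling the job according to the intervention strategy.
  7. A device for reducing the system switchover unavailability time of a disaster recovery center, characterized in that the device comprises:
    a system unavailability time obtaining module, configured to obtain the system unavailability time when a disaster occurs at the disaster recovery center;
    a service scheduling module determining module, configured to determine a disaster recovery service scheduling model according to the system unavailability time;
    a job obtaining module, configured to obtain a job of a user corresponding to a business system of the disaster recovery center;
    an overall risk value determining module, configured to determine the overall risk value of the job according to the disaster recovery service scheduling model;
    a threshold obtaining module, configured to obtain a preset threshold;
    a scheduling module, configured to schedule the job according to the threshold and the overall risk value.
  8. The device according to claim 7, characterized in that the service scheduling module determining module comprises:
    an RPO risk model determining unit, configured to analyze the system unavailability time to obtain a recovery point objective (RPO) risk model;
    a first service scheduling model determining unit, configured to determine, according to the RPO risk model, a disaster recovery service scheduling model for running jobs.
  9. The device according to claim 8, characterized in that the disaster recovery service scheduling model for running jobs is:
    β = α × i_1 + λ × i_2 + δ × i_3
    Figure PCTCN2017087534-appb-100003
    where β is the overall risk value, α is the risk value, T_RPO is the RPO time, T_n is the duration of the n-th job, T_m is the waiting time, T_{n+1} is the duration of the (n+1)-th job, λ is the priority of the job, δ is the remaining time of the job, and i_1, i_2, i_3 are weighting coefficients with i_1 + i_2 + i_3 = 1.
  10. The device according to claim 8, characterized in that the service scheduling module determining module further comprises:
    a historical risk value determining unit, configured to determine an RPO historical risk value according to the RPO risk model;
    a second service scheduling model determining unit, configured to determine, according to the RPO historical risk value, a disaster recovery service scheduling model for jobs that are about to run.
  11. The device according to claim 10, characterized in that the disaster recovery service scheduling model for jobs that are about to run is:
    β = α_avg × i_1 + λ × i_2
    Figure PCTCN2017087534-appb-100004
    where β is the overall risk value, λ is the priority of the job, α_avg is the historical risk value of the job, i_1 and i_2 are weighting coefficients with i_1 + i_2 = 1, T_RPO is the RPO time, T_n is the duration of the n-th job, and k is the number of historical jobs.
  12. The device according to claim 9 or 11, characterized in that the scheduling module comprises:
    a determining unit, configured to determine whether the overall risk value exceeds the threshold;
    a ratio obtaining unit, configured to obtain the system occupancy ratio and the network resource ratio of the job when the determining unit's determination is affirmative;
    an intervention rule obtaining unit, configured to obtain preset intervention rules;
    an intervention strategy selecting unit, configured to select an intervention strategy from the intervention rules according to the system occupancy ratio and the network resource ratio;
    a scheduling unit, configured to schedule the job according to the intervention strategy.
  13. A storage device, wherein the storage device stores a plurality of instructions, the instructions being adapted to be loaded by a processor to execute:
    obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
    determining a disaster recovery service scheduling model according to the system unavailability time;
    obtaining a job of a user corresponding to a business system of the disaster recovery center;
    determining the overall risk value of the job according to the disaster recovery service scheduling model;
    obtaining a preset threshold;
    scheduling the job according to the threshold and the overall risk value.
  14. A terminal, wherein the terminal comprises a processor adapted to implement instructions and a storage device, the storage device storing a plurality of instructions adapted to be loaded by the processor to execute:
    obtaining the system unavailability time when a disaster occurs at the disaster recovery center;
    determining a disaster recovery service scheduling model according to the system unavailability time;
    obtaining a job of a user corresponding to a business system of the disaster recovery center;
    determining the overall risk value of the job according to the disaster recovery service scheduling model;
    obtaining a preset threshold;
    scheduling the job according to the threshold and the overall risk value.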
PCT/CN2017/087534 2016-06-28 2017-06-08 降低灾备中心系统切换不可用时间的方法、设备及终端 WO2018001061A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610485697.5A CN106209422A (zh) 2016-06-28 2016-06-28 降低灾备中心系统切换不可用时间的方法及设备
CN201610485697.5 2016-06-28

Publications (1)

Publication Number Publication Date
WO2018001061A1 true WO2018001061A1 (zh) 2018-01-04

Family

ID=57460908

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/087534 WO2018001061A1 (zh) 2016-06-28 2017-06-08 降低灾备中心系统切换不可用时间的方法、设备及终端

Country Status (2)

Country Link
CN (1) CN106209422A (zh)
WO (1) WO2018001061A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144691A (zh) * 2019-11-27 2020-05-12 广东电力信息科技有限公司 一种灾备调控管理方法及其系统
CN113535469A (zh) * 2021-06-03 2021-10-22 北京思特奇信息技术股份有限公司 一种灾备数据库的切换方法和切换系统
CN113722159A (zh) * 2021-09-09 2021-11-30 中国电信集团系统集成有限责任公司 基于ansible的灾备切换系统
CN115277376A (zh) * 2022-09-29 2022-11-01 深圳华锐分布式技术股份有限公司 灾备切换方法、装置、设备及介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106209422A (zh) * 2016-06-28 2016-12-07 中国银联股份有限公司 降低灾备中心系统切换不可用时间的方法及设备
CN109460322B (zh) * 2018-11-14 2021-11-05 西安瑞蓝创软件科技有限公司 基于流程调度引擎技术的灾备切换演练系统及方法
CN112527481B (zh) * 2020-12-02 2023-12-08 中国农业银行股份有限公司 系统的灾备演练计划生成方法及装置
CN112559151B (zh) * 2020-12-19 2024-02-02 黑龙江亿林网络股份有限公司 一种用于灾备恢复的任务分配系统及其使用方法
CN112286733B (zh) * 2020-12-23 2021-04-06 深圳市科力锐科技有限公司 备份数据恢复时间确定方法、装置、设备及存储介质
CN114138348A (zh) * 2021-11-16 2022-03-04 中国电信集团系统集成有限责任公司 业务恢复优先级评估方法及设备、存储介质和产品

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102694677A (zh) * 2012-04-11 2012-09-26 佳都新太科技股份有限公司 基于远程ip解析灾备数据中心的建设新方法
CN103530698A (zh) * 2013-10-09 2014-01-22 北京邮电大学 一种容灾方案最优化选择方法
CN104318486A (zh) * 2014-10-08 2015-01-28 华北电力大学(保定) 一种基于云计算的电力调度数据容灾方法
CN106209422A (zh) * 2016-06-28 2016-12-07 中国银联股份有限公司 降低灾备中心系统切换不可用时间的方法及设备

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324715B (zh) * 2013-06-20 2017-04-12 交通银行股份有限公司 一种灾备系统可用性检测方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU, JUN: "BUSINESS SCHEDULING MODEL OF UNIFIED DISASTER REDUNDANCY CENTRE", COMPUTER APPLICATIONS AND SOFTWARE, vol. 30, no. 12, 31 December 2013 (2013-12-31) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144691A (zh) * 2019-11-27 2020-05-12 广东电力信息科技有限公司 一种灾备调控管理方法及其系统
CN113535469A (zh) * 2021-06-03 2021-10-22 北京思特奇信息技术股份有限公司 一种灾备数据库的切换方法和切换系统
CN113535469B (zh) * 2021-06-03 2024-01-30 北京思特奇信息技术股份有限公司 一种灾备数据库的切换方法和切换系统
CN113722159A (zh) * 2021-09-09 2021-11-30 中国电信集团系统集成有限责任公司 基于ansible的灾备切换系统
CN115277376A (zh) * 2022-09-29 2022-11-01 深圳华锐分布式技术股份有限公司 灾备切换方法、装置、设备及介质
CN115277376B (zh) * 2022-09-29 2022-12-23 深圳华锐分布式技术股份有限公司 灾备切换方法、装置、设备及介质

Also Published As

Publication number Publication date
CN106209422A (zh) 2016-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17819058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17819058

Country of ref document: EP

Kind code of ref document: A1