CN104965770B

CN104965770B - A disaster-tolerant backup method for a central server

Info

Publication number: CN104965770B
Application number: CN201510330091.XA
Authority: CN
Inventors: Yao Wenbin (姚文斌); Liu Zhengbo (刘郑博); Chang Jingkun (常静坤); Zhao Chenyin (赵辰吟)
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2015-06-15
Filing date: 2015-06-15
Publication date: 2018-02-02
Anticipated expiration: 2035-06-15
Also published as: CN104965770A

Abstract

The object of the invention is to protect the continuity of the business system by providing a reliable disaster-recovery method for a central server. The method targets a central server located in a good network environment: devices that can take over the central server's work are added to a candidate list as candidate devices, and each candidate device is connected to the central server. After the central server goes down, the network and hardware performance of the candidate devices is computed, and the best-performing device in the candidate list takes over the central server's work. This preserves central-server performance and reduces unplanned downtime, so that system operation can be restored promptly after the central server goes down.

Description

A disaster-tolerant backup method for a central server

Technical field

The present invention relates to a disaster-tolerant backup method for a central server.

Background art

Disaster-recovery technology is an important measure for reducing the losses caused by disasters and keeping computer systems running continuously, and it is the core of disaster prevention and mitigation. Its purpose is to protect the continuity of the business system after a disaster occurs and to minimize unplanned downtime. Different businesses tolerate different amounts of data loss and require different recovery times: a banking information system, for example, requires very little or even zero data loss when a disaster occurs, and requires the information system to be restored promptly afterwards. From a disaster-recovery point of view, therefore, a sound mechanism for selecting a replacement server after the central server running the business system goes down can preserve central-server performance while reducing unplanned downtime, so that the system can be restored promptly after the central server fails. Furthermore, based on dynamically changing server hardware and network performance indicators, both kinds of performance are computed over a given time window, and the optimal device is then selected by weight allocation, so that the system always runs normally under the currently optimal configuration.

Summary of the invention

The object of the invention is to protect the continuity of the business system by providing a reliable disaster-recovery backup method for a central server. The method targets a central server located in a good network environment: devices that can take over the central server's work are added to a candidate list as candidate devices, and each candidate device is connected to the central server. After the central server goes down, the network and hardware performance of the candidate devices is computed, and the best-performing device in the candidate list takes over the central server's work. This preserves central-server performance and reduces unplanned downtime, so that system operation can be restored promptly after the central server goes down.

The present invention is implemented as follows:

The server that runs the business system and hosts the central database is called the central server. In the hierarchical server network, the central server is the first-level node, and the devices directly connected to the central server are second-level nodes. Each second-level node hosts a local database and runs a monitoring program. The monitoring program periodically checks the node's network condition and hardware information, and records the information received from other devices in the local database. The monitoring program also sends the information it detects about its own node to the central server and to the other devices.
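The monitoring program's bookkeeping on a second-level node can be sketched as below. This is a minimal illustration under stated assumptions: SQLite stands in for the local database, and the `node_info` table, the `NodeReport` fields, and the helper names are hypothetical, since the patent specifies neither a schema nor a report format. Each node keeps only the latest report per device.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class NodeReport:
    """One report produced by a node's monitoring program (hypothetical fields)."""
    ip: str
    avg_delay_ms: float  # network condition measured by the reporting node
    cpu_util: float      # hardware information, as fractions in [0, 1]
    mem_util: float
    disk_util: float
    reported_at: float   # UNIX timestamp of the measurement


def open_local_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the node's local database with a hypothetical node_info table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_info ("
        "ip TEXT PRIMARY KEY, avg_delay_ms REAL, cpu_util REAL, "
        "mem_util REAL, disk_util REAL, reported_at REAL)"
    )
    return conn


def record_report(conn: sqlite3.Connection, report: NodeReport) -> None:
    """Store a report; a newer report from the same IP replaces the older row."""
    conn.execute(
        "INSERT OR REPLACE INTO node_info VALUES (?, ?, ?, ?, ?, ?)",
        (report.ip, report.avg_delay_ms, report.cpu_util,
         report.mem_util, report.disk_util, report.reported_at),
    )
    conn.commit()


if __name__ == "__main__":
    db = open_local_db()
    record_report(db, NodeReport("10.0.0.2", 12.5, 0.30, 0.40, 0.20, time.time()))
    print(db.execute("SELECT ip, cpu_util FROM node_info").fetchall())
```

Keying the table on IP keeps the local database small and always current; a history table would be needed if trends over time mattered.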

The method involves the following modules:

Configuration module: second-level nodes that can take over the central server's work are added to the candidate list as candidate devices. The candidate list has two fields, sequence number and device IP, and every device in the list has the monitoring program deployed. After the devices are added, the list is saved to the central database.

Storage module: each candidate device stores a copy of the candidate list in its own database.
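A minimal sketch of the configuration and storage modules, assuming SQLite for both the central and the local databases. The `candidate_list` table and the function names are hypothetical; the patent only fixes the two fields (sequence number and device IP) and that the list is kept in the central database plus one copy per candidate device. Assigning sequence numbers in the order devices are added is also an assumption.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[int, str]  # (sequence number, device IP)


def save_candidate_list(conn: sqlite3.Connection, candidates: Sequence[Candidate]) -> None:
    """Write the full candidate list into one database, replacing any older copy."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS candidate_list "
        "(seq INTEGER PRIMARY KEY, ip TEXT UNIQUE NOT NULL)"
    )
    conn.execute("DELETE FROM candidate_list")
    conn.executemany("INSERT INTO candidate_list (seq, ip) VALUES (?, ?)", candidates)
    conn.commit()


def load_candidate_list(conn: sqlite3.Connection) -> List[Candidate]:
    """Read the list back, ordered by sequence number."""
    return conn.execute("SELECT seq, ip FROM candidate_list ORDER BY seq").fetchall()


def configure(candidate_ips: Sequence[str], central_db: sqlite3.Connection,
              local_dbs: Dict[str, sqlite3.Connection]) -> List[Candidate]:
    # Configuration module: number the candidates in the order they are added
    # and save the list to the central database.
    candidates = list(enumerate(candidate_ips, start=1))
    save_candidate_list(central_db, candidates)
    # Storage module: each candidate device keeps its own copy of the list.
    for ip in candidate_ips:
        save_candidate_list(local_dbs[ip], candidates)
    return candidates
```

The sequence number doubles as the tie-breaker used later by the optimal selection module, so every copy of the list must carry identical numbering.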

Detection module: each candidate device periodically checks whether it can connect to the central server. If it can, it keeps checking periodically; if it cannot, it concludes that the central server has gone down.
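The detection module can be sketched as a periodic probe. Here "connected" is assumed to mean that a TCP connection to the central server can be opened; the patent does not say which protocol or port is probed, so `central_server_reachable`, `wait_for_central_failure`, and their parameters are hypothetical. Following the text, a single failed probe is treated as an outage.

```python
import socket
import time
from typing import Callable


def central_server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the central server can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_central_failure(probe: Callable[[], bool], interval: float,
                             max_checks: int = -1) -> bool:
    """Probe every `interval` seconds; return True once the server is judged down.

    With max_checks >= 0, return False if that many probes all succeed.
    """
    checks = 0
    while max_checks < 0 or checks < max_checks:
        if not probe():
            return True  # not connected: conclude the central server has gone down
        checks += 1
        time.sleep(interval)
    return False
```

A deployment might require several consecutive failures before declaring an outage, to avoid a false failover on a transient network glitch; that refinement is not part of the described method.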

Performance calculation module: the monitoring program running on each candidate device collects the following data and computes the current device's performance:

Weight X for network performance and weight Y for hardware performance;

Latency test: the current device runs connection tests against m second-level nodes, n rounds in total. In the i-th round, the network delays to nodes $P_1, P_2, \ldots, P_m$ are denoted $D_{i1}, D_{i2}, \ldots, D_{im}$.
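A sketch of the latency test that fills the n × m matrix of delays $D_{ij}$. It assumes that the delay to a node is measured as the time needed to open a TCP connection to it; the patent does not state how delay is measured (ICMP ping would be another option), and `measure_delay_ms` and `latency_test` are hypothetical names. An unreachable node makes `measure_delay_ms` raise `OSError`, because the text does not say how failed tests are handled.

```python
import socket
import time
from typing import Callable, List, Sequence, Tuple

Address = Tuple[str, int]


def measure_delay_ms(addr: Address, timeout: float = 2.0) -> float:
    """Time, in milliseconds, to open (and immediately close) a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection(addr, timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def latency_test(nodes: Sequence[Address], n: int,
                 probe: Callable[[Address], float] = measure_delay_ms) -> List[List[float]]:
    """Run n rounds against nodes P_1..P_m; D[i][j] is round i+1's delay to P_(j+1)."""
    return [[probe(node) for node in nodes] for _ in range(n)]
```

The `probe` parameter lets the same loop be driven by another measurement method, or by a stub in tests.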

Hardware performance is sampled once every interval T, n times in total, and the average CPU, memory, and disk utilizations are computed separately;

In the hardware performance calculation, the CPU average utilization has weight A, the memory average utilization has weight B, and the disk average utilization has weight C (A + B + C = 1);

After the collected performance-indicator data have been processed into the required form, performance is computed with the following formulas:

(1) Average delay: $T = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} D_{ij}$;

(2) Hardware performance: HP = average CPU utilization × A + average memory utilization × B + average disk utilization × C;

(3) Composite performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;
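Formulas (1) to (3) can be sketched directly. The sketch assumes the inputs are already in "the required form", i.e. delay and utilizations on scales that can be meaningfully combined; the patent does not specify that processing step, so the example uses delays in seconds and utilizations as fractions. The function names are hypothetical, and the constant × 100% factor is omitted because it does not change which device has the smallest TP.

```python
from typing import Sequence


def average_delay(D: Sequence[Sequence[float]]) -> float:
    """Formula (1): T = (1 / (n*m)) * sum over i, j of D[i][j]."""
    n, m = len(D), len(D[0])
    return sum(sum(row) for row in D) / (n * m)


def mean(samples: Sequence[float]) -> float:
    """Average of the n utilization samples taken once per interval."""
    return sum(samples) / len(samples)


def hardware_performance(cpu: float, mem: float, disk: float,
                         A: float, B: float, C: float) -> float:
    """Formula (2): HP = CPU x A + memory x B + disk x C, with A + B + C = 1."""
    if abs(A + B + C - 1.0) > 1e-9:
        raise ValueError("hardware weights must satisfy A + B + C = 1")
    return cpu * A + mem * B + disk * C


def composite_performance(T: float, HP: float, X: float, Y: float) -> float:
    """Formula (3) without the constant x100% factor: TP = T*X + HP*Y."""
    return T * X + HP * Y
```

For example, delays of 0.010, 0.020, 0.030, and 0.040 s over n = 2 rounds and m = 2 nodes give T = 0.025; average utilizations of 0.3 (CPU), 0.5 (memory), and 0.2 (disk) with A = 0.5, B = 0.3, C = 0.2 give HP = 0.34; and with X = 0.6, Y = 0.4 the device's TP is 0.025 × 0.6 + 0.34 × 0.4 = 0.151.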

Optimal selection module: after the calculation, the monitoring program distributes this device's IP and composite performance TP to the remaining candidate devices. Once the monitoring program on each candidate device has confirmed receipt of the information from all the other candidate devices, it selects the device with the smallest composite performance index TP as the optimal device (a lower average delay and lower resource utilization both yield a smaller TP). If another device has the same performance index, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list in the local database.
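The selection rule amounts to taking the minimum of (TP, sequence number) over the candidate list. A minimal sketch with hypothetical function names, assuming each device has collected every candidate's TP (including its own) into a dictionary keyed by IP:

```python
from typing import Dict, List, Tuple

Candidate = Tuple[int, str]  # (sequence number, device IP)


def select_optimal(candidate_list: List[Candidate],
                   tp_by_ip: Dict[str, float]) -> Candidate:
    """Pick the candidate with the smallest TP; ties go to the smaller sequence number."""
    missing = [ip for _, ip in candidate_list if ip not in tp_by_ip]
    if missing:
        # Selection only happens after every candidate's TP has been received.
        raise ValueError(f"still waiting for TP from: {missing}")
    return min(candidate_list, key=lambda entry: (tp_by_ip[entry[1]], entry[0]))


def remove_from_list(candidate_list: List[Candidate], optimal_ip: str) -> List[Candidate]:
    """Delete the optimal device's entry from the local candidate list."""
    return [(seq, ip) for seq, ip in candidate_list if ip != optimal_ip]
```

Because every candidate applies the same deterministic rule to the same set of (IP, TP) pairs, all of them should reach the same choice without a separate coordinator, provided they all received identical values.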

Notification module: if this device is the optimal device, the monitoring program running on it starts the deployed business system and connects to the central database, so that the device becomes the central server. At the same time, it deletes the optimal device's entry from the candidate list in the central database and notifies all second-level nodes (the devices that access the central server directly) of the central-server change.
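The notification module's actions on one candidate device could look like the sketch below. `start_business_system` and `notify` are hypothetical callbacks standing in for starting the deployed business system and for whatever messaging the second-level nodes use; the central database is assumed to be an SQLite connection holding a hypothetical `candidate_list (seq, ip)` table. The patent specifies none of these.

```python
import sqlite3
from typing import Callable, Dict, Iterable


def take_over(my_ip: str, optimal_ip: str, central_db: sqlite3.Connection,
              second_level_nodes: Iterable[str],
              start_business_system: Callable[[], None],
              notify: Callable[[str, Dict[str, str]], None]) -> bool:
    """Run the notification module; return True if this device became the central server."""
    if my_ip != optimal_ip:
        # Not the optimal device: wait for the change message instead.
        return False
    # Start the deployed business system; `central_db` is this device's
    # connection to the central database.
    start_business_system()
    # Delete the optimal device's own entry from the central candidate list.
    central_db.execute("DELETE FROM candidate_list WHERE ip = ?", (my_ip,))
    central_db.commit()
    # Tell every other second-level node that the central server has changed.
    message = {"type": "central_server_changed", "new_central": my_ip}
    for node in second_level_nodes:
        if node != my_ip:
            notify(node, message)
    return True
```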

The specific method steps are:

(1) Second-level nodes that can take over the central server's work are added to the candidate list as candidate devices. The candidate list has two fields, sequence number and device IP, and every device in the list has the monitoring program deployed. After the devices are added, the list is saved to the central database, and each candidate device stores a copy of the candidate list in its own database;

(2) Each candidate device periodically checks whether it can connect to the central server: if it can, repeat step (2); if it cannot, conclude that the central server has gone down and go to step (3);

(3) Average delay: $T = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} D_{ij}$;

(4) Hardware performance: HP = average CPU utilization × A + average memory utilization × B + average disk utilization × C;

(5) Composite performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;

(6) The monitoring program distributes this device's IP and composite performance TP to the remaining candidate devices;

(7) Once the monitoring program on each candidate device has confirmed receipt of the information from all the other candidate devices, it selects the device with the smallest composite performance index TP as the optimal device; if another device has the same performance index, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list;

(8) If this device is the optimal device, the monitoring program on it starts the deployed business system and connects to the central database, so that the device becomes the central server; go to step (9). Otherwise, go to step (10);

(9) Delete the optimal device's entry from the candidate list in the central database and notify all second-level nodes (the devices that access the central server directly) of the central-server change; go to step (11);

(10) Wait for the central-server change message sent by the optimal device, and reconnect to the central server after receiving it;

(11) Central server replacement is complete.
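To show how steps (6) to (11) fit together, the following in-process simulation runs the exchange among several candidate devices that have all detected the outage in step (2) and computed their TP in steps (3) to (5). It is a sketch only: message passing is modeled by writing directly into each device's `inbox`, the central database's candidate list is a plain Python list, and the `Device` class and `failover` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[int, str]  # (sequence number, device IP)


@dataclass
class Device:
    ip: str
    tp: float                           # composite performance from steps (3)-(5)
    candidate_list: List[Candidate]     # local copy of the candidate list
    inbox: Dict[str, float] = field(default_factory=dict)  # received IP -> TP
    central: Optional[str] = None       # device this one now treats as central server


def failover(devices: List[Device], central_db_list: List[Candidate]) -> Optional[str]:
    """Simulate steps (6)-(11); return the IP of the new central server."""
    # Step (6): every device distributes its IP and TP to the others (and records its own).
    for sender in devices:
        for receiver in devices:
            receiver.inbox[sender.ip] = sender.tp
    new_central = None
    for dev in devices:
        # Step (7): with all TPs received, pick min TP; ties -> smaller sequence number.
        assert all(ip in dev.inbox for _, ip in dev.candidate_list)
        _, best = min(dev.candidate_list, key=lambda e: (dev.inbox[e[1]], e[0]))
        dev.candidate_list = [e for e in dev.candidate_list if e[1] != best]
        if dev.ip == best:
            # Steps (8)-(9): start the business system (not modeled), update the
            # central candidate list, and announce the change.
            central_db_list[:] = [e for e in central_db_list if e[1] != best]
            new_central = best
    # Step (10): the other devices receive the change message and reconnect.
    for dev in devices:
        dev.central = new_central
    return new_central  # step (11): replacement complete
```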

The key to the present invention is how to select the optimal device. To do so, the device's network performance is computed: the network delays measured over multiple rounds to multiple nodes are averaged to obtain the final average delay. Hardware performance is then computed by assigning weights to CPU, memory, and disk utilization. Next, network and hardware performance are weighted according to the running environment the system requires, yielding the composite performance TP. The device with the smallest TP becomes the central server; if TP values are equal, the device with the smaller sequence number in the candidate list becomes the central server. After the central server unexpectedly goes down, this method reduces unplanned downtime, restores system operation promptly, preserves central-server performance after the replacement, improves business continuity, reduces the losses caused by server downtime, and saves manpower and material resources.

The novelty of this method is as follows:

1. From a disaster-recovery point of view, a candidate device list is configured for the central server. After the central server goes down, the network and hardware performance of the candidate devices is computed, and the best-performing device in the candidate list takes over the central server's work. This preserves central-server performance and reduces unplanned downtime, so that business operation can be restored promptly after the central server unexpectedly goes down, ensuring business continuity.

2. The optimal device is selected dynamically according to network and hardware performance. After both kinds of performance are computed from dynamically changing network and hardware information, the weights of network performance and hardware performance can be allocated according to the server performance the system needs in order to run normally, and the different hardware indicators within hardware performance can likewise be given different weights. The optimal device is thus selected, and the system always runs normally under the optimal configuration.

Embodiment

The present invention is described in more detail below with reference to the accompanying drawings:


The specific implementation is as follows:

In a good network environment, the server on which the business system is deployed and started is called the central server, and it is connected to the central database. The central server is the first-level node, and the devices directly connected to it are second-level nodes. First, after the configuration module has successfully configured the candidate list, the storage module stores the candidate list in the local database of each device. The detection module then checks whether the device's local database contains a candidate list; if it does, it starts periodically checking the running state of the central server. If it detects that the central server has gone down, the performance calculation module computes this device's performance using the weight-allocation method. After the calculation, the optimal selection module selects the optimal device as the new central server and updates the candidate list in the local database. The notification module on the optimal device then starts the business system deployed on that device and connects the device to the central database, so that it becomes the central server; at the same time, it deletes the optimal device's entry from the candidate list in the central database and notifies all second-level nodes (the devices that access the central server directly) of the central-server change.

Claims

1. A disaster-tolerant backup method for a central server, characterized in that the method is implemented as follows:

The server that runs the business system and hosts the central database is called the central server. In the hierarchical server network, the central server is the first-level node, and the devices directly connected to the central server are second-level nodes. Each second-level node hosts a local database and runs a monitoring program; the monitoring program periodically checks the node's network condition and hardware information, and records the information received from other devices in the local database. The monitoring program also sends the information it detects about its own node to the central server and to the other devices;

The method involves the following modules:

Configuration module: second-level nodes that can take over the central server's work are added to the candidate list as candidate devices. The candidate list has two fields, sequence number and device IP, and every device in the list has the monitoring program deployed. After the devices are added, the list is saved to the central database;

Storage module: each candidate device stores a copy of the candidate list in its own database;

Detection module: each candidate device periodically checks whether it can connect to the central server. If it can, it keeps checking periodically; if it cannot, it concludes that the central server has gone down;

Performance calculation module: the monitoring program running on each candidate device collects the following data and computes the current device's performance:

Weight X for network performance and weight Y for hardware performance;

Latency test: the current device runs connection tests against m second-level nodes, n rounds in total. In the i-th round, the network delays to nodes $P_1, P_2, \ldots, P_m$ are denoted $D_{i1}, D_{i2}, \ldots, D_{im}$;

Hardware performance is sampled once every interval T, n times in total, and the average CPU, memory, and disk utilizations are computed separately;

In the hardware performance calculation, the CPU average utilization has weight A, the memory average utilization has weight B, and the disk average utilization has weight C, where A + B + C = 1;

After the collected performance-indicator data have been processed into the required form, performance is computed with the following formulas:

(1) Average delay: $T = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} D_{ij}$;

(2) Hardware performance: HP = average CPU utilization × A + average memory utilization × B + average disk utilization × C;

(3) Composite performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;

Optimal selection module: after the calculation, the monitoring program distributes this device's IP and composite performance TP to the remaining candidate devices. Once the monitoring program on each candidate device has confirmed receipt of the information from all the other candidate devices, it selects the device with the smallest composite performance index TP as the optimal device; if another device has the same performance index, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list in the local database;

Notification module: if this device is the optimal device, the monitoring program running on it starts the deployed business system and connects to the central database, so that the device becomes the central server. At the same time, it deletes the optimal device's entry from the candidate list in the central database and notifies all second-level nodes (the devices that access the central server directly) of the central-server change;

The specific method steps are:

(1) Second-level nodes that can take over the central server's work are added to the candidate list as candidate devices. The candidate list has two fields, sequence number and device IP, and every device in the list has the monitoring program deployed. After the devices are added, the list is saved to the central database, and each candidate device stores a copy of the candidate list in its own database;

(2) Each candidate device periodically checks whether it can connect to the central server: if it can, repeat step (2); if it cannot, conclude that the central server has gone down and go to step (3);

(3) Average delay: $T = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} D_{ij}$;

(4) Hardware performance: HP = average CPU utilization × A + average memory utilization × B + average disk utilization × C;

(5) Composite performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;

(6) The monitoring program distributes this device's IP and composite performance TP to the remaining candidate devices;

(7) Once the monitoring program on each candidate device has confirmed receipt of the information from all the other candidate devices, it selects the device with the smallest composite performance index TP as the optimal device; if another device has the same performance index, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list;

(8) If this device is the optimal device, the monitoring program on it starts the deployed business system and connects to the central database, so that the device becomes the central server; go to step (9). Otherwise, go to step (10);

(9) Delete the optimal device's entry from the candidate list in the central database and notify all second-level nodes (the devices that access the central server directly) of the central-server change; go to step (11);

(10) Wait for the central-server change message sent by the optimal device, and reconnect to the central server after receiving it;

(11) Central server replacement is complete.