CN104965770B - A central server disaster-tolerant backup method - Google Patents

A central server disaster-tolerant backup method

Info

Publication number
CN104965770B
Authority
CN
China
Prior art keywords
central server
equipment
performance
monitoring program
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510330091.XA
Other languages
Chinese (zh)
Other versions
CN104965770A (en)
Inventor
姚文斌
刘郑博
常静坤
赵辰吟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201510330091.XA priority Critical patent/CN104965770B/en
Publication of CN104965770A publication Critical patent/CN104965770A/en
Application granted granted Critical
Publication of CN104965770B publication Critical patent/CN104965770B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The object of the invention is to protect the continuity of business systems by providing a reliable disaster-recovery method for a central server. The method targets a central server located in a good network environment: candidate devices able to take over the work of the central server are added to a candidate list and connected to the central server. After the central server goes down, the network and hardware performance of each candidate device is calculated, and the best-performing device in the candidate list takes over from the central server. This preserves central-server performance and reduces unplanned downtime, so that system operation is restored promptly after the central server goes down.

Description

A central server disaster-tolerant backup method
Technical field
The present invention relates to a disaster-tolerant backup method for a central server.
Background art
Disaster-recovery backup technology is an important measure for reducing the losses caused by disasters and keeping computer systems running continuously. It is the core of disaster prevention and mitigation: its purpose is to protect the continuity of business systems after a disaster occurs and to keep unplanned downtime as short as possible. Different businesses differ in how much data loss they can tolerate and in how quickly service must be restored. A bank information system, for example, requires very little or even zero data loss when a disaster occurs, and requires that the information system be restored promptly afterwards. From a disaster-recovery standpoint, therefore, when the central server running the business system goes down, a sound mechanism for selecting a replacement server can preserve central-server performance while reducing unplanned downtime, so that system operation is restored promptly. In addition, the server's hardware and network performance over a given time window are calculated from dynamically changing performance metrics, and the optimal device is then selected by weighting the two, so that the system always runs normally under the best available configuration.
Summary of the invention
The object of the invention is to protect the continuity of business systems by providing a reliable disaster-recovery method for a central server. The method targets a central server located in a good network environment: candidate devices able to take over the work of the central server are added to a candidate list and connected to the central server. After the central server goes down, the network and hardware performance of each candidate device is calculated, and the best-performing device in the candidate list takes over from the central server. This preserves central-server performance and reduces unplanned downtime, so that system operation is restored promptly after the central server goes down.
The invention is implemented as follows:
The server that runs the business system and hosts the central database is called the central server. In the tiered server network, the central server is the first-level node, and devices directly connected to the central server are second-level nodes. Each second-level node has a local database and runs a monitoring program. The monitoring program periodically checks the node's network condition and hardware information, and records information received from other devices in the local database. The monitoring program also sends the information detected on its own node to the central server and to the other devices.
The method involves the following modules:
Configuration module: second-level nodes that can take over the work of the central server are added to the candidate list as candidate devices. Each entry in the candidate list has two fields, a sequence number and a device IP, and every device in the list has the monitoring program deployed. Once the devices have been added, the list is saved to the central database.
Storage module: each candidate device stores a copy of the candidate list in its own database.
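The alternate (candidate) list handled by the configuration and storage modules can be pictured with a small data structure. The following is only a sketch: the class names, the in-memory storage, and the rule that a new device gets the next free sequence number are assumptions for illustration, and the IP addresses are made up. The source only states that each entry holds a sequence number and a device IP, and that every candidate keeps its own copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    seq: int  # sequence number in the candidate list
    ip: str   # device IP


class CandidateList:
    """In-memory stand-in for the list kept in a database (assumption)."""

    def __init__(self, entries=()):
        self._by_ip = {c.ip: c for c in entries}

    def add(self, ip):
        # Assumed rule: a new device receives the next free sequence number.
        if ip in self._by_ip:
            return self._by_ip[ip]
        seq = max((c.seq for c in self._by_ip.values()), default=0) + 1
        cand = Candidate(seq, ip)
        self._by_ip[ip] = cand
        return cand

    def remove(self, ip):
        self._by_ip.pop(ip, None)

    def entries(self):
        return sorted(self._by_ip.values(), key=lambda c: c.seq)

    def copy(self):
        return CandidateList(self.entries())


# Configuration: build the list that would be saved to the central database.
central_list = CandidateList()
for ip in ("10.0.0.11", "10.0.0.12", "10.0.0.13"):
    central_list.add(ip)

# Storage: every candidate device keeps its own independent copy.
local_copies = {c.ip: central_list.copy() for c in central_list.entries()}
```

Keeping independent copies matters later in the method: each device deletes the optimal device from its own local copy, while the central copy is updated separately by the device that takes over.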
Detection module: each candidate device periodically checks whether it can connect to the central server. If it can, it continues the periodic checks; if it cannot, it concludes that the central server has gone down.
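A minimal sketch of the periodic connectivity check follows. The source does not say how connectivity is tested, so the TCP connection attempt, the port, the timeout and the polling interval are all assumptions; the function names are hypothetical. As in the text, a single failed check is treated as the central server being down.

```python
import socket
import time


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_central(probe, interval=5.0, sleep=time.sleep):
    """Call probe() every `interval` seconds until it returns False.

    Returns the number of successful checks seen before the failure,
    at which point the caller treats the central server as down.
    """
    successes = 0
    while probe():
        successes += 1
        sleep(interval)
    return successes
```

In use, a monitoring program might call `watch_central(lambda: is_reachable("10.0.0.1", 5432))`, where the address and port are placeholders, and move on to the performance calculation when the call returns.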
Performance calculation module: the monitoring program running on each candidate device collects the following data and calculates the performance of the current device:
The weight X of network performance and the weight Y of hardware performance;
Delay test: the current device runs a connection test against m second-level nodes, n times in total; in the i-th test, the network delays to nodes P_1, P_2, ..., P_m are recorded as D_i1, D_i2, ..., D_im;
Hardware performance is sampled once every time interval T, n times in total, and the average utilization of the CPU, memory and hard disk is calculated for each;
In the hardware performance calculation, the weights of the average CPU, memory and hard disk utilization are A, B and C respectively (A+B+C=1);
After the collected performance data has been processed into the required form, performance is calculated with the following formulas:
(1) Average delay: T, the mean of the network delays D_ij over the n tests and m nodes;
(2) Hardware performance: HP = CPU utilization × A + memory utilization × B + hard disk utilization × C;
(3) Comprehensive performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;
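The three quantities can be computed as in the sketch below. Two points are assumptions, not statements from the source: the average delay is taken as the plain mean of all D_ij values, and the delays are assumed to have already been scaled to a range comparable with utilization ratios (the text only says the data is "processed into the required form"). The function and parameter names are hypothetical, and lowercase names are used for the weights.

```python
from statistics import fmean


def average_delay(delays):
    """Mean of D_ij over n tests (rows) and m nodes (columns)."""
    return fmean(d for row in delays for d in row)


def hardware_performance(cpu, mem, disk, a, b, c):
    """HP = mean CPU use * A + mean memory use * B + mean disk use * C."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("A + B + C must equal 1")
    return fmean(cpu) * a + fmean(mem) * b + fmean(disk) * c


def comprehensive_performance(t, hp, x, y):
    """TP = (T * X + HP * Y) * 100%, returned as a percentage value."""
    return (t * x + hp * y) * 100
```

Because both inputs are "lower is better" (less delay, less utilization), a lower TP indicates a better candidate, which is why the selection step picks the minimum.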
Optimal selection module: when the calculation is complete, the monitoring program sends this device's IP and comprehensive performance TP to the remaining candidate devices. Once the monitoring program on each candidate device has confirmed receipt of the information from all remaining candidate devices, it selects the device with the lowest comprehensive performance index TP as the optimal device. If another device has the same performance index as this device, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list in the local database.
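The selection rule can be sketched as a single function that every candidate runs on the same inputs, so that all devices reach the same result without further coordination. The dictionary layout and the function name are assumptions; ties are detected by exact equality of the TP values.

```python
def select_optimal(reports, candidate_seq):
    """Pick the optimal device from exchanged TP values.

    reports:       {device_ip: TP} gathered so far, including this device
    candidate_seq: {device_ip: sequence number} from the local candidate list

    Returns None while reports from some candidates are still missing;
    otherwise the IP with the lowest TP, ties broken by the smaller
    sequence number.
    """
    if any(ip not in reports for ip in candidate_seq):
        return None  # keep waiting until every candidate has reported
    return min(candidate_seq, key=lambda ip: (reports[ip], candidate_seq[ip]))
```

Sorting on the pair (TP, sequence number) expresses both the main criterion and the tie-break in one comparison.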
Notification module: if this device is the optimal device, the monitoring program running on it starts the deployed business system, connects to the central database, and the device becomes the central server. It also deletes the optimal device's entry from the candidate list in the central database and notifies all second-level nodes of the change of central server. A second-level node is a device that accesses the central server directly.
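The takeover and notification logic might look like the following sketch. How the business system is started and how nodes are notified is not specified in the source, so both are passed in as callables; the central candidate list is modelled as a plain list of IPs. All names are hypothetical.

```python
def take_over(self_ip, optimal_ip, central_candidates, second_level_nodes,
              start_business_system, notify):
    """Promote this device if it was selected as the optimal device.

    Returns True if this device became the central server, False if it
    should instead wait for the change notification from the optimal device.
    """
    if self_ip != optimal_ip:
        return False
    # Start the deployed business system (which connects to the central DB).
    start_business_system()
    # Remove this device from the candidate list held in the central database.
    if self_ip in central_candidates:
        central_candidates.remove(self_ip)
    # Tell every other second-level node which device is now the central server.
    for node_ip in second_level_nodes:
        if node_ip != self_ip:
            notify(node_ip, self_ip)
    return True
```

A device for which the function returns False would wait for the notification and then reconnect to the new central server.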
The specific method steps are:
(1) Second-level nodes that can take over the work of the central server are added to the candidate list as candidate devices. Each entry in the candidate list has two fields, a sequence number and a device IP, and every device in the list has the monitoring program deployed. Once the devices have been added, the list is saved to the central database, and each candidate device stores a copy of the candidate list in its own database;
(2) Each candidate device periodically checks whether it can connect to the central server: if it can, step (2) is repeated; if it cannot, the device concludes that the central server has gone down and performs step (3);
(3) Calculate the average delay T;
(4) Calculate the hardware performance: HP = CPU utilization × A + memory utilization × B + hard disk utilization × C;
(5) Calculate the comprehensive performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;
(6) The monitoring program sends this device's IP and comprehensive performance TP to the remaining candidate devices;
(7) Once the monitoring program on each candidate device has confirmed receipt of the information from all remaining candidate devices, it selects the device with the lowest comprehensive performance index TP as the optimal device. If another device has the same performance index as this device, the one with the smaller sequence number in the candidate list is the optimal device. The optimal device's entry is then deleted from the candidate list;
(8) If this device is the optimal device, the monitoring program on it starts the deployed business system and connects to the central database, the device becomes the central server, and step (9) is performed; otherwise, step (10) is performed;
(9) The optimal device's entry is deleted from the candidate list in the central database, and all second-level nodes (devices that access the central server directly) are notified of the change of central server; step (11) is then performed;
(10) The device waits for the central-server change message sent by the optimal device and, on receiving it, reconnects to the central server;
(11) The replacement of the central server is complete.
The key to the invention is how to select the optimal device. This first requires calculating the device's network performance: the network delay to multiple nodes is measured several times and averaged to give the final average delay. Hardware performance is then calculated by assigning weights to CPU, memory and hard disk utilization. Finally, weights are assigned to the two according to the operating environment the system requires, and the comprehensive performance TP is calculated. The device with the lowest TP becomes the central server; if TP values are equal, the device with the smaller sequence number in the candidate list becomes the central server. When the central server goes down unexpectedly, the method reduces unplanned downtime, restores system operation promptly, preserves the performance of the replacement central server, improves business continuity, reduces the losses caused by server outages, and saves manpower and material resources.
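A short worked example may make the selection concrete. All numbers are hypothetical, the averaging formula for the delay is an assumed reading of "measured several times and averaged", and the delays are assumed to be scaled to the range 0 to 1 so that they are comparable with utilization ratios.

```latex
% Assumed form of the average delay over n tests and m nodes
T = \frac{1}{n\,m} \sum_{i=1}^{n} \sum_{j=1}^{m} D_{ij}

% Hypothetical weights: X = 0.4, Y = 0.6, A = 0.5, B = 0.3, C = 0.2
% Device 1: T = 0.05, CPU = 0.6, memory = 0.4, hard disk = 0.2
HP_1 = 0.6 \times 0.5 + 0.4 \times 0.3 + 0.2 \times 0.2 = 0.46
TP_1 = (0.05 \times 0.4 + 0.46 \times 0.6) \times 100\% = 29.6\%

% Device 2: T = 0.10, HP = 0.30
TP_2 = (0.10 \times 0.4 + 0.30 \times 0.6) \times 100\% = 22\%
```

Since TP_2 is lower than TP_1, device 2 would become the central server under these assumed values, even though its network delay is higher, because hardware performance carries the larger weight here.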
The novel features of the method are:
1. From a disaster-recovery standpoint, a list of candidate devices is configured for the central server. After the central server goes down, the network and hardware performance of the candidate devices is calculated, and the best-performing device in the candidate list takes over from the central server. This preserves central-server performance and reduces unplanned downtime, so that business operation is restored promptly after an unexpected central-server outage and business continuity is ensured.
2. The optimal device is selected dynamically according to network and hardware performance. After the two are calculated from dynamically changing network and hardware information, the weights of network performance and hardware performance can be assigned according to the server performance the system needs for normal operation, and different weights can likewise be assigned to the individual hardware metrics within hardware performance. The optimal device is selected on this basis, so that the system always runs normally under the best available configuration.
Embodiment
The invention is described in more detail below with reference to the accompanying drawings:
A specific implementation is as follows:
The network is in good condition. The server on which the business system is deployed and started is called the central server; it is connected to the central database. The central server is the first-level node, and devices directly connected to it are second-level nodes. First, after the configuration module has successfully configured the candidate list, the storage module stores the candidate list in each device's local database. The detection module then checks whether the device's local database contains a candidate list and, if so, begins periodically checking the running state of the central server. If the central server is detected to be down, the performance calculation module calculates this device's performance using the weight-distribution method. After the calculation, the optimal selection module selects the optimal device as the central server and updates the candidate list in the local database. The notification module then starts the business system deployed on that device, connects the device to the central database so that it becomes the central server, deletes the optimal device's entry from the candidate list in the central database, and notifies all second-level nodes of the change of central server. A second-level node is a device that accesses the central server directly.

Claims (1)

  1. A central server disaster-tolerant backup method, characterized in that the invention is implemented as follows:
    the server that runs the business system and hosts the central database is called the central server; in the tiered server network, the central server is the first-level node, and devices directly connected to the central server are second-level nodes; each second-level node has a local database and runs a monitoring program; the monitoring program periodically checks the node's network condition and hardware information, and records information received from other devices in the local database; the monitoring program also sends the information detected on its own node to the central server and to the other devices;
    the method involves the following modules:
    configuration module: second-level nodes that can take over the work of the central server are added to the candidate list as candidate devices; each entry in the candidate list has two fields, a sequence number and a device IP; every device in the list has the monitoring program deployed; once the devices have been added, the list is saved to the central database;
    storage module: each candidate device stores a copy of the candidate list in its own database;
    detection module: each candidate device periodically checks whether it can connect to the central server: if it can, it continues the periodic checks; if it cannot, it concludes that the central server has gone down;
    performance calculation module: the monitoring program running on each candidate device collects the following data and calculates the performance of the current device:
    the weight X of network performance and the weight Y of hardware performance;
    delay test: the current device runs a connection test against m second-level nodes, n times in total; in the i-th test, the network delays to nodes P_1, P_2, ..., P_m are recorded as D_i1, D_i2, ..., D_im;
    hardware performance is sampled once every time interval T, n times in total, and the average utilization of the CPU, memory and hard disk is calculated for each;
    in the hardware performance calculation, the weights of the average CPU, memory and hard disk utilization are A, B and C respectively, wherein A+B+C=1;
    after the collected performance data has been processed into the required form, performance is calculated with the following formulas:
    (1) average delay: T, calculated from the network delays D_ij over the n tests and m nodes;
    (2) hardware performance: HP = CPU utilization × A + memory utilization × B + hard disk utilization × C;
    (3) comprehensive performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;
    optimal selection module: when the calculation is complete, the monitoring program sends this device's IP and comprehensive performance TP to the remaining candidate devices; once the monitoring program on each candidate device has confirmed receipt of the information from all remaining candidate devices, it selects the device with the lowest comprehensive performance index TP as the optimal device; if another device has the same performance index as this device, the one with the smaller sequence number in the candidate list is the optimal device; the optimal device's entry is then deleted from the candidate list in the local database;
    notification module: if this device is the optimal device, the monitoring program running on it starts the deployed business system and connects to the central database, and the device becomes the central server; it also deletes the optimal device's entry from the candidate list in the central database and notifies all second-level nodes of the change of central server, a second-level node being a device that accesses the central server directly;
    the specific method steps are:
    (1) second-level nodes that can take over the work of the central server are added to the candidate list as candidate devices; each entry in the candidate list has two fields, a sequence number and a device IP; every device in the list has the monitoring program deployed; once the devices have been added, the list is saved to the central database, and each candidate device stores a copy of the candidate list in its own database;
    (2) each candidate device periodically checks whether it can connect to the central server: if it can, step (2) is repeated; if it cannot, the device concludes that the central server has gone down and performs step (3);
    (3) calculate the average delay T;
    (4) calculate the hardware performance: HP = CPU utilization × A + memory utilization × B + hard disk utilization × C;
    (5) calculate the comprehensive performance: TP = (average delay T × network weight X + hardware performance HP × hardware weight Y) × 100%;
    (6) the monitoring program sends this device's IP and comprehensive performance TP to the remaining candidate devices;
    (7) once the monitoring program on each candidate device has confirmed receipt of the information from all remaining candidate devices, it selects the device with the lowest comprehensive performance index TP as the optimal device; if another device has the same performance index as this device, the one with the smaller sequence number in the candidate list is the optimal device; the optimal device's entry is then deleted from the candidate list;
    (8) if this device is the optimal device, the monitoring program on it starts the deployed business system and connects to the central database, the device becomes the central server, and step (9) is performed; otherwise, step (10) is performed;
    (9) the optimal device's entry is deleted from the candidate list in the central database, and all second-level nodes, being devices that access the central server directly, are notified of the change of central server; step (11) is then performed;
    (10) the device waits for the central-server change message sent by the optimal device and, on receiving it, reconnects to the central server;
    (11) the replacement of the central server is complete.
CN201510330091.XA 2015-06-15 2015-06-15 A kind of central server disaster-tolerant backup method Active CN104965770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510330091.XA CN104965770B (en) 2015-06-15 2015-06-15 A kind of central server disaster-tolerant backup method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510330091.XA CN104965770B (en) 2015-06-15 2015-06-15 A kind of central server disaster-tolerant backup method

Publications (2)

Publication Number Publication Date
CN104965770A CN104965770A (en) 2015-10-07
CN104965770B true CN104965770B (en) 2018-02-02

Family

ID=54219805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510330091.XA Active CN104965770B (en) 2015-06-15 2015-06-15 A kind of central server disaster-tolerant backup method

Country Status (1)

Country Link
CN (1) CN104965770B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105763386A (en) * 2016-05-13 2016-07-13 中国工商银行股份有限公司 Service processing system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309167A (en) * 2008-06-27 2008-11-19 华中科技大学 Disaster allowable system and method based on cluster backup
CN201571075U (en) * 2009-07-22 2010-09-01 马涛 Intelligent disaster recovery system
CN102117231A (en) * 2009-12-30 2011-07-06 上海文广互动电视有限公司 Distributed data backup and disaster tolerance system and method
CN103853634A (en) * 2014-02-26 2014-06-11 北京优炫软件股份有限公司 Disaster recovery system and disaster recovery method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012789B (en) * 2009-09-07 2014-03-12 云端容灾有限公司 Centralized management type backup and disaster recovery system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《云灾备中系统级管理技术的关键问题》;姚文斌等;《中兴通讯技术》;20121231;第18卷(第6期);22-25 *
《数据中心业务连续性保障技术的探讨》;辛阳;《数据中心业务连续性保障技术的探讨》;20141031;22-23 *

Also Published As

Publication number Publication date
CN104965770A (en) 2015-10-07

Similar Documents

Publication Publication Date Title
CN106470133B (en) System pressure testing method and device
CN102694868B (en) A kind of group system realizes and task dynamic allocation method
CN108809757B (en) System alarm method, storage medium and server
RU2017111477A (en) Methods and systems for determining non-standard user activity
CN106878473A (en) A kind of message treatment method, server cluster and system
CN103067297B (en) A kind of dynamic load balancing method based on resource consumption prediction and device
US10740198B2 (en) Parallel partial repair of storage
CN106407052B (en) A kind of method and device detecting disk
CN101707632A (en) Method for dynamically monitoring performance of server cluster and alarming real-timely
CN104778111A (en) Alarm method and alarm device
CN107769943A (en) A kind of method and apparatus of active and standby cluster switching
Zhou et al. FTCloudSim: a simulation tool for cloud service reliability enhancement mechanisms
CN106656682A (en) Method, system and device for detecting cluster heartbeat
WO2018125628A1 (en) A network monitor and method for event based prediction of radio network outages and their root cause
CN106686099A (en) Method of realizing active-active mode across machine rooms of OracleRAC database based on infiniband network
CN103634167B (en) Security configuration check method and system for target hosts in cloud environment
US9639445B2 (en) System and method for comprehensive performance and availability tracking using passive monitoring and intelligent synthetic activity generation for monitoring a system
CN104965770B (en) A kind of central server disaster-tolerant backup method
CN103428249A (en) Collecting method and processing method for HTTP request packet, system and server
CN107818106B (en) Big data offline calculation data quality verification method and device
CN106909436A (en) Produce the method and system of the dependency relation of virtual machine message queue application program
CN106993027B (en) Remote data storage location verification method
Schörgenhumer et al. Can We Predict Performance Events with Time Series Data from Monitoring Multiple Systems?
Yu et al. Design and architecture of dell acceleration appliances for database (DAAD): A practical approach with high availability guaranteed
Liao et al. Partial replication of metadata to achieve high metadata availability in parallel file systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant