WO2012012962A1 - Disaster recovery service system and disaster recovery method - Google Patents

Disaster recovery service system and disaster recovery method

Info

Publication number
WO2012012962A1
WO2012012962A1 · PCT/CN2010/076969 · CN2010076969W
Authority
WO
WIPO (PCT)
Prior art keywords
service
server
disaster recovery
module
disaster
Prior art date
Application number
PCT/CN2010/076969
Other languages
English (en)
French (fr)
Inventor
张超
王慧
赵庆春
王巍
施健
张道平
张玲东
孙雷
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to EP10855190.4A (EP2600565B1)
Publication of WO2012012962A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/203Failover techniques using migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2035Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/34Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2025Failover techniques using centralised failover control functionality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/2028Failover techniques eliminating a faulty processor or activating a spare
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2048Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share neither address space nor persistent storage

Definitions

  • TECHNICAL FIELD: The present invention relates to the field of communications, and in particular to a disaster tolerant service system and a disaster tolerance method.
  • BACKGROUND: In the field of telecommunications, most operators' service platforms are built in an independent, separate mode, and each service platform needs to construct its own storage modules, external interfaces, operation and maintenance units, billing units, and other common modules for each service.
  • In order to avoid duplicate construction of the above common modules and thus duplicate investment, operators at this stage aim to obtain the maximum return from the minimum investment, whether they are building new services or expanding or replacing existing ones, and a multi-service convergence unified platform can achieve exactly this goal.
  • the multi-service convergence unified platform can reduce various costs and improve resource utilization while ensuring product stability and reliability.
  • However, after a service fails, the current multi-service convergence unified platform adopts a one-to-one disaster recovery mode, that is, multiple disaster recovery servers are used and each disaster recovery server performs disaster recovery only for one specific service.
  • Because this disaster recovery mode cannot guarantee sufficient disaster tolerance for each type of service, its reliability is insufficient.
  • the use of multiple disaster recovery servers in the disaster recovery mode greatly increases the cost of equipment investment.
  • The present invention has been made in view of the problem that the disaster tolerant service system in the related art uses a one-to-one disaster recovery mode and cannot guarantee sufficient disaster tolerance for each type of service.
  • To this end, the main object of the present invention is to provide a disaster recovery service system and a disaster recovery method that solve the above problem.
  • The disaster recovery service system includes: a fault detection module, configured to detect whether a service server running a service and/or the service is in a fault state; a server management module, configured to determine, when the detection result is yes, a disaster recovery server for replacing the service server; a service loading and unloading module, configured to install the service on the disaster recovery server; and a service running module, configured to run the service on the disaster recovery server.
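  • The four modules above cooperate as a detect → select → install → run pipeline. The following minimal Python sketch illustrates that control flow; the class and method names (DisasterRecoverySystem, select_replacement, and so on) are illustrative assumptions, not names taken from the patent.

```python
class DisasterRecoverySystem:
    """Illustrative composition of the four modules described above."""

    def __init__(self, fault_detector, server_manager, service_loader, service_runner):
        self.fault_detector = fault_detector    # detects faulty servers/services
        self.server_manager = server_manager    # picks a replacement DR server
        self.service_loader = service_loader    # installs/uninstalls services
        self.service_runner = service_runner    # starts services on the DR server

    def handle_fault_cycle(self, service_servers):
        for server in service_servers:
            faulty_services = self.fault_detector.check(server)
            if not faulty_services:
                continue
            dr_server = self.server_manager.select_replacement()
            for service in faulty_services:
                self.service_loader.install(dr_server, service)
                self.service_runner.run(dr_server, service)
```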
  • the server management module includes: an acquisition submodule, configured to obtain an idle disaster recovery server; and a determination submodule, configured to determine a disaster recovery server for replacing the service server according to the performance of the idle disaster recovery server.
  • Further, the disaster recovery service system includes a storage module, configured to store status information of the service server and service information of the service, where the status information includes at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, and status information indicating that the service server is idle; and the service information includes at least one of the following: a service node of the service, a module number of the service, a service type of the service, a version of the service, and a directory of the service.
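  • The stored records can be modeled as two small data structures. The field names below mirror the items listed in this paragraph (status, service node, module number, service type, version, directory), while the enum values and class names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class ServerStatus(Enum):
    NORMAL = "normal"   # service server running normally
    FAULTY = "faulty"   # service server in a fault state
    IDLE = "idle"       # server available as a disaster recovery server

@dataclass
class ServiceInfo:
    node: str           # service node
    module_number: int  # module number of the service
    service_type: str   # service type
    version: str        # installed version
    directory: str      # installation directory

@dataclass
class ServerRecord:
    management_ip: str
    status: ServerStatus
    services: list[ServiceInfo] = field(default_factory=list)
```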
  • the disaster recovery service system further includes: a status display module, configured to display status information and service information; and a status alarm module, configured to generate an alarm when the detection result is yes.
  • the service loading and unloading module is further configured to uninstall the service on the service server in a fault state.
  • The disaster tolerance method includes: detecting whether a service server in the disaster tolerant service system and/or a service running on the service server is in a fault state; if the detection result is yes, determining a disaster recovery server for replacing the service server; installing the service on the disaster recovery server; and running the service on the disaster recovery server. Further, determining the disaster recovery server for replacing the service server includes: obtaining an idle disaster recovery server in the disaster recovery service system, and determining the disaster recovery server for replacing the service server according to the performance of the idle disaster recovery server.
  • Further, before the detection, the method includes: storing state information of the service server and service information of the service, where the status information includes at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, and status information indicating that the service server is idle; and the service information includes at least one of the following: a service node of the service, a module number of the service, a service type of the service, a version of the service, and a directory of the service.
  • the method further includes: displaying status information and service information. Further, after detecting whether the service server in the disaster tolerant service system and/or the service running on the service server is in a fault state, the method further includes: when the detection result is yes, generating an alarm.
  • By installing the service on the disaster recovery server, the scope of services for which the disaster recovery server can provide disaster tolerance is expanded, which solves the problem that the one-to-one disaster recovery mode used by the disaster recovery service system in the related art cannot guarantee sufficient disaster tolerance for each type of service, improves disaster recovery reliability, and reduces the investment cost of the disaster recovery service system.
  • FIG. 1 is a structural block diagram of a disaster tolerant service system according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of a disaster tolerant service system according to a preferred embodiment of the present invention;
  • FIG. 3 is a flowchart of a disaster tolerance method according to an embodiment of the present invention;
  • FIG. 4 is an interaction flowchart of a disaster tolerance method according to a preferred embodiment of the present invention.
  • BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 is a structural block diagram of a disaster tolerant service system according to an embodiment of the present invention.
  • the system includes a fault detection module 11, a server management module 12, a service loading and unloading module 13, and a service running module 14.
  • The fault detection module 11 is configured to detect whether a service server running a service and/or the service is in a fault state.
  • The server management module 12 is connected to the fault detection module 11 and is configured to determine, when the detection result of the fault detection module 11 is yes, a disaster recovery server for replacing the service server.
  • The service loading and unloading module 13 is connected to the fault detection module 11 and the server management module 12 and is configured to install a service on the disaster recovery server determined by the server management module 12, where the service is a service running on a service server that the fault detection module 11 has detected to be in a fault state and/or a service that the fault detection module 11 has detected to be in a fault state.
  • The service running module 14 is connected to the service loading and unloading module 13 and is configured to run the service on the disaster recovery server after the service loading and unloading module 13 has installed it.
  • In the related art, a disaster recovery server performs disaster tolerance only for one specific service.
  • In the embodiment of the present invention, the service loading and unloading module 13 installs the service on the disaster recovery server, which expands the scope of services for which the disaster recovery server can provide disaster tolerance, thereby improving the reliability of service disaster recovery and reducing the investment cost of the disaster recovery service system.
  • the server management module 12 includes an acquisition sub-module 121 and a determination sub-module 122. The structure is described in detail below.
  • The acquisition sub-module 121 is connected to the fault detection module 11 and is configured to acquire an idle disaster recovery server when the detection result of the fault detection module 11 is yes.
  • The determination sub-module 122 is connected to the acquisition sub-module 121 and is configured to determine the disaster recovery server for replacing the service server according to the performance of the idle disaster recovery server acquired by the acquisition sub-module 121. In this preferred embodiment, if the acquisition sub-module 121 acquires multiple idle disaster recovery servers, the one with the best performance among all the idle disaster recovery servers is selected as the chosen disaster recovery server. In this way, a better disaster recovery service can be provided for the faulty service server, improving the reliability of disaster recovery.
  • It should be noted that if the acquisition sub-module 121 acquires only one idle disaster recovery server, that server is used as the chosen disaster recovery server. If the acquisition sub-module 121 does not acquire any idle disaster recovery server, the fault detection module 11 sends an alarm message to the status alarm module 17 to indicate that there is currently no idle disaster recovery server available for selection.
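  • Taken together, the acquisition and determination sub-modules amount to "pick the best-performing idle server, otherwise raise an alarm". A minimal sketch under the data model assumed earlier follows; the performance_score attribute and the raise_alarm callback are illustrative assumptions.

```python
def select_dr_server(servers, raise_alarm):
    """Return the idle server with the best performance, or None after alarming."""
    idle = [s for s in servers if s.status == ServerStatus.IDLE]
    if not idle:
        raise_alarm("no idle disaster recovery server is currently available")
        return None
    # With a single idle server it is selected directly; with several,
    # the one with the best performance wins.
    return max(idle, key=lambda s: s.performance_score)
```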
  • the disaster recovery service system further includes a storage module 15, a status display module 16, and a status alarm module 17. The structure is described in detail below.
  • the storage module 15 is configured to store the status information of the service server and the service information of the service, where the status information includes at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, The status information used to indicate that the service server is idle; the service information includes at least one of the following: a service node of the service, a module number of the service, a service type of the service, a version of the service, and a directory of the service.
  • the status display module 16 is connected to the storage module 15 for displaying status information and service information stored by the storage module 15.
  • the status alarm module 17 is connected to the fault detection module 11 for generating an alarm when the detection result of the fault detection module 11 is YES.
  • In this preferred embodiment, the status information and the service information stored by the storage module 15 can be displayed by the status display module 16, thereby providing the user with intuitive prompt information so that the user can manage the disaster tolerant service system.
  • The status alarm module 17 generates an alarm, which gives the user a conspicuous prompt so that the user can handle the fault of the disaster recovery service system.
  • Preferably, the service loading and unloading module 13 is further configured to uninstall the service on a service server that is in a fault state. In this preferred embodiment, uninstalling the service on the faulty service server allows that service server to be restored to the normal idle state.
  • The restored service server can then be used as a new disaster recovery server, so that disaster recovery servers are reused, which further improves the reliability of service disaster recovery and reduces the investment cost of the disaster recovery service system.
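  • A sketch of how a repaired server might be returned to the idle pool, under the same assumed data model; the service_loader.uninstall call is a hypothetical interface, not one defined by the patent.

```python
def reclaim_faulty_server(server, service_loader):
    """Uninstall services from a repaired server so it can rejoin the idle pool."""
    for service in list(server.services):
        service_loader.uninstall(server, service)   # remove binaries, IP info, etc.
        server.services.remove(service)
    # Once cleaned up, the server is marked idle and becomes a candidate
    # disaster recovery server for future failovers.
    server.status = ServerStatus.IDLE
    return server
```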
  • The present invention also provides a preferred embodiment that combines the technical solutions of the above preferred embodiments, which is described in detail below with reference to FIG. 2.
  • FIG. 2 is a schematic diagram of a disaster tolerant service system according to a preferred embodiment of the present invention, including an operation and maintenance management module 21 (corresponding to the status display module and the status alarm module described above), a device management database 22 (corresponding to the storage module), a disaster tolerance module 23 (corresponding to the fault detection module), a computer resource management center 24 (corresponding to the server management module), and an automatic deployment module 25 (corresponding to the service loading and unloading module), which are described in detail below.
  • the operation and maintenance management module 21 is configured to display the status of each host and the service, and has related functions such as alarms after the device fails.
  • the device management database 22 is configured to store information of the device.
  • The disaster tolerance module 23 is configured to detect device and service status and to perform disaster recovery after a service or host fails. The computer resource management center 24 is configured to manage the status of each service server and to support the disaster recovery takeover function. The automatic deployment module 25 is configured to install services on each service server.
  • FIG. 3 is a flowchart of a disaster tolerance method according to an embodiment of the present invention, including the following steps S302 to S308.
  • Step S302: Detect whether the service server in the disaster tolerant service system and/or the service running on the service server is in a fault state.
  • Step S304: If the detection result is yes, determine a disaster recovery server for replacing the service server.
  • Step S306: Install the service on the disaster recovery server.
  • Step S308: Run the service on the disaster recovery server.
  • In the related art, a disaster recovery server performs disaster tolerance only for one specific service.
  • In the embodiment of the present invention, installing the service on the disaster recovery server expands the scope of services for which the disaster recovery server can provide disaster tolerance, thereby improving the reliability of service disaster recovery and reducing the cost of the disaster recovery service system.
  • Preferably, determining the disaster recovery server for replacing the service server includes: obtaining an idle disaster recovery server in the disaster recovery service system, and determining the disaster recovery server for replacing the service server according to the performance of the idle disaster recovery server. In this preferred embodiment, if multiple idle disaster recovery servers are obtained, the one with the best performance among them is selected as the chosen disaster recovery server, which provides a better disaster recovery service for the faulty service server and improves the reliability of disaster recovery.
  • It should be noted that if only one idle disaster recovery server is obtained, that server is selected as the disaster recovery server. If no idle disaster recovery server is obtained, an alarm message is sent to indicate that there is currently no idle disaster recovery server available for selection.
  • Preferably, before detecting whether the service server in the disaster tolerant service system and/or the service running on it is in a fault state, the status information of the service server and the service information of the service are stored, where the status information includes at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, and status information indicating that the service server is idle; and the service information includes at least one of the following: the service node of the service, the module number of the service, the service type of the service, the version of the service, and the directory of the service.
  • the status information and the service information are displayed before detecting whether the service server in the disaster tolerant service system and/or the service running on the service server is in a fault state.
  • Preferably, after detecting whether the service server in the disaster tolerant service system and/or the service running on it is in a fault state, an alarm is generated when the detection result is yes.
  • the stored status information and the service information of the service can be used for displaying the status, so as to provide the user with intuitive prompt information, so that the user can manage the disaster tolerant service system.
  • the alarm is generated to give the user a prompt message so that the user can handle the fault of the disaster recovery service system.
  • Step S400: Each module runs normally, the devices are powered on, system software such as the operating system and the device management database is installed, and the basic network is configured.
  • Each host is configured with a management IP and an administrative account password (usually the root password), and the disaster recovery center has been installed. Several idle disaster recovery servers currently exist and have been successfully registered with the disaster recovery center.
  • the disaster recovery center is set to automatic disaster recovery mode.
  • the disaster recovery center interacts with heartbeat messages between service servers to determine whether each service server is in a normal state.
  • Step S402: The disaster recovery module performs heartbeat detection with each service server over Secure Shell (SSH); the typical heartbeat interval is 10 seconds (configurable).
  • After the disaster recovery module receives a normal response from a service server, it queries again at the configured interval. When a query fails, it retries at the same interval; after 3 consecutive failed queries (configurable), the service server is considered to be in an abnormal state.
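  • The heartbeat of steps S400–S402 can be approximated with a plain SSH probe. The sketch below assumes passwordless root SSH is already configured (as step S400 implies) and uses an illustrative `ssh ... echo ok` probe rather than whatever command the real disaster recovery module issues.

```python
import subprocess
import time

HEARTBEAT_INTERVAL = 10   # seconds between probes (configurable, per step S402)
MAX_FAILURES = 3          # consecutive failures before declaring a fault (configurable)

def ssh_probe(host, timeout=5):
    """Return True if the host answers a trivial command over SSH."""
    result = subprocess.run(
        ["ssh", "-o", f"ConnectTimeout={timeout}", f"root@{host}", "echo ok"],
        capture_output=True,
    )
    return result.returncode == 0

def monitor(host, on_fault):
    failures = 0
    while True:
        if ssh_probe(host):
            failures = 0
        else:
            failures += 1
            if failures >= MAX_FAILURES:
                on_fault(host)   # e.g. trigger the logout request of step S402
                return
        time.sleep(HEARTBEAT_INTERVAL)
```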
  • the disaster recovery module sends a service server logout request to the message interface module.
  • Step S404 After receiving the service server logout request, the message interface module sends a service server state change request to the computer resource management center, and carries the power-off identifier therein.
  • Step S406: The computer resource management center changes the state of the service server and powers it off. If only the service on the current service server has a problem and the service server itself is in good condition, service uninstallation, IP information deletion, and other related operations are performed on that service server. If the service server itself has a problem, the related deletion operations are performed after the service server is next restored and before the service is reloaded.
  • Step S408 After performing the related operations, the computer resource management center sends a service server status change response to the message interface module, where the power-off response is carried.
  • Step S410 After receiving the response from the computer resource management center, the message interface module sends a service server logout response to the disaster recovery module.
  • Step S412: After the disaster recovery module receives the service server logout response, it performs device management database operations and deletes the related service information (such as the service node, module number, etc.).
  • Step S414 The disaster tolerance module sends a stop service notification request to the operation and maintenance management module, and is used to notify the operation and maintenance management module to display the service stop corresponding to the related service information on the page.
  • Step S416 After receiving the notification of the disaster recovery module, the operation and maintenance management module changes the state of the service to stop on the page, and sends a stop service notification response to the disaster recovery module after the change is completed.
  • Step S418: After receiving the stop service notification response, the disaster tolerance module deletes the information of the service server from the device management database.
  • After the deletion, a service server logout result notification request is sent to the operation and maintenance management module.
  • Step S420: The operation and maintenance management module changes the status of the service server to the fault state and sends a service server fault result notification response to the disaster recovery module.
  • Step S422: After the above steps are performed, the disaster recovery module detects the state of the disaster recovery servers. If there is currently no idle disaster recovery server, it sends an alarm message to the operation and maintenance management module (no idle disaster recovery server is currently available). If there are multiple idle disaster recovery servers, the disaster recovery module checks all of them and selects the one with the best device performance as the chosen disaster recovery server. If there is currently only one idle disaster recovery server, disaster recovery processing is performed only on that server.
  • After the idle disaster recovery server is selected, the disaster recovery module sends a service loading request to the message interface module, and the message interface module sends a disaster recovery server request for the idle disaster recovery server to the computer resource management center.
  • Step S424: After the computer resource management center finds the current idle disaster recovery server, it returns a disaster recovery server request response to the message interface module, and the message interface module sends a service loading response to the disaster recovery module.
  • Step S426: After receiving the service loading response, the disaster tolerance module reuses the module number. Then, the disaster recovery module sends the original module number and the service loading request (management IP, logical IP, module number, service type, version, and directory) to the automatic deployment module.
  • Step S428: The automatic deployment module uploads the version onto the idle disaster recovery server according to the service loading request, executes the installation script and the automatic startup script, returns a deployment success response on success, and proceeds to step S430. If the deployment fails because of the disaster recovery server, the service and its related files are deleted from the failed disaster recovery server and a deployment failure response is returned; the automatic deployment module then returns a service loading failure response to the disaster recovery module, and the disaster recovery module goes back to step S420 to reselect an idle disaster recovery server.
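  • Step S428 is essentially "copy the version, install, start, and clean up on failure". The sketch below models it with scp/ssh calls; the script paths (install.sh, start.sh), the version_path attribute, and the retry loop back to server selection are assumptions made for illustration, not details given in the patent.

```python
import subprocess

def deploy_service(dr_host, service, select_next_server):
    """Upload the service version and run its install/start scripts on dr_host."""
    target_dir = service.directory
    try:
        subprocess.run(["scp", "-r", service.version_path,
                        f"root@{dr_host}:{target_dir}"], check=True)
        subprocess.run(["ssh", f"root@{dr_host}", f"{target_dir}/install.sh"], check=True)
        subprocess.run(["ssh", f"root@{dr_host}", f"{target_dir}/start.sh"], check=True)
        return dr_host                                # deployment success (step S430)
    except subprocess.CalledProcessError:
        # On failure the partially installed service and files are removed,
        # and selection restarts from step S420 with another idle server.
        subprocess.run(["ssh", f"root@{dr_host}", f"rm -rf {target_dir}"])
        next_host = select_next_server()
        return deploy_service(next_host, service, select_next_server) if next_host else None
```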
  • Step S430: The automatic deployment module returns a service loading success response to the disaster recovery module, and the disaster tolerance module stores the logical device and physical device records in the database. At the same time, a loading service result notification request is sent to the operation and maintenance management module.
  • Step S432: After receiving the loading service result notification request, the operation and maintenance management module displays the related information on the page and returns a loading service result notification response to the disaster recovery module. Step S434: A configuration file is generated and automatically synchronized to all nodes.
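  • Step S434 (generating a configuration file and synchronizing it to all nodes) might look like the following; the file path, JSON format, and node-list handling are assumptions for illustration only.

```python
import json
import subprocess

def sync_configuration(nodes, config, path="/etc/dr/topology.json"):
    """Write the current device records to a file and push it to every node."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)   # config: plain dict of device records
    for node in nodes:
        # step S434: the generated configuration file is synchronized to all nodes
        subprocess.run(["scp", path, f"root@{node}:{path}"], check=True)
```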
  • In summary, installing the service on the disaster recovery server expands the range of services for which the disaster recovery server can provide disaster tolerance, which solves the problem that the one-to-one disaster recovery mode used by the disaster recovery service system in the related art cannot guarantee sufficient disaster tolerance for each type of service, and improves the reliability of disaster recovery.
  • Moreover, the disaster recovery approach of the present invention does not require a specific environment for disaster tolerance: any service can be recovered on any server or similar environment, which greatly reduces the investment cost of the disaster recovery service system.
  • Obviously, those skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from the one herein, or they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)
  • Hardware Redundancy (AREA)

Description

容灾业务系统及容灾方法 技术领域 本发明涉及通信领域, 具体而言, 涉及一种容灾业务系统及容灾方法。 背景技术 电信领域中, 大多数运营商的业务平台处于独立、 分离的建设模式, 每 个业务平台都需要为每个业务建设单独的存储模块、 对外接口、 操作维护单 元、 计费单元等公共模块。 为了避免上述公共模块的重复建设从而避免重复投资, 现阶段无论是面 对新业务的建设, 还是面对已有业务的扩容或替换, 运营商都会以最小的投 入产生最大的收益为目标, 而多业务融合统一平台' I"合' I"合可以实现该目标。 多业务融合统一平台可以在保证产品稳定性和可靠性的前提下, 降低各 种成本并提高资源利用率。 但是, 当前的多业务融合统一平台在其业务发生 故障后, 都是釆用一对一的容灾模式, 即, 釆用多台容灾服务器, 其中每台 容灾服务器仅仅对特定某种业务进行容灾的容灾模式。 这样, 由于该容灾模 式不能保证对每种业务具备足够的容灾能力, 因此其可靠性不足; 并且, 该 容灾模式中多台容灾服务器的使用大大增加了设备投入的成本。 发明内容 针对相关技术中的容灾业务系统釆用一对一的容灾模式, 不能保证对每 种业务具备足够的容灾能力的问题而提出本发明, 为此, 本发明的主要目的 在于提供一种容灾业务系统及容灾方法, 以解决上述问题。 为了实现上述目的,根据本发明的一个方面,提供了一种容灾业务系统。 根据本发明的容灾业务系统包括: 故障检测模块, 用于检测运行业务的 业务服务器和 /或业务是否处于故障状态; 服务器管理模块, 用于在检测结果 为是时, 确定用于替代业务服务器的容灾服务器; 业务装卸模块, 用于在容 灾服务器上安装业务; 业务运行模块, 用于在容灾服务器上运行业务。 进一步地, 服务器管理模块包括: 获取子模块, 用于获取空闲的容灾服 务器; 确定子模块, 用于根据空闲的容灾服务器的性能, 确定用于替代业务 服务器的容灾服务器。 进一步地, 上述容灾业务系统还包括: 存储模块, 用于存储业务服务器 的状态信息和业务的业务信息; 其中, 状态信息包括以下至少之一: 用于指 示业务服务器运行正常的状态信息、 用于指示业务服务器运行故障的状态信 息、 用于指示业务服务器空闲的状态信息; 业务信息包括以下至少之一: 业 务的业务节点、 业务的模块号、 业务的业务类型、 业务的版本、 业务的目录。 进一步地, 上述容灾业务系统还包括: 状态显示模块, 用于显示状态信 息和业务信息; 状态告警模块, 用于在检测结果为是时, 产生告警。 进一步地, 业务装卸模块还用于在处于故障状态的业务服务器上卸载业 务。 为了实现上述目的, 居本发明的另一个方面, 提供了一种容灾方法。 根据本发明的容灾方法包括:检测容灾业务系统中的业务服务器和 /或运 行于业务服务器上的业务是否处于故障状态; 如果检测结果为是, 确定用于 替代业务服务器的容灾服务器; 在容灾服务器上安装业务; 在容灾服务器上 运行业务。 进一步地, 确定用于替代业务服务器的容灾服务器包括: 获取容灾业务 系统中的空闲的容灾服务器; 根据空闲的容灾服务器的性能, 确定用于替代 业务服务器的容灾服务器。 进一步地,在检测容灾业务系统中的业务服务器和 /或运行于业务服务器 上的业务是否处于故障状态之前, 上述方法还包括: 存储业务服务器的状态 信息和业务的业务信息; 其中, 状态信息包括以下至少之一: 用于指示业务 服务器运行正常的状态信息、 用于指示业务服务器运行故障的状态信息、 用 于指示业务服务器空闲的状态信息; 业务信息包括以下至少之一: 业务的业 务节点、 业务的模块号、 业务的业务类型、 业务的版本、 业务的目录。 进一步地,在检测容灾业务系统中的业务服务器和 /或运行于业务服务器 上的业务是否处于故障状态之前, 上述方法还包括: 显示状态信息和业务信 息。 进一步地,在检测容灾业务系统中的业务服务器和 /或运行于业务服务器 上的业务是否处于故障状态之后, 上述方法还包括: 在检测结果为是时, 产 生告警。 通过本发明, 釆用在容灾服务器上安装业务, 扩大了容灾服务器可以容 灾的业务的范围,解决了相关技术中的容灾业务系统釆用一对一的容灾模式, 不能保证对每种业务具备足够的容灾能力的问题, 提高了容灾可靠性, 降低 了容灾业务系统投入的成本。 附图说明 此处所说明的附图用来提供对本发明的进一步理解, 构成本申请的一部 分, 本发明的示意性实施例及其说明用于解释本发明, 并不构成对本发明的 不当限定。 在附图中: 图 1是 居本发明实施例的容灾业务系统的结构框图; 图 2是 居本发明优选实施例的容灾业务系统的示意图; 图 3是 居本发明实施例的容灾方法的流程图; 图 4是才艮据本发明优先实施例的容灾方法的交互流程图。 具体实施方式 下文中将参考附图并结合实施例来详细说明本发明。 需要说明的是, 在 不冲突的情况下, 本申请中的实施例及实施例中的特征可以相互组合。 根据本发明的实施例, 提供了一种容灾业务系统。 图 1是根据本发明实 施例的容灾业务系统的结构框图, 该系统包括故障检测模块 11 , 服务器管理 模块 12 , 业务装卸模块 13和业务运行模块 14。 下面对其结构进行详细描述。 故障检测模块 11 ,用于检测运行业务的业务艮务器和 /或该业务是否处于 故障状态; 服务器管理模块 12 , 连接至故障检测模块 11 , 用于在故障检测模 块 11的检测结果为是时, 确定用于替代业务服务器的容灾服务器; 业务装卸 模块 13 , 连接至故障检测模块 11和服务器管理模块 12 , 用于在服务器管理 模块 12确定的容灾服务器上安装业务, 该业务是故障检测模块 11检测到处 于故障状态的业务艮务器上运行的业务和 /或是故障检测模块 11检测到处于 故障状态的业务; 业务运行模块 14 , 连接至业务装卸模块 13 , 用于在业务 装卸模块 13安装业务后的容灾服务器上运行业务。 相关技术中, 容灾服务器仅仅对特定某种业务进行容灾。 本发明实施例 中, 通过业务装卸模块 13 在容灾服务器上安装业务, 扩大了容灾服务器可 以容灾的业务的范围, 从而可以提高业务容灾的可靠性, 并且降低容灾业务 系统投入的成本。 优选地, 服务器管理模块 12包括获取子模块 121和确定子模块 122。 下 面对其结构进行详细描述。 获取子模块 121 , 连接至故障检测模块 11 , 用于在故障检测模块 11的检 测结果为是时, 获取空闲的容灾服务器; 确定子模块 122 , 连接至获取子模 块 121 , 用于根据获取子模块 121获取的空闲的容灾服务器的性能, 确定用 于替代业务服务器的容灾服务器。 本优选实施例中, 如果获取子模块 121获取到多台空闲的容灾服务器, 则选择所有空闲的容灾服务器中的性能最佳的作为选定的容灾服务器。这样, 可以为故障的业务服务器提供更好的容灾服务, 从而提高容灾的可靠性。 需要说明的是,如果获取子模块 121仅仅获取到一台空闲的容灾服务器, 则将这台容灾服务器作为选定的容灾服务器。 如果获取子模块 121没有获取 到空闲的容灾服务器, 则故障检测模块 11会向状态告警模块 17发出告警信 息, 以表示当前没有可供选择的空闲的容灾服务器。 优选地, 上述容灾业务系统还包括存储模块 15 , 状态显示模块 16和状 态告警模块 17。 下面对其结构进行详细描述。 存储模块 15 ,用于存储业务服务器的状态信息和业务的业务信息;其中, 状态信息包括以下至少之一: 用于指示业务服务器运行正常的状态信息、 用 于指示业务服务器运行故障的状态信息、 用于指示业务服务器空闲的状态信 息; 业务信息包括以下至少之一: 业务的业务节点、 业务的模块号、 业务的 业务类型、 业务的版本、 业务的目录。 状态显示模块 16 , 连接至存储模块 15 , 用于显示存储模块 15存储的状态信息和业务信息。 状态告警模块 17 , 连接至故障检测模块 11 , 用于在故障检测模块 11 的检测结果为是时, 产生 告警。 本优选实施例中, 存储模块 15 存储的状态信息和业务的业务信息, 可 用于状态显示模块 16 的显示, 从而提供给用户直观的提示信息, 以便用户 对容灾业务系统进行管理。 状态告警模块 17 产生告警, 可以给用户醒目的 提示信息, 以便用户处理容灾业务系统的故障。 优选地, 业务装卸模块 13 
还用于在处于故障状态的业务服务器上卸载 业务。 本优选实施例中, 通过在处于故障状态的业务月艮务器上卸载业务, 可以 将该业务服务器恢复为正常的空闲的状态。 然后, 将该业务服务器作为新的 容灾服务器, 可以实现容灾服务器的重复利用, 从而进一步提高业务容灾的 可靠性, 并且降氐容灾业务系统投入的成本。 本发明还提供了一个优选实施例, 结合了上述多个优选实施例的技术方 案, 下面结合图 2来详细描述。 图 2是 居本发明优选实施例的容灾业务系统的示意图, 包括运营维护 管理模块 21 (对应于上述状态显示模块和状态告警模块), 设备管理数据库 22 (对应于上述存储模块), 容灾模块 23 (对应于上述故障检测模块), 计算 机资源管理中心 24 (对应于上述服务器管理模块), 自动部署模块 25 (对应 于上述业务装卸模块), 下面对其进行详细描述。 运营维护管理模块 21 , 用于显示各个主机以及业务的状态, 当设备发生 故障后有相关告警等功能。 设备管理数据库 22 , 用于对设备的信息进行存储。 容灾模块 23 , 用于检测设备及业务状态, 当业务或者主机发生故障后, 进行容灾功能。 计算机资源管理中心 24 , 用于管理各个业务服务器的状态, 及被容灾功 能。 自动部署模块 25 , 用于在各个业务服务器上的业务安装。 根据本发明的实施例, 还提供了一种容灾方法。 图 3是才艮据本发明实施 例的容灾方法的流程图, 包括如下的步骤 S302至步骤 S308。 步骤 S302, 检测容灾业务系统中的业务月艮务器和 /或运行于业务月艮务器 上的业务是否处于故障状态。 步骤 S304,如果检测结果为是,确定用于替代业务艮务器的容灾艮务器。 步骤 S306, 在容灾服务器上安装业务。 步骤 S308, 在容灾艮务器上运行业务。 相关技术中, 容灾服务器仅仅对特定某种业务进行容灾。 本发明实施例 中, 通过在容灾服务器上安装业务, 扩大了容灾服务器可以容灾的业务的范 围, 从而可以提高业务容灾的可靠性, 并且降氐容灾业务系统投入的成本。 优选地, 确定用于替代业务服务器的容灾服务器包括: 获取容灾业务系 统中的空闲的容灾服务器; 根据空闲的容灾服务器的性能, 确定用于替代业 务服务器的容灾服务器。 本优选实施例中, 如果获取到多台空闲的容灾服务器, 则选择所有空闲 的容灾服务器中的性能最佳的作为选定的容灾服务器。 这样, 可以为故障的 业务服务器提供更好的容灾服务, 从而提高容灾的可靠性。 需要说明的是, 如果仅仅获取到一台空闲的容灾服务器, 则将这台容灾 服务器作为选定的容灾服务器。 如果没有获取到空闲的容灾服务器, 则会发 出告警信息, 以表示当前没有可供选择的空闲的容灾服务器。 优选地,在检测容灾业务系统中的业务月艮务器和 /或运行于业务月艮务器上 的业务是否处于故障状态之前, 存储业务服务器的状态信息和业务的业务信 息; 其中, 状态信息包括以下至少之一: 用于指示业务服务器运行正常的状 态信息、 用于指示业务服务器运行故障的状态信息、 用于指示业务服务器空 闲的状态信息; 业务信息包括以下至少之一: 业务的业务节点、 业务的模块 号、 业务的业务类型、 业务的版本、 业务的目录。 优选地,在检测容灾业务系统中的业务月艮务器和 /或运行于业务月艮务器上 的业务是否处于故障状态之前, 显示状态信息和业务信息。 优选地,在检测容灾业务系统中的业务月艮务器和 /或运行于业务月艮务器上 的业务是否处于故障状态之后, 在检测结果为是时, 产生告警。 本优选实施例中, 存储的状态信息和业务的业务信息, 可用于状态的显 示, 从而提供给用户直观的提示信息, 以便用户对容灾业务系统进行管理。 产生告警可以给用户醒目的提示信息, 以便用户处理容灾业务系统的故障。 本发明还提供了一个优选实施例, 结合了上述多个优选实施例的技术方 案, 下面结合图 4来详细描述。 图 4是才艮据本发明优先实施例的容灾方法的交互流程图, 包括如下的步 骤 S400至步骤 S434。 步骤 S400, 各个模块运行正常, 设备上电, 安装好操作系统、 设备管理 数据库等系统软件, 基础网络配置完毕, 每台主机配置好管理 IP和管理账号 密码 (通常就是 root 密码); 容灾中心安装完毕。 当前有空闲的容灾服务器 若千, 并且已经在容灾中心注册成功。 容灾中心设置为自动容灾模式。 容灾 中心通过和各个业务服务器之间的心跳消息进行交互, 确定各个业务服务器 是否状态正常。 步骤 S402, 容灾模块通过安全协议(Secure Shell, 简称为 SSH )方式与 各个业务服务器进行心跳检测, 一般心跳时间为 10秒 /次(可设置)。 当容灾 模块接收到业务服务器正常的响应后, 会在设置的间隔时间内进行查询。 当 查询返回失败后, 才艮据间隔时间会再次进行查询, 查询 3次后 (可设置)认 为业务服务器状态异常。容灾模块向消息接口模块发送业务服务器注销请求。 步骤 S404, 消息接口模块接收到业务服务器注销请求后, 向计算机资源 管理中心发送业务服务器状态变更请求, 并在其中携带下电标识。 步骤 S406, 计算机资源管理中心进行业务服务器的状态变更、 下电。 如 果当前的业务服务器只是业务有问题, 业务服务器状态是好的。 则会在业务 服务器上面进行业务卸载及 IP信息删除等相关操作。如果业务服务器发生问 题, 则会在下次业务艮务器^ ί'爹复后, 重新加载业务前进行相关删除操作。 步骤 S408, 计算机资源管理中心进行完相关操作后, 向消息接口模块发 送业务服务器状态变更响应, 其中携带下电响应。 步骤 S410, 消息接口模块接收到计算机资源管理中心响应后, 向容灾模 块发出业务月艮务器注销响应。 步骤 S412 , 容灾模块接收到业务服务器注销响应后, 进行设备管理数据 库操作, 并删除相关业务信息 (如业务节点, 模块号等)。 步骤 S414, 容灾模块向运营维护管理模块发送停止业务通知请求, 用于 通知运营维护管理模块在页面上显示与相关业务信息相应的业务停止。 步骤 S416 , 运营维护管理模块在接收到容灾模块的通知后, 在页面上把 业务的状态变更为停止, 并在变更完成后, 向容灾模块发送停止业务通知响 应。 步骤 S418, 容灾模块在接收到停止业务通知响应后, 在设备管理数据库 中删除业务服务器的信息。 删除掉后, 向运营维护管理模块发送业务服务器 注销结果通知请求。 步骤 S420, 运营维护管理模块把业务服务器的状态变为故障状态, 同时 给容灾模块发送业务服务器故障结果通知响应。 步骤 S422 , 在进行完以上步骤后, 容灾模块进行容灾服务器状态检测, 如果当前没有空闲的容灾服务器,则会向运营维护管理模块发出告警信息(当 前没有空闲的容灾服务器)。如果有多台空闲的容灾服务器,容灾模块则进行 判断, 查看所有容灾服务器中设备性能最佳的作为选定的容灾服务器。 如果 当前只有一台空闲的容灾服务器, 则只在这台容灾服务器上进行容灾处理。 选定好空闲的容灾服务器后, 容灾模块向消息接口模块发送业务加载请求, 消息接口模块向计算机资源管理中心发送空闲的容灾服务器的容灾服务器请 求。 步骤 S424, 计算机资源管理中心查找到当前的空闲的容灾月艮务器后, 向 消息接口模块回容灾服务器请求响应, 消息接口模块向容灾模块发送业务加 载响应。 步骤 S426, 容灾模块接收到业务加载响应后, 进行模块号复用。 然后, 容灾模块把原因模块号及业务加载请求(管理 IP, 逻辑 IP, 模块号, 业务类 型, 版本和目录) 发送给自动部署模块。 步骤 S428, 自动部署模块根据业务加载请求在该空闲的容灾服务器上面 上传版本, 执行安装脚本, 自动启动脚本, 成功后返回部署成功响应, 并执 行步骤 S430。 如果由于容灾服务器原因, 造成部署失败, 会在失败的容灾服 务器上面, 4巴业务及相关文件都删除, 同时返回部署失败响应, 然后由自动 部署模块向容灾模块返回业务加载失败响应,并由容灾模块继续从步骤 S420 重新选择空闲的容灾服务器。 步骤 S430, 自动部署模块向容灾模块返回业务加载成功响应, 容灾模块 进行逻辑设备和物理设备入库。 同时发送加载业务结果通知请求给运营维护 管理模块。 步骤 S432, 运营维护管理模块在接收到加载业务结果通知请求后, 会在 页面上进行相关展示。 
同时返回加载业务结果通知响应给容灾模块。 步骤 S434, 生成配置文件, 自动同步到所有节点。 综上所述, 根据本发明的上述实施例, 釆用在容灾服务器上安装业务, 扩大了容灾服务器可以容灾的业务的范围, 解决了相关技术中的容灾业务系 统釆用一对一的容灾模式,不能保证对每种业务具备足够的容灾能力的问题, 提高了容灾可靠性, 并且, 本发明的容灾方式可以使得容灾不需要特定的环 境, 任何一个业务都可以在任何一台服务器等环境上进行容灾, 大大降低了 容灾业务系统投入的成本。 显然, 本领域的技术人员应该明白, 上述的本发明的各模块或各步骤可 以用通用的计算装置来实现, 它们可以集中在单个的计算装置上, 或者分布 在多个计算装置所组成的网络上, 可选地, 它们可以用计算装置可执行的程 序代码来实现, 从而, 可以将它们存储在存储装置中由计算装置来执行, 并 且在某些情况下, 可以以不同于此处的顺序执行所示出或描述的步骤, 或者 将它们分别制作成各个集成电路模块, 或者将它们中的多个模块或步骤制作 成单个集成电路模块来实现。 这样, 本发明不限制于任何特定的硬件和软件 结合。 以上所述仅为本发明的优选实施例而已, 并不用于限制本发明, 对于本 领域的技术人员来说, 本发明可以有各种更改和变化。 凡在本发明的 ^"神和 原则之内, 所作的任何修改、 等同替换、 改进等, 均应包含在本发明的保护 范围之内。

Claims

1. A disaster recovery service system, characterized by comprising: a fault detection module, configured to detect whether a service server running a service and/or the service is in a fault state;
a server management module, configured to determine, when the detection result is yes, a disaster recovery server for replacing the service server;
a service loading and unloading module, configured to install the service on the disaster recovery server; and a service running module, configured to run the service on the disaster recovery server.
2. The disaster recovery service system according to claim 1, characterized in that the server management module comprises:
an acquisition sub-module, configured to acquire an idle disaster recovery server; and
a determination sub-module, configured to determine, according to the performance of the idle disaster recovery server, the disaster recovery server for replacing the service server.
3. The disaster recovery service system according to claim 1, characterized by further comprising:
a storage module, configured to store status information of the service server and service information of the service;
wherein the status information comprises at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, and status information indicating that the service server is idle; and the service information comprises at least one of the following: a service node of the service, a module number of the service, a service type of the service, a version of the service, and a directory of the service.
4. The disaster recovery service system according to claim 3, characterized by further comprising:
a status display module, configured to display the status information and the service information; and a status alarm module, configured to generate an alarm when the detection result is yes.
5. The disaster recovery service system according to claim 4, characterized in that the service loading and unloading module is further configured to uninstall the service on the service server that is in the fault state.
6. A disaster recovery method, characterized by comprising:
detecting whether a service server in a disaster recovery service system and/or a service running on the service server is in a fault state;
if the detection result is yes, determining a disaster recovery server for replacing the service server; installing the service on the disaster recovery server;
and running the service on the disaster recovery server.
7. The method according to claim 6, characterized in that determining the disaster recovery server for replacing the service server comprises:
acquiring an idle disaster recovery server in the disaster recovery service system;
and determining, according to the performance of the idle disaster recovery server, the disaster recovery server for replacing the service server.
8. The method according to claim 7, characterized in that before detecting whether the service server in the disaster recovery service system and/or the service running on the service server is in the fault state, the method further comprises:
storing status information of the service server and service information of the service; wherein the status information comprises at least one of the following: status information indicating that the service server is running normally, status information indicating that the service server is faulty, and status information indicating that the service server is idle; and the service information comprises at least one of the following: a service node of the service, a module number of the service, a service type of the service, a version of the service, and a directory of the service.
9. The method according to claim 8, characterized in that before detecting whether the service server in the disaster recovery service system and/or the service running on the service server is in the fault state, the method further comprises: displaying the status information and the service information.
10. The method according to claim 8, characterized in that after detecting whether the service server in the disaster recovery service system and/or the service running on the service server is in the fault state, the method further comprises: generating an alarm when the detection result is yes.
PCT/CN2010/076969 2010-07-26 2010-09-15 容灾业务系统及容灾方法 WO2012012962A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP10855190.4A EP2600565B1 (en) 2010-07-26 2010-09-15 Disaster tolerance service system and disaster tolerance method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010245165.7A CN101902361B (zh) 2010-07-26 2010-07-26 容灾业务系统及容灾方法
CN201010245165.7 2010-07-26

Publications (1)

Publication Number Publication Date
WO2012012962A1 true WO2012012962A1 (zh) 2012-02-02

Family

ID=43227580

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/076969 WO2012012962A1 (zh) 2010-07-26 2010-09-15 容灾业务系统及容灾方法

Country Status (3)

Country Link
EP (1) EP2600565B1 (zh)
CN (1) CN101902361B (zh)
WO (1) WO2012012962A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2882136A4 (en) * 2012-11-08 2015-08-26 Zte Corp METHOD AND SYSTEM FOR IMPLEMENTING AN EMERGENCY RECOVERY CIRCUIT OF A SERVICE PLATFORM

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932196B (zh) * 2011-08-11 2015-10-07 中国移动通信集团浙江有限公司 一种主机系统状态的检测方法和装置
CN102291262B (zh) * 2011-09-01 2018-03-23 中兴通讯股份有限公司 一种容灾的方法、装置及系统
CN103580883B (zh) * 2012-07-19 2018-09-11 南京中兴软件有限责任公司 一种业务容灾方法及系统
CN104954157B (zh) * 2014-03-27 2018-12-04 中国移动通信集团湖北有限公司 一种故障自愈方法及系统
CN104734886A (zh) * 2015-03-10 2015-06-24 青岛海尔智能家电科技有限公司 一种业务服务器的管理方法、装置及系统
CN107770398A (zh) * 2016-08-22 2018-03-06 中兴通讯股份有限公司 呼叫中心的容灾方法及系统
CN106502823A (zh) * 2016-09-29 2017-03-15 北京许继电气有限公司 数据云备份方法和系统
CN106776140A (zh) * 2016-12-21 2017-05-31 博飞信息科技(上海)有限公司 超容灾备恢复一体机的装置及方法
US20230393957A1 (en) * 2020-11-05 2023-12-07 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Apparatuses for Providing a Back-Up Service
CN116382967B (zh) * 2023-06-02 2023-09-12 北京国电通网络技术有限公司 用于服务器设备固件故障的自动处理方法、电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1859219A (zh) * 2006-04-18 2006-11-08 华为技术有限公司 基于设备容灾的业务接管方法、业务转接设备及备份机
CN101547084A (zh) * 2008-03-24 2009-09-30 大唐移动通信设备有限公司 一种多媒体广播业务传输系统及方法
CN101621413A (zh) * 2009-08-20 2010-01-06 中兴通讯股份有限公司 实现对web服务器进行负载均衡和容灾的装置及方法
CN101729279A (zh) * 2008-10-28 2010-06-09 中兴通讯股份有限公司 一种企业移动信息系统容灾的方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5011073B2 (ja) * 2007-11-22 2012-08-29 株式会社日立製作所 サーバ切り替え方法、およびサーバシステム
CN101719179A (zh) * 2009-11-18 2010-06-02 司光亚 一种大规模虚拟个体基础属性逆向生成方法
CN101902357B (zh) * 2010-06-29 2014-07-16 中兴通讯股份有限公司 对业务服务器进行调度的方法和系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1859219A (zh) * 2006-04-18 2006-11-08 华为技术有限公司 基于设备容灾的业务接管方法、业务转接设备及备份机
CN101547084A (zh) * 2008-03-24 2009-09-30 大唐移动通信设备有限公司 一种多媒体广播业务传输系统及方法
CN101729279A (zh) * 2008-10-28 2010-06-09 中兴通讯股份有限公司 一种企业移动信息系统容灾的方法
CN101621413A (zh) * 2009-08-20 2010-01-06 中兴通讯股份有限公司 实现对web服务器进行负载均衡和容灾的装置及方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2882136A4 (en) * 2012-11-08 2015-08-26 Zte Corp METHOD AND SYSTEM FOR IMPLEMENTING AN EMERGENCY RECOVERY CIRCUIT OF A SERVICE PLATFORM
US9684574B2 (en) 2012-11-08 2017-06-20 Zte Corporation Method and system for implementing remote disaster recovery switching of service delivery platform

Also Published As

Publication number Publication date
EP2600565A1 (en) 2013-06-05
CN101902361A (zh) 2010-12-01
CN101902361B (zh) 2014-09-10
EP2600565A4 (en) 2014-06-11
EP2600565B1 (en) 2016-01-06

Similar Documents

Publication Publication Date Title
WO2012012962A1 (zh) 容灾业务系统及容灾方法
US7237243B2 (en) Multiple device management method and system
US11307943B2 (en) Disaster recovery deployment method, apparatus, and system
US7469279B1 (en) Automatic re-provisioning of network elements to adapt to failures
CN104679530B (zh) 服务器系统与固件更新方法
JP4647234B2 (ja) ネットワーク装置をディスカバリするための方法および装置
US6671699B1 (en) Shared database usage in network devices
US8719386B2 (en) System and method for providing configuration synchronicity
CN102394914A (zh) 集群脑裂处理方法和装置
TWI701916B (zh) 用於在分布式系統中使管理能力自恢復的方法和裝置
CN108712501A (zh) 信息的发送方法、装置、计算设备以及存储介质
CN106657167B (zh) 管理服务器、服务器集群、以及管理方法
JP5617304B2 (ja) スイッチング装置、情報処理装置および障害通知制御プログラム
CN110932914B (zh) 部署方法、部署装置、混合云系统架构及计算机存储介质
CN110391940A (zh) 服务地址的响应方法、装置、系统、设备和存储介质
CN101227333B (zh) 一种容灾网管系统及其网管客户端的登陆方法
US7774589B2 (en) System, method and program for selectivity rebooting computers and other components of a distributed computer system
CN109842526B (zh) 一种容灾方法和装置
CN114124803B (zh) 设备管理方法、装置、电子设备及存储介质
CN111240700A (zh) 一种跨网段服务器os部署系统及方法
CN106302626A (zh) 一种弹性扩容方法、装置及系统
CN114168261A (zh) 一种基于OpenStack管理裸金属实例的高可用方法及装置
Cisco 2.1.00 Version Software Release Notes for Cisco WAN MGX 8850 Software
Cisco Release Notes for Cisco Info Center Release 1.2
CN114301763A (zh) 分布式集群故障的处理方法及系统、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10855190

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010855190

Country of ref document: EP