WO2014110994A1 - Integrated platform for disaster recovery of it system - Google Patents

Integrated platform for disaster recovery of IT system

Info

Publication number
WO2014110994A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
data
service
integrated platform
disaster recovery
Prior art date
Application number
PCT/CN2014/070331
Other languages
French (fr)
Chinese (zh)
Inventor
戚跃民
郝建明
伍福生
简超
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2014110994A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements

Definitions

  • The present invention relates to disaster tolerance technologies in network communications and, in particular, to an integrated platform capable of recovering data and maintaining data consistency in the event of a disaster in an IT system.
  • Disaster tolerance is generally divided into three levels: the data level, the system level, and the business level.
  • Data-level disaster recovery focuses on data: it ensures that the user's original data is not lost or destroyed after a disaster occurs.
  • Data-level disaster recovery differs from ordinary backup in that it requires the backup copy of the data to be stored off-site.
  • System-level disaster recovery builds on data-level disaster recovery by also replicating the application processing capability (the service server area), that is, by building a matching support system at the backup site. System-level disaster recovery can provide uninterrupted application services, so that user service requests continue to run transparently without being affected by the disaster.
  • The present invention is directed to an integrated platform for IT system disaster recovery that can effectively control the service hosts in an off-site distributed system and can maintain data consistency and business continuity.
  • The integrated platform for disaster recovery of the IT system of the present invention can effectively solve the problem of centralizing the hosts of an off-site distributed IT system, implement effective control of the service hosts, and ensure business continuity through flexible business process functions.
  • The integrated platform for disaster recovery of IT systems of the present invention realizes real-time communication with the various service hosts and unifies data recovery and business switching into business processes.
  • The integrated platform for disaster recovery of an IT system of the present invention is used for centralized management of an off-site distributed IT system that has multiple local service hosts, multiple off-site service hosts, and two management centers that manage the local service hosts and the off-site service hosts. The integrated platform includes:
  • a system management module configured to monitor and manage the remote distributed IT system in real time, so that the management center can obtain information of each service host in real time;
  • a system communication module deployed on each of the local service hosts, the off-site service hosts, and the management centers, and used to implement communication between the local service hosts and the management centers, communication between the off-site service hosts and the management centers, and communication between the management centers;
  • a data synchronization module configured to implement real-time data synchronization in the remote distributed IT system
  • a data comparison analysis module configured to implement data consistency verification in the remote distributed IT system
  • a data storage module configured to implement data storage in the remote distributed IT system
  • a business process module configured to implement various types of business processes in the remote distributed IT system
  • a service recovery module configured to realize the takeover of business processes at the off-site location in the event of a disaster in the off-site distributed IT system
  • the security audit module is configured to encrypt and decrypt messages received and sent between each service host and each management center.
  • The system management module, the data synchronization module, the data comparison analysis module, the data storage module, the business process module, the service recovery module, and the security audit module are all associated with the system communication module; the data synchronization module, the data comparison analysis module, the data storage module, and the business process module are all associated with the service recovery module.
  • the system communication module is configured to implement message sending, message parsing, command execution, and result feedback between each service host and each management center.
  • each management center and each unit module in each service host are connected to a Tuxedo/Q service through a WTC interface.
  • The security audit module encrypts the messages sent and received between the management center and each unit module in the service hosts when the messages are inserted through the WSL.
  • The security audit module configures RSA encryption by using the -Z parameter when messages are inserted through the WSL.
  • The data comparison analysis module is capable of locating and analyzing data according to the differences found and of performing data recovery accordingly.
  • The service recovery module includes: a first sub-module for starting the off-site business system application and database; a second sub-module for acquiring the time of the disaster switchover; and a third sub-module for performing network switching.
  • The data synchronization module is configured to deploy mirror storage in the local service system to implement synchronous data replication between local service systems.
  • the data synchronization module is configured to implement asynchronous replication of data between local mirror storage and offsite storage.
  • the data synchronization module is further configured to replenish the off-site data into the local database after the local system recovers the service.
  • the two management centers are functionally identical and are backups of each other.
  • the management center uses an LDAP server to perform identity authentication.
  • the technical problems mainly solved by the present invention are as follows: (1) How to centrally manage and control distributed systems in different places; (2) How to realize fast service switching of distributed systems in different places, and ensure the continuity of services when disasters occur; (3) How to realize the automatic processing of the business; (4) How to monitor the status of the controlled terminal; (5) How to compare the consistency of the business database between the two places.
  • The technical means adopted is: controlling all service hosts through a messaging mechanism between the management center and each controlled terminal (using the Tuxedo /Q reliable message queue).
  • The technical means adopted is: establishing a corresponding off-site switchover and switchback process for each business system and combining these processes with the data replenishment module. The former ensures business continuity in the event of a disaster, while the latter guarantees data integrity in the event of a disaster.
  • The technical means adopted is: since the management center can manage and control all service hosts, the daily operations of the business system can be automated, that is, implemented by having the management center send job instructions according to a fixed process; in addition, for specific business needs, business personnel can define an arbitrary process of their own, which provides a flexible way to meet those needs.
  • The technical means adopted is: deploying the corresponding Tuxedo WSL service on each of the two management centers and setting the WSNADDR environment variable on each controlled end. The value of the environment variable is the address of the WSL service published by the Tuxedo server (IP address:port number), which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the client reconnects after 30 seconds.
  • the controlled end periodically sends a heartbeat message to the management center, and the management center determines whether the state of the controlled end is normal.
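  • The heartbeat check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `HeartbeatMonitor`, the timeout threshold, and the injectable clock are all assumptions, since the source only states that the controlled end sends heartbeats periodically and that the management center judges its state.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Hypothetical management-center helper that tracks heartbeats per host."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s          # assumed threshold, not from the source
        self._clock = clock                 # injectable so the logic can be tested
        self._last_seen: Dict[str, float] = {}

    def record(self, host_id: str) -> None:
        """Call whenever a heartbeat message arrives from host_id."""
        self._last_seen[host_id] = self._clock()

    def is_normal(self, host_id: str) -> bool:
        """A host counts as normal if it reported within the timeout window."""
        last = self._last_seen.get(host_id)
        return last is not None and (self._clock() - last) <= self.timeout_s
```

  • A host that has never reported is treated as not normal, which is a design choice of this sketch rather than something the source specifies.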
  • The technical means adopted is: the data comparison module can compare any single table or set of tables (multiple tables) in the two business databases, and it supports several comparison methods: (1) comparing the number of records in the table; (2) comparing selected fields in the table; (3) comparing MD5 digests computed over the table. These comparison methods accurately determine whether the local and off-site business databases are consistent and, if they are not, tell the user where the differences are.
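  • The three comparison methods (record count, selected fields, MD5) can be illustrated with a small sketch. It assumes SQLite connections standing in for the two business databases; the function names, the key column, and the choice to hash each row's `repr` in key order are hypothetical details that the source does not specify.

```python
import hashlib
import sqlite3
from typing import List, Sequence


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Method 1: number of records in the table."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def differing_keys(local: sqlite3.Connection, remote: sqlite3.Connection,
                   table: str, columns: Sequence[str], key: str) -> List:
    """Method 2: keys whose selected fields differ or exist on one side only."""
    def load(conn: sqlite3.Connection) -> dict:
        cols = ", ".join(f'"{c}"' for c in (key, *columns))
        return {r[0]: r[1:] for r in conn.execute(f'SELECT {cols} FROM "{table}"')}

    a, b = load(local), load(remote)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def table_md5(conn: sqlite3.Connection, table: str,
              columns: Sequence[str], key: str) -> str:
    """Method 3: one MD5 digest over all rows, read in key order."""
    cols = ", ".join(f'"{c}"' for c in columns)
    digest = hashlib.md5()
    for row in conn.execute(f'SELECT {cols} FROM "{table}" ORDER BY "{key}"'):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()
```

  • A cheap check such as `row_count` or `table_md5` can decide whether two tables are consistent, and `differing_keys` can then report where the difference is.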
  • The integrated platform for disaster recovery of an IT system of the present invention can realize real-time communication between the various service hosts and can integrate data recovery and service switching into a business process. Therefore, the present invention can provide an integrated platform for disaster recovery of IT systems that effectively combines hardware recovery, data recovery, and business recovery when a disaster occurs in an off-site distributed IT system.
  • FIG. 1 is a schematic structural diagram showing centralized management of an off-site distributed IT system for an integrated platform for IT system disaster recovery according to the present invention.
  • FIG. 2 is a schematic diagram showing the construction of an integrated platform for disaster recovery of an IT system of the present invention.
  • FIG. 3 is a schematic diagram showing data storage and data synchronization processing for an integrated platform for IT system disaster recovery according to the present invention.
  • FIG. 4 is a flow chart showing the processing of a unit module, that is, a controlled end, under the integrated platform management for IT system disaster recovery according to the present invention.
  • The remote distributed IT system has multiple local service hosts (service host 1, service host 2, service host 3, ...), multiple off-site service hosts (service host 4, service host 5, service host 6, ...), and two management centers that manage the local service hosts and the off-site service hosts.
  • The local service hosts, the off-site service hosts, and the two management centers are connected through communication lines. The two management centers have the same management functions and are backups of each other.
  • The off-site distributed IT system includes all of the above service hosts and management centers.
  • The "controlled end" mentioned in the present invention is a module deployed on all service hosts.
  • All service hosts can accept instructions from the management center to perform the related operations (thus, a host on which the unit module is deployed is also referred to herein simply as a "controlled end").
  • FIG. 2 is a schematic diagram showing the construction of an integrated platform for disaster recovery of an IT system of the present invention.
  • The integrated platform for disaster recovery of an IT system of the present invention includes: a system management module 100 for monitoring and managing the remote distributed IT system in real time so that the management center can acquire information about each service host in real time;
  • a system communication module 200 deployed in each of the local service hosts, the off-site service hosts, and the management centers, and used to implement communication between the local service hosts and the management centers, communication between the off-site service hosts and the management centers, and communication between the management centers;
  • a data synchronization module 300 for realizing real-time synchronization of data in the remote distributed IT system;
  • a data comparison analysis module 400 for realizing consistency verification of data in the remote distributed IT system;
  • a data storage module 500 for implementing data storage in the remote distributed IT system; a business process module 600 for implementing various types of business processes in the remote distributed IT system;
  • a service recovery module 700 for realizing the takeover of off-site business processes in the event of a disaster in the off-site distributed system; and a security audit module 800 for encrypting and decrypting messages sent and received between each service host and each management center.
  • the system management module 100, the data synchronization module 300, the data comparison analysis module 400, the data storage module 500, the business process module 600, the service recovery module 700, and the security audit module 800 are all associated with the system communication module 200.
  • The data synchronization module 300, the data comparison analysis module 400, the data storage module 500, and the business process module 600 are all associated with the service recovery module 700.
  • The system management module 100 monitors and manages the remote distributed IT system in real time, so that the management center can obtain service host information in real time through the reliable message communication mechanism between each associated unit module and the management center.
  • the system communication module 200 is deployed in each unit module and management center in the remote distributed IT system, and is used for real-time messaging, message parsing, command execution, and result feedback, and is the basis for realizing disaster recovery of the IT system.
  • the Tuxedo /Q service is connected between the management center and each unit module through the WTC interface.
  • WTC is the connection tool between BEA's WEB support product Weblogic and middleware product Tuxedo, the full name of Weblogic Tuxedo Connector.
  • WTC provides two-way access between Weblogic and Tuxedo.
  • Tuxedo is also a middleware product from BEA.
  • The Tuxedo /Q component provides reliable message queuing: messages can be queued and stored on persistent media, such as disk, or on non-persistent media, such as memory, for later use.
  • Tuxedo's /Q reliable message queue is deployed on each unit module (that is, each controlled terminal), which receives the command message sent by the management center, executes it, and stores the result message in the response queue.
  • the communication between Weblogic in the management center and Tuxedo between each unit module uses the WTC interface.
  • the management center sends the relevant command message and receives the executed return result message.
  • Each unit module receives the command message of the management center and sends a return result message of the execution.
  • Tuxedo /Q can provide reliable message service and ensure the integrity of the message delivery.
  • Such a mechanism provides a more flexible and reliable asynchronous execution method than tpacall(), which meets the needs of the remote distributed system. Therefore, in the present invention, adopting Tuxedo /Q between the management center and each controlled terminal enables centralized management and control of the off-site distributed system.
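  • The receive, execute, and store-result pattern on the controlled end can be sketched with in-process queues. This is only an illustration of the message flow: Python's `queue.Queue` is neither persistent nor a Tuxedo /Q API, and the `Command`, `Result`, and `controlled_end_step` names are hypothetical. The return-code convention (0 for success, non-zero for failure) follows the function return values described in this document.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Command:
    msg_id: str      # hypothetical unique message identifier
    func_id: str     # function the controlled end should execute
    params: dict


@dataclass
class Result:
    msg_id: str
    return_code: int  # 0 = success, non-zero = failure
    output: str


def controlled_end_step(request_q: "queue.Queue[Command]",
                        reply_q: "queue.Queue[Result]",
                        handlers: Dict[str, Callable[[dict], str]]) -> None:
    """Take one command from the request queue, execute it, enqueue the result."""
    cmd = request_q.get_nowait()
    handler = handlers.get(cmd.func_id)
    if handler is None:
        reply_q.put(Result(cmd.msg_id, 1, f"unknown function {cmd.func_id}"))
        return
    try:
        reply_q.put(Result(cmd.msg_id, 0, handler(cmd.params)))
    except Exception as exc:  # report the failure instead of crashing the loop
        reply_q.put(Result(cmd.msg_id, 1, str(exc)))
```

  • Because the sender only enqueues a command and later reads the response queue, it never blocks on the execution itself, which is the asynchronous property the text attributes to the queue-based design.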
  • The data synchronization module 300, the data comparison analysis module 400, and the data storage module 500 jointly guarantee data consistency in the remote distributed system. The data synchronization module 300 and the data storage module 500 are used to realize real-time synchronization of business system data within the off-site distributed system, and the data comparison analysis module 400 is used to verify the consistency of the business system data distributed in different places, to locate and analyze data according to the differences, and to perform the related data recovery.
  • the data synchronization module 300, the data comparison analysis module 400, and the data storage module 500 are the output of the service recovery module 600.
  • The business process module 600 is used to implement the various business processes of the off-site distributed system. Process information is defined in the database as ordered functional steps built from basic elements such as processes, steps, functions, and combined functions, and it can be customized through scripts.
  • The management center program reads the process information, interprets and executes it, and thereby completes the business functions of the process and implements the daily business processes fixed in the system. These business processes are collectively referred to as fixed processes.
  • Selective execution is a supplement to the fixed business process and is a very flexible way of controlling the business system.
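  • A process defined as ordered functional steps and interpreted by the management center can be sketched as below. The step tuple layout, the `run_process` name, and the rule of stopping at the first non-zero return code are assumptions made for illustration; the source only says that the process information is read from the database and interpreted.

```python
from typing import Callable, Dict, List, Tuple

# (step order, function name, parameters): a hypothetical row layout
Step = Tuple[int, str, dict]


def run_process(steps: List[Step],
                functions: Dict[str, Callable[..., int]]) -> List[Tuple[int, str, int]]:
    """Execute steps in ascending order and return a log of (order, name, rc).

    Execution stops after the first step that returns a non-zero code.
    """
    log = []
    for order, func_name, params in sorted(steps, key=lambda s: s[0]):
        rc = functions[func_name](**params)
        log.append((order, func_name, rc))
        if rc != 0:
            break
    return log
```

  • A fixed process would pass every stored step to `run_process`, while selective execution could pass only the steps an operator picks.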
  • The service recovery module 700 ensures that the remote distributed system can quickly take over business processes at the off-site location in the event of a disaster or similar incident, thereby ensuring the continuity of data and services.
  • The service recovery module 700 includes the following sub-modules: a first sub-module for starting the off-site business system application and database; a second sub-module for obtaining the time of the disaster switchover; and a third sub-module for performing network switching. For example, if the transfer service system of the Shanghai Center fails, processing needs to switch to the Beijing Center immediately.
  • The first sub-module of the service recovery module 700 starts the off-site transfer system application and database and determines whether the switching conditions are met, the second sub-module acquires the time point of the disaster switchover (for subsequent data recovery), and the third sub-module performs network switching and the like.
  • Each step in the process is essentially the control of a particular service host, which automatically carries out the operation instructions issued by the management center. The process is transparent to the user: after it has executed, the actual transaction processing location has changed from Shanghai to Beijing, ensuring the continuity of the business.
  • After the Shanghai Center's business system is restored, a corresponding business switchback process is executed. After the switchback, transactions are again sent to Shanghai for processing, and the data replenishment process copies the transactions processed in Beijing during this period back to the Shanghai Center.
  • The security audit module 800 avoids plaintext transmission of messages between the management center and each unit module by adding encryption settings to the messages: the -Z parameter is used to configure RSA encryption when messages are inserted through the WSL. Messages are sent and received in encrypted form and are automatically decrypted after they are received, thus ensuring the security of data transmission.
  • the management center uses a unified LDAP (Lightweight Directory Access Protocol) server for identity authentication. The operator's permission setting information is also taken from the LDAP server, and the related authorization is checked first when performing various function operations. Only authorized users can perform the functions of each business process.
  • the security audit module 800 also records and audits login information, operational logs, and process execution.
  • each management center communicates with all business hosts in the local and remote locations to achieve system management, service implementation and recovery.
  • FIG. 3 is a schematic diagram showing data storage and data synchronization processing for an integrated platform for IT system disaster recovery according to the present invention.
  • A set of mirror storage is deployed in the local service system of the integrated platform for disaster recovery of the IT system of the present invention, so that data is replicated synchronously and bidirectionally between the local main service systems.
  • Data is replicated asynchronously and in one direction between the local mirror storage and the off-site storage.
  • Such a data synchronization mechanism ensures that when a local business system or data disaster occurs, service recovery can be quickly performed in a different place, and data is not lost. After the local system resumes service, it can replenish the offsite data to the local database.
  • "Data replenishment" can be understood by continuing the example used earlier to describe the service recovery module 700:
  • When the switchover is performed, its time point T0 is recorded. After the switchover, all transactions are actually transferred to the Beijing Center for processing.
  • After the Shanghai Center resumes its business, it executes the switchback process of the corresponding business system. At the same time, the switchback time point T1 is recorded, and the data replenishment process is executed afterwards.
  • The interval from T0 to T1 is the period during which transactions were processed in the Beijing Center.
  • The Beijing Management Center issues instructions to read the data for this period from the Beijing Center's transaction database (that is, the off-site data) and passes this data over the optical network from Beijing to the Shanghai Management Center, which then inserts it into the corresponding business database. In this way, the transaction data is complete for both the business system and the user, just as if no switchover had occurred.
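  • The replenishment of transactions processed between T0 and T1 can be sketched as follows, with SQLite connections standing in for the off-site and local transaction databases. The table and column names, the half-open interval [T0, T1), and the `INSERT OR IGNORE` de-duplication are assumptions of this sketch, not details given in the source.

```python
import sqlite3


def replenish(remote: sqlite3.Connection, local: sqlite3.Connection,
              t0: int, t1: int) -> int:
    """Copy transactions processed off-site in [t0, t1) into the local database.

    Returns the number of rows read from the off-site database.
    """
    rows = remote.execute(
        "SELECT id, amount, processed_at FROM txn "
        "WHERE processed_at >= ? AND processed_at < ? ORDER BY processed_at",
        (t0, t1),
    ).fetchall()
    with local:  # commits on success, rolls back if an insert fails
        local.executemany(
            "INSERT OR IGNORE INTO txn (id, amount, processed_at) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

  • Running the inserts inside one transaction keeps the local database from ending up with only part of the missing interval if the copy is interrupted.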
  • FIG. 4 is a flow chart showing how a unit module, that is, a controlled end, processes message transmission and reception under the management of the integrated platform for IT system disaster recovery of the present invention (that is, the specific process of the system communication module 200).
  • The process initializes, allocates space, and generates a linked list with a head node.
  • The servers of the two management centers each deploy the corresponding Tuxedo WSL service, and the WSNADDR environment variable is set on each client server.
  • The value of the environment variable is the address of the WSL service published by the Tuxedo server (IP address:port number), which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the client reconnects after 30 seconds.
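  • The connect-and-retry behavior can be sketched as below. The 30-second delay and the WSNADDR variable name come from the text; everything else is an assumption: `connect` is a hypothetical callable standing in for the real Tuxedo client connection call, and failures are assumed to surface as `ConnectionError`.

```python
import os
import time
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")
RETRY_DELAY_S = 30  # the text specifies reconnecting after 30 seconds


def wsl_address(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read the WSL service address from the WSNADDR environment variable."""
    return env.get("WSNADDR")


def connect_with_retry(connect: Callable[[str], T], address: str,
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect(address) until it succeeds, waiting 30 s between attempts.

    With max_attempts=None the loop retries indefinitely.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect(address)
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(RETRY_DELAY_S)
```

  • Passing `sleep` as a parameter lets the retry logic be exercised without actually waiting.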
  • The linked list is mainly used to store state information about the execution processes that are currently running. Each node contains a process number, a message function number, a message uniqueness flag, a parameter value set, the time when the process started executing, and a flag indicating whether the node is available (0 is available, 1 is not available).
  • the main process will clear the node information in the linked list corresponding to the execution process, and set the availability flag to 0 for later use.
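  • The node bookkeeping can be sketched as follows. The node fields and the flag values (0 available, 1 not available) follow the text; the use of a Python list in place of a linked list, the fixed table size, and the `ProcessTable` method names are assumptions of this sketch.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

AVAILABLE, IN_USE = 0, 1  # flag values as described in the text


@dataclass
class ProcessNode:
    pid: int = 0                      # process number
    func_id: str = ""                 # message function number
    msg_uid: str = ""                 # message uniqueness flag
    params: dict = field(default_factory=dict)
    started_at: float = 0.0           # time when the process started executing
    flag: int = AVAILABLE


class ProcessTable:
    """Holds one node per execution process; a list stands in for the linked list."""

    def __init__(self, size: int) -> None:
        self.nodes: List[ProcessNode] = [ProcessNode() for _ in range(size)]

    def acquire(self, pid: int, func_id: str, msg_uid: str, params: dict,
                now: Optional[float] = None) -> Optional[ProcessNode]:
        """Fill the first available node, or return None if all are in use."""
        for node in self.nodes:
            if node.flag == AVAILABLE:
                node.pid, node.func_id, node.msg_uid = pid, func_id, msg_uid
                node.params = dict(params)
                node.started_at = time.time() if now is None else now
                node.flag = IN_USE
                return node
        return None

    def release(self, pid: int) -> bool:
        """Clear the node of a finished process and mark it available again."""
        for node in self.nodes:
            if node.flag == IN_USE and node.pid == pid:
                node.pid, node.func_id, node.msg_uid = 0, "", ""
                node.params, node.started_at = {}, 0.0
                node.flag = AVAILABLE
                return True
        return False
```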
  • The validity of the message is determined mainly by checking the values of the application system's verification identifier (system), function number (func_id), IP address (ip), time (time), message type (type), and so on in the command message.
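  • A validity check over these message fields might look like the sketch below. The field names are taken from the text; the concrete rules (known system identifiers, known message types, a parseable IP address, and a `%Y%m%d%H%M%S` timestamp) are assumptions chosen for illustration, since the source does not state how each value is evaluated.

```python
import ipaddress
from datetime import datetime
from typing import Collection, List, Mapping

REQUIRED_FIELDS = ("system", "func_id", "ip", "time", "type")


def validate_command(msg: Mapping[str, str],
                     known_systems: Collection[str],
                     known_types: Collection[str],
                     time_format: str = "%Y%m%d%H%M%S") -> List[str]:
    """Return a list of problems; an empty list means the message is valid."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not msg.get(name)]
    if problems:
        return problems
    if msg["system"] not in known_systems:
        problems.append("unknown system identifier")
    if msg["type"] not in known_types:
        problems.append("unknown message type")
    try:
        ipaddress.ip_address(msg["ip"])
    except ValueError:
        problems.append("malformed IP address")
    try:
        datetime.strptime(msg["time"], time_format)
    except ValueError:
        problems.append("malformed time")
    return problems
```

  • Returning all problems at once, rather than failing on the first, makes it easier to log why a command message was rejected.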
  • When the function processing script performs the function operation, it decides according to the situation whether to carry out the processing and how to handle it, avoids unnecessary operations that would otherwise be performed in error, and returns the corresponding value.
  • a return value of 0 indicates that the function operation was processed successfully, and a non-zero indicates a failure.
  • the main process sends an interrupt signal to the corresponding execution process, and after the execution process receives the interrupt signal, the loop is stopped, and the subsequent operations are not performed.
  • the left part is the execution flow of the main process.
  • The main process is a loop that mainly sends heartbeat information, receives messages and judges their validity, generates a corresponding execution process according to the content of each message, and manages the execution processes that are running.
  • The industry currently recognizes three goals worth working toward. The first is the recovery time, that is, how long the enterprise can tolerate being without IT, in a state of suspension; the second is how long the network takes to recover; the third is recovery at the business level. Two metrics are most critical throughout the recovery process: RTO and RPO.
  • RTO: Recovery Time Objective
  • RPO: Recovery Point Objective
  • This update can be either the last week's backup data or the real-time data from the previous transaction. It can be seen that the integrated platform management for IT system disaster recovery of the present invention can provide continuous business services in the shortest time in the event of a disaster.
  • The controlled end (i.e., each unit module) can automatically maintain a reliable connection with the server side and maintain sufficient operational robustness.
  • Effective monitoring is provided for the running state of all deployed controlled ends (i.e., each unit module) and for the operating states of the various business processes; at the same time, management and maintenance methods are provided for the configured parameters.
  • The integrated platform for the disaster recovery of the IT system of the present invention is designed to implement various business processes and to complete process control in daily, planned, and disaster situations.
  • the implementation of fixed business processes and any functional processes is a core feature provided by disaster recovery applications.
  • process information is based on basic elements such as processes, steps, functions, and combined functions. It is defined in the form of ordered functional steps in the database, and can be modified by script customization.
  • the management center program reads out the process information and interprets the execution, and completes the execution of the process business functions, which are collectively referred to as fixed processes.
  • For some temporary business system requirements, such as equipment replacement, line maintenance, and fault handling, a series of necessary business functions needs to be performed on an ad hoc basis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an integrated platform for disaster recovery of an IT system, which is used to integrally manage a remote distributed IT system. The integrated platform comprises a system managing module (100), a system communicating module (200), a data synchronizing module (300), a data comparing and analyzing module (400), a data storing module (500), a service flow module (600), a service recovering module (700), and a security auditing module (800). The integrated platform for disaster recovery of the IT system according to the present invention can implement real-time communication between service hosts and can integrate data recovery and service switching into a service flow; therefore, the present invention can provide an integrated platform for disaster recovery of the IT system that effectively combines hardware recovery, data recovery, and service recovery when a disaster occurs on the remote distributed IT system.

Description

An integrated platform for IT system disaster recovery
Technical field
[0001] The present invention relates to disaster tolerance technologies in network communications and, in particular, to an integrated platform capable of recovering data and maintaining data consistency in the event of a disaster in an IT system.
Background art
[0002] With the arrival of the information age, data has increasingly become the core of society's normal operation. For an enterprise, data is critical to its survival and development. Users and enterprises in every industry depend ever more heavily on network applications and data, and sudden disasters such as fires, floods, earthquakes, or terrorist incidents can severely affect an enterprise's data and business operations. How to ensure that enterprise data is not lost when a disaster occurs, and that system services resume as quickly as possible, has therefore become a major concern, and disaster recovery technology is receiving growing attention across industries.
[0003] By degree of protection, disaster recovery is generally divided into three levels: the data level, the system level, and the business level.
[0004] Data-level disaster recovery focuses on data: it ensures that the user's existing data is not lost or damaged after a disaster. It differs from ordinary backup in that it requires the backup copy of the data to be stored off-site.
[0005] System-level disaster recovery builds on data-level disaster recovery by additionally replicating the application processing capability (the service server area); that is, an equivalent supporting system is built at the backup site. System-level disaster recovery can provide uninterrupted application service, so that service requests from user applications continue to run transparently, unaffected by the disaster.
[0006] Data-level and system-level disaster recovery both fall within the scope of IT. For normal business operations, however, protecting the IT systems alone is not enough, and some users need to build the highest level, business-level disaster recovery.
[0007] In the prior art, when a disaster strikes a large off-site distributed system at its local site, the focus is on hardware (system) level recovery, data-level recovery, or business-level recovery individually; an integrated platform that effectively combines the three is lacking.
Summary of the invention
[0009] In view of the above problems, the present invention aims to provide an integrated platform for IT system disaster recovery that can effectively control the service hosts in an off-site distributed system and can maintain both data consistency and business continuity.
[0010] The integrated platform for IT system disaster recovery of the present invention can effectively solve the problem of centrally managing the hosts in an off-site distributed IT system, exercise effective control over the service hosts, and safeguard business continuity through flexible business process functions. The platform achieves real-time communication with each service host and unifies data recovery and service switchover within business processes.
[0011] The integrated platform for IT system disaster recovery of the present invention is used for centralized management of an off-site distributed IT system that has multiple local service hosts, multiple off-site service hosts, and two management centers that manage the local and off-site service hosts. The integrated platform includes:
a system management module, configured to monitor and manage the off-site distributed IT system in real time, so that the management centers can obtain information about each service host in real time;
a system communication module, deployed on each local service host, each off-site service host, and the management centers, and configured to implement communication between each local service host and the management centers, between each off-site service host and the management centers, and between the management centers;
a data synchronization module, configured to implement real-time data synchronization in the off-site distributed IT system;
a data comparison and analysis module, configured to verify data consistency in the off-site distributed IT system;
a data storage module, configured to implement data storage in the off-site distributed IT system;
a business process module, configured to implement the various business processes in the off-site distributed IT system;
a service recovery module, configured to take over business processes at the off-site location when a disaster occurs in the off-site distributed IT system;
a security audit module, configured to encrypt and decrypt the messages received and sent between the service hosts and the management centers.
[0012] Preferably, the system management module, the data synchronization module, the data comparison and analysis module, the data storage module, the business process module, the service recovery module, and the security audit module are all associated with the system communication module, and the data synchronization module, the data comparison and analysis module, the data storage module, and the business process module are all associated with the service recovery module.
[0013] Preferably, the system communication module is configured to implement message sending and receiving, message parsing, command execution, and result feedback between the service hosts and the management centers.
[0014] Preferably, the management centers and the unit modules in the service hosts connect to the Tuxedo /Q service through the WTC interface.
[0015] Preferably, the security audit module encrypts the messages sent and received between the management center and the unit modules in the service hosts by inserting the messages through WSL.
[0016] Preferably, the security audit module uses the -z parameter to configure RSA encryption when inserting messages through WSL.
[0017] Preferably, the data comparison and analysis module can locate and analyze data according to the differences found and perform data catch-up accordingly.
[0018] Preferably, the service recovery module includes: a first submodule for starting the off-site transfer system application and database; a second submodule for obtaining the time of the disaster switchover; and a third submodule for performing the network switchover.
[0019] Preferably, the data synchronization module is configured to deploy mirrored storage in the local business system to implement synchronous data replication between the local business systems.
[0020] Preferably, the data synchronization module is configured to implement asynchronous data replication between the local mirrored storage and the off-site storage.
[0021] Preferably, the data synchronization module is further configured to backfill the off-site data into the local database after the local system resumes service.
[0022] Preferably, the two management centers are functionally identical and back each other up.
[0023] Preferably, the management centers use an LDAP server for identity authentication.
[0024] The technical problems mainly solved by the present invention are: (1) how to centrally manage and control an off-site distributed system; (2) how to achieve fast service switchover in an off-site distributed system, ensuring business continuity when a disaster occurs; (3) how to process business automatically; (4) how to monitor the state of the controlled ends; (5) how to compare the consistency of the business databases at the two sites.
[0025] For technical problem (1), the technical means adopted is as follows: control of all service hosts is achieved through a messaging mechanism between the management centers and each controlled end (using Tuxedo's /Q reliable message queue).
[0026] For technical problem (2), the technical means adopted is as follows: a corresponding off-site switchover and switchback process is established for each business system in operation and is combined with the data catch-up module. The former ensures business continuity when a disaster occurs; the latter ensures data integrity when a disaster occurs.
[0027] For technical problem (3), the technical means adopted is as follows: because the management center can manage and control all service hosts, the daily operations of the business systems can be automated by having the management center send the job instructions of a fixed process. In addition, for particular business needs, business staff can define an arbitrary process themselves, which makes the approach flexible.
[0028] For technical problem (4), the technical means adopted is as follows: the corresponding Tuxedo WSL service is deployed on each of the two management centers, and the WSNADDR environment variable is set on each controlled end. Its value is the address (IP address:port) of the WSL service published by the Tuxedo server, which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the client reconnects after an interval of 30 seconds. In addition, the controlled end periodically sends heartbeat messages to the management center, and the management center uses them to judge whether the controlled end is in a normal state.
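By way of illustration only, the reconnect-and-heartbeat behavior described above might be sketched as follows. The sketch does not use the Tuxedo API; `connect_with_retry` and `HeartbeatMonitor` are hypothetical names, the `connect` callable stands in for the workstation client connection, and the 90-second heartbeat timeout is an assumed value, since only the 30-second reconnect interval is stated.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


def connect_with_retry(connect, interval_s=30.0, sleep=time.sleep, max_attempts=None):
    """Call connect() until it succeeds, waiting interval_s between failures.

    The 30-second default mirrors the interval stated in the text;
    max_attempts is an illustrative addition (None means retry forever).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect()
        except ConnectionError:
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(interval_s)


@dataclass
class HeartbeatMonitor:
    """Management-center view of the last heartbeat from each controlled end."""

    timeout_s: float = 90.0  # assumed threshold, not given in the text
    last_seen: dict = field(default_factory=dict)

    def record(self, host: str, now: Optional[float] = None) -> None:
        """Store the arrival time of a heartbeat from host."""
        self.last_seen[host] = time.time() if now is None else now

    def status(self, host: str, now: Optional[float] = None) -> str:
        """Return 'normal', 'abnormal', or 'unknown' for host."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(host)
        if seen is None:
            return "unknown"
        return "normal" if now - seen <= self.timeout_s else "abnormal"
```

The `sleep` and `now` parameters exist only so the behavior can be exercised without real waiting.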
[0029] For technical problem (5), the technical means adopted is as follows: using the data comparison module, any single table or set of tables in the business databases at the two sites can be compared in several ways: (1) comparing the number of records in the table; (2) comparing selected fields of the table; (3) comparing the table by means of the MD5 algorithm. These comparisons determine precisely whether the local and off-site business databases are consistent and, if they are not, tell the user where the differences lie.
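By way of illustration only, the three comparison methods might be sketched as follows, with two in-memory SQLite databases standing in for the local and off-site business databases. All function names, the `key` column used for ordering, and the choice to hash the `repr` of each row are assumptions of this sketch and are not taken from the text.

```python
import hashlib
import sqlite3


def row_count(conn, table):
    """Method 1: number of records in the table."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def differing_keys(local, remote, table, key, fields):
    """Method 2: keys whose selected fields differ or exist on one side only."""
    cols = ", ".join([key, *fields])
    sql = f"SELECT {cols} FROM {table}"
    a = {row[0]: row[1:] for row in local.execute(sql)}
    b = {row[0]: row[1:] for row in remote.execute(sql)}
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def table_md5(conn, table, key):
    """Method 3: MD5 digest over all rows, read in key order."""
    digest = hashlib.md5()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def compare_table(local, remote, table, key, fields):
    """Run all three comparisons and report where the two sites differ."""
    return {
        "count_equal": row_count(local, table) == row_count(remote, table),
        "md5_equal": table_md5(local, table, key) == table_md5(remote, table, key),
        "differing_keys": differing_keys(local, remote, table, key, fields),
    }
```

The record count and the digest are inexpensive consistency checks, while the field comparison is what locates the differing records.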
[0030] In summary, the integrated platform for IT system disaster recovery of the present invention can achieve real-time communication with the service hosts and can unify data recovery and service switchover within business processes. The present invention can therefore provide an integrated platform for IT system disaster recovery that effectively combines hardware recovery, data recovery, and business recovery when a disaster strikes an off-site distributed IT system.
Brief description of the drawings
[0032] FIG. 1 is a schematic diagram of the structure of the off-site distributed IT system that is centrally managed by the integrated platform for IT system disaster recovery of the present invention.
[0033] FIG. 2 is a schematic diagram of the structure of the integrated platform for IT system disaster recovery of the present invention.
[0034] FIG. 3 is a schematic diagram of the data storage and data synchronization processing performed by the integrated platform for IT system disaster recovery of the present invention.
[0035] FIG. 4 shows the processing flow of a unit module, i.e. a controlled end, managed by the integrated platform for IT system disaster recovery of the present invention.
Detailed description
[0037] The following describes some of the possible embodiments of the present invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delimit the scope of protection.
[0038] FIG. 1 is a schematic diagram of the structure of the off-site distributed IT system that is centrally managed by the integrated platform for IT system disaster recovery of the present invention. As shown in FIG. 1, the off-site distributed IT system has multiple local service hosts (service host 1, service host 2, service host 3, ... at the local site), multiple off-site service hosts (service host 4, service host 5, service host 6, ... at the off-site location), and two management centers that manage the local and off-site service hosts. The local service hosts, the off-site service hosts, and the two management centers are connected by communication lines. The two management centers have identical management functions and back each other up. The off-site distributed IT system comprises all of the service hosts and management centers above. The "controlled end" referred to in this description is a module deployed on every service host. With it, every service host can accept instructions from a management center and perform the corresponding operations (accordingly, a host on which this unit module is deployed may also be called a "controlled end" for short).
[0039] FIG. 2 is a schematic diagram of the structure of the integrated platform for IT system disaster recovery of the present invention.
[0040] As shown in FIG. 2, the integrated platform for IT system disaster recovery of the present invention includes: a system management module 100 for monitoring and managing the off-site distributed IT system in real time so that the management centers can obtain information about each service host in real time; a system communication module 200, deployed on each local service host, each off-site service host, and the management centers, for implementing communication between each local service host and the management centers, between each off-site service host and the management centers, and between the management centers; a data synchronization module 300 for implementing real-time data synchronization in the off-site distributed IT system; a data comparison and analysis module 400 for verifying data consistency in the off-site distributed IT system; a data storage module 500 for implementing data storage in the off-site distributed IT system; a business process module 600 for implementing the various business processes in the off-site distributed IT system; a service recovery module 700 for taking over business processes at the off-site location when a disaster occurs in the off-site distributed IT system; and a security audit module 800 for encrypting and decrypting the messages received and sent between the service hosts and the management centers.
[0041] The system management module 100, the data synchronization module 300, the data comparison and analysis module 400, the data storage module 500, the business process module 600, the service recovery module 700, and the security audit module 800 are all associated with the system communication module 200. The data synchronization module 300, the data comparison and analysis module 400, the data storage module 500, and the business process module 600 are all associated with the service recovery module 700.
[0042] The system management module 100 can monitor and manage the off-site distributed IT system in real time. Through a reliable message communication mechanism between the associated unit modules in the system and the management centers, it enables the management centers to obtain information about each service host in real time.
[0043] The system communication module 200 is deployed on every unit module and management center in the off-site distributed IT system. It handles real-time message sending and receiving, message parsing, command execution, and result feedback, and is the foundation for IT system disaster recovery. The management centers and the unit modules connect to the Tuxedo /Q service through the WTC interface. WTC, short for WebLogic Tuxedo Connector, is the connector between BEA's web application server WebLogic and its middleware product Tuxedo, and gives WebLogic and Tuxedo bidirectional access to each other. The Tuxedo /Q component provides reliable queuing: it allows messages to be queued and stored on persistent media such as disk, or on non-persistent media such as memory, for later use. In the present invention, the WebLogic platform and a Java application are deployed at the management center, and Tuxedo's /Q reliable message queue is deployed on each unit module (i.e. each controlled end), where it is used to receive the command message sent by the management center, execute it, and store the result message in the response queue. Communication between WebLogic at the management center and Tuxedo on the unit modules uses the WTC interface.
[0044] The management center sends command messages and receives the result messages returned after execution. Each unit module receives the management center's command messages and sends back the result messages of execution. When a command takes a very long time to execute or the network fails, Tuxedo /Q still provides reliable message service and guarantees the integrity of message delivery. This mechanism provides an asynchronous execution method that is more flexible and more reliable than tpacall(), which meets the needs of an off-site distributed system. By using Tuxedo /Q between the management center and each controlled end, the present invention can therefore centrally manage and control the off-site distributed system.
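By way of illustration only, the receive, execute, and respond cycle of a controlled end might be sketched as follows. In-memory `queue.Queue` objects stand in for the request and response queues; unlike Tuxedo /Q they are not persistent, so the sketch shows only the message flow and not the reliability guarantees. The message fields `msg_id`, `func_id`, and `params` and the function `controlled_end_step` are hypothetical.

```python
import queue


def controlled_end_step(requests, responses, handlers):
    """Take one command from the request queue, run it, enqueue the result.

    handlers maps a function number to a callable that returns
    (return_code, output); a return code of 0 means success.
    """
    cmd = requests.get_nowait()
    handler = handlers.get(cmd["func_id"])
    if handler is None:
        result = {"msg_id": cmd["msg_id"], "rc": 1, "output": "unknown function"}
    else:
        rc, output = handler(cmd.get("params", {}))
        result = {"msg_id": cmd["msg_id"], "rc": rc, "output": output}
    responses.put(result)  # the management center reads results from here
    return result
```

Because the management center only enqueues commands and later dequeues results, a slow command does not block it, which is the property the asynchronous mechanism is meant to provide.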
[0045] The data synchronization module 300, the data comparison and analysis module 400, and the data storage module 500 together guarantee data consistency within the off-site distributed system. The data synchronization module 300 and the data storage module 500 implement real-time synchronization of the business system data distributed across the sites. The data comparison and analysis module 400 verifies the consistency of the business system data distributed across the sites, and can locate and analyze data according to the differences found and perform the corresponding data catch-up. The data synchronization module 300, the data comparison and analysis module 400, and the data storage module 500 form the basis of the service recovery module 700.
[0046] The business process module 600 is mainly used to implement the business processes of the off-site distributed system. Process information is built from basic elements such as processes, steps, functions, and composite functions; it is defined in the database in the form of ordered functional steps and can be customized through scripts. The management center program reads the process information and interprets and executes it, thereby carrying out the business functions of the process and realizing the fixed daily business processes of the system. These business processes are collectively called fixed processes. In addition, to handle temporary system requirements such as equipment replacement, line maintenance, and fault handling, a series of necessary business functions must be executed ad hoc. This gives rise to the arbitrary-process function, which supports selective execution of the defined special functions. It supplements the fixed business processes and is a very flexible way of controlling the business system.
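By way of illustration only, interpreting a process defined as ordered functional steps might be sketched as follows. The step fields `seq`, `host`, and `func_id`, the callable `send_command`, and the choice to stop at the first non-zero return code are all assumptions of this sketch; the text does not specify how a failed step is handled.

```python
def run_flow(steps, send_command):
    """Execute steps in ascending 'seq' order and stop at the first failure.

    steps is a list of dicts such as rows read from a process-definition
    table; send_command(host, func_id) returns 0 on success.
    """
    log = []
    for step in sorted(steps, key=lambda s: s["seq"]):
        rc = send_command(step["host"], step["func_id"])
        log.append((step["seq"], step["host"], step["func_id"], rc))
        if rc != 0:
            return False, log
    return True, log
```

Under this sketch a fixed process is a stored list of steps, and an arbitrary process is a list of steps that an operator selects at run time and passes to the same interpreter.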
[0047] The service recovery module 700 builds on the guarantee of data consistency and is exposed through business processes. It ensures that when a disaster or other abnormal situation occurs in the off-site distributed system, business processes can quickly be taken over at the off-site location, preserving the continuity of data and business. The service recovery module 700 includes the following submodules: a first submodule for starting the off-site business system application and database; a second submodule for obtaining the time of the disaster switchover; and a third submodule for performing the network switchover. For example, if the transfer business system at the Shanghai center fails, service must switch to the Beijing center immediately. The first submodule of the service recovery module 700 then starts the off-site transfer system application and database and determines whether the conditions for switchover are met, the second submodule obtains the point in time of the disaster switchover (for the subsequent data catch-up), and the third submodule performs the network switchover and related work.
[0048] Broken down, each step in these processes is in essence the control of a particular service host, which automatically carries out the operation instruction issued by the management center. Once the process has executed, the change is transparent to users, although the actual transaction processing site has changed from Shanghai to Beijing; business continuity is thus ensured. After the Shanghai center's business system is restored, there is a corresponding switchback process, after which transactions are again sent to Shanghai for processing as normal. The data catch-up process then backfills to the Shanghai center the transactions that were processed in Beijing during the switchover period.
[0049] The security audit module 800 avoids plaintext transmission of messages between the management center and the unit modules by adding encryption to the messages: the -Z parameter is used to configure RSA encryption when messages are inserted through WSL. Messages are both sent and received in encrypted form and are decrypted automatically upon receipt, which ensures the security of data transfer. The management center also uses a unified LDAP (Lightweight Directory Access Protocol) server for identity authentication. Operators' permission settings are likewise taken from the LDAP server, and the relevant authorization is checked before any function is performed. Only authorized users can execute the functions of the business processes. In addition, the security audit module 800 records and audits login information, operation logs, and process execution.
[0050] When the integrated platform for IT system disaster recovery shown in FIG. 2 centrally manages the off-site distributed IT system shown in FIG. 1, a controlled end is present on every service host deployed locally and off-site, and a management center is deployed at each of the local and off-site locations. Each management center communicates with all local and off-site service hosts in order to achieve system management, business implementation, and recovery.
[0051] FIG. 3 is a schematic diagram of the data storage and data synchronization processing performed by the integrated platform for IT system disaster recovery of the present invention. As shown in FIG. 3, a set of mirrored storage is deployed in the local business system, which implements synchronous, bidirectional data replication between the local primary business systems. Between the local mirrored storage and the off-site storage, data replication is asynchronous and one-way. This synchronization mechanism ensures that when a disaster strikes the local business system or its data, service can be restored quickly at the off-site location without loss of data. After the local system resumes service, the off-site data can in turn be backfilled into the local database.
[0052] "Data backfill" can be understood by continuing the example given above in the description of the service recovery module 700. When the local business system (Shanghai) needs to switch to Beijing, the switchover time T0 is recorded; after the switchover, all transactions are actually processed at the Beijing center. When the Shanghai center resumes service, the switchback process of the corresponding business system is executed and the switchback time T1 is recorded. The data catch-up process then runs, the interval from T0 to T1 being the period during which transactions were processed at the Beijing center. The Beijing management center issues an instruction to read the data for this period (i.e. the off-site data) from the Beijing center's transaction database and transmits it over the Beijing-Shanghai fiber network to the Shanghai management center, which inserts the data into the corresponding business database. In this way the transaction data is complete for both the business system and the user, as if no switchover had occurred.
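By way of illustration only, backfilling the transactions of the interval from T0 to T1 might be sketched as follows, with two in-memory SQLite databases standing in for the Beijing and Shanghai transaction databases. The table `txn`, its columns, the half-open window `[T0, T1)`, and the use of `INSERT OR IGNORE` to make a repeated run harmless are assumptions of this sketch and are not taken from the text.

```python
import sqlite3


def backfill(remote, local, t0, t1):
    """Copy transactions with t0 <= ts < t1 from remote into local.

    Returns the number of rows read from the remote window. Rows whose
    primary key already exists locally are skipped, so the call can be
    repeated safely.
    """
    rows = remote.execute(
        "SELECT id, ts, amount FROM txn WHERE ts >= ? AND ts < ? ORDER BY ts",
        (t0, t1),
    ).fetchall()
    with local:  # commits on success, rolls back if an insert raises
        local.executemany(
            "INSERT OR IGNORE INTO txn (id, ts, amount) VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

In the example of the text, `remote` would be the Beijing database, `local` the Shanghai database, and `t0` and `t1` the recorded switchover and switchback times.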
[0053] FIG. 4 shows the processing flow for message sending and receiving by a unit module, i.e. a controlled end, managed by the integrated platform for IT system disaster recovery of the present invention (that is, the specific flow performed by the system communication module 200). As shown in FIG. 4, in a unit module the process first initializes, allocates memory, and creates a linked list with a head node. The corresponding Tuxedo WSL service is deployed on the server of each of the two management centers, and the WSNADDR environment variable is set on each client server. Its value is the address (IP address:port) of the WSL service published by the Tuxedo server, which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the client reconnects after an interval of 30 seconds.
[0054] The linked list is mainly used to store the state information of the execution processes that are currently running. Each node contains the process ID, the message function number, the message uniqueness marker, the set of parameter values, the time at which the process started executing, and a flag indicating whether the node is available (0 means available, 1 means unavailable). When an execution process finishes, the main process clears the node information for that execution process in the linked list and sets the availability flag to 0 so that the node can be reused.
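By way of illustration only, the bookkeeping of running execution processes might be sketched as follows. A fixed-size Python list is used here in place of the linked list with a head node described above, and the class and field names are hypothetical; the node contents and the meaning of the flag (0 available, 1 unavailable) follow the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """State of one execution process."""

    pid: int = 0
    func_id: str = ""       # message function number
    msg_id: str = ""        # message uniqueness marker
    params: dict = field(default_factory=dict)
    started_at: float = 0.0
    in_use: int = 0         # 0 = available, 1 = unavailable


class ProcessTable:
    """Slots that the main process fills and clears as processes run."""

    def __init__(self, size):
        self.nodes = [Node() for _ in range(size)]

    def acquire(self, pid, func_id, msg_id, params, started_at) -> Optional[Node]:
        """Fill the first available node; return None if all are in use."""
        for node in self.nodes:
            if node.in_use == 0:
                node.pid, node.func_id, node.msg_id = pid, func_id, msg_id
                node.params, node.started_at, node.in_use = dict(params), started_at, 1
                return node
        return None

    def release(self, pid) -> bool:
        """Clear the node of a finished process and mark it available again."""
        for i, node in enumerate(self.nodes):
            if node.in_use == 1 and node.pid == pid:
                self.nodes[i] = Node()
                return True
        return False
```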
[0055] Judging the validity of a message mainly consists of checking the validity of the values (value) of the application system check identifier (system), function number (func_id), IP address (ip), time (time), message type (type), and similar fields in the command message.
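By way of illustration only, a validity check over the fields named above might be sketched as follows. The field names come from the text; the concrete rules (membership in known sets, a parseable IP address, a timestamp in the form `YYYY-MM-DD HH:MM:SS`, and the message types `command` and `interrupt`) are assumptions of this sketch.

```python
import ipaddress
from datetime import datetime

REQUIRED = ("system", "func_id", "ip", "time", "type")


def validate(msg, known_systems, known_funcs, known_types=("command", "interrupt")):
    """Return a list of problems found in msg; an empty list means valid."""
    errors = [f"missing {name}" for name in REQUIRED if name not in msg]
    if errors:
        return errors
    if msg["system"] not in known_systems:
        errors.append("unknown system")
    if msg["func_id"] not in known_funcs:
        errors.append("unknown func_id")
    if msg["type"] not in known_types:
        errors.append("unknown type")
    try:
        ipaddress.ip_address(msg["ip"])
    except ValueError:
        errors.append("bad ip")
    try:
        datetime.strptime(msg["time"], "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        errors.append("bad time")
    return errors
```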
[0056] When a function processing script performs a functional operation, it decides, according to the circumstances, whether to process and how to process, so that unnecessary or erroneous operations are avoided, and it returns a corresponding value. A return value of 0 indicates that the functional operation succeeded; a non-zero value indicates failure.
[0057] When the received message is an interrupt message, the main process sends an interrupt signal to the corresponding execution process. On receiving the interrupt signal, the execution process stops its loop and performs no further operations.
[0058] In FIG. 4, the left part is the execution flow of the main process. The main process is a loop program whose main tasks are to send heartbeat information, receive messages and judge their validity, create the corresponding execution process according to the message content, and manage the execution processes that are currently running.
[0059] The integrated platform management for IT system disaster recovery of the present invention described above provides a fast, simple, and effective disaster recovery mechanism. Its design targets are RPO=0 and RTO=0, and when an actual disaster occurs it can provide continuous business service within the shortest possible time. In disaster recovery, the industry currently recognizes three goals worth pursuing. The first is recovery time: how long the enterprise can tolerate being without IT and out of operation. The second is how long the network takes to recover. The third is recovery at the business level. Two metrics are the most critical throughout the recovery process: RTO and RPO. RTO (Recovery Time Objective) is the period that begins after a disaster, when an IT system outage halts the business, and ends when the IT system has been restored to the point where it can support each department and operations resume. RPO (Recovery Point Objective) concerns system and application data: it specifies how up to date the system and production data must be once restored in order to support each department's business operations. That point may be the backup data of the previous week, or it may be the real-time data of the last transaction. The integrated platform management for IT system disaster recovery of the present invention can therefore provide continuous business service within the shortest possible time when a disaster occurs.
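As an illustration of the two metrics, the following sketch computes the recovery time and the data-loss window actually observed in an incident, which can then be compared against the RTO and RPO targets. The function name and its three timestamp parameters are assumptions made for this example.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_good_data, outage_start, service_restored):
    """Return (achieved recovery time, data-loss window) as timedeltas."""
    if not (last_good_data <= outage_start <= service_restored):
        raise ValueError("timestamps must be in chronological order")
    # Time during which the business was halted (compare with the RTO).
    recovery_time = service_restored - outage_start
    # Age of the newest recoverable data at the outage (compare with the RPO).
    data_loss_window = outage_start - last_good_data
    return recovery_time, data_loss_window
```

A design target of RPO=0 and RTO=0 corresponds to both returned values being zero: no committed data is lost and service is never interrupted.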
[0060] Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, the controlled end (i.e. each unit module) can automatically maintain a reliable connection with the server side and remains sufficiently robust to keep running.
[0061] Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, the running state of every deployed controlled end (i.e. each unit module) can be monitored effectively, the running state of the various business processes can be monitored effectively, and a means of managing and maintaining the configurable parameters is provided.
[0062] Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, business processes can be flexibly configured and combined, for example by supporting parameterized configuration to cope with ordinary changes in business functions. For errors that occur during process execution, exception handling is provided within the process so that exceptions are handled effectively.
[0063] Furthermore, to guarantee the RTO and RPO performance requirements of the remote distributed system, the integrated platform for IT system disaster recovery of the present invention designs and implements various business processes that provide process control for daily, planned, and disaster situations. Implementing fixed business processes and arbitrary functional processes is the core function provided by the disaster recovery application system. To implement the business processes effectively, process information is built from basic elements such as processes, steps, functions, and combined functions, is defined in the database as ordered functional steps, and can be customized through scripts. The management center program reads the process information and interprets it to execute the business functions of the process; these business processes are collectively called fixed processes. In addition, to handle temporary business system requirements such as equipment replacement, line maintenance, and fault handling, a series of necessary business functions must be executed on an ad hoc basis.
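The idea of storing a process as ordered functional steps and interpreting it at run time can be sketched as below. The step layout (keys `order`, `function`, `params`) and the `on_error` hook are hypothetical; the only convention taken from the description is that a function returns 0 on success and a non-zero value on failure.

```python
def run_process(steps, functions, on_error=None):
    """Execute steps in ascending `order`; stop at the first failure.

    Each step is a dict with hypothetical keys `order`, `function`, and
    optionally `params`. `functions` maps function names to callables that
    return 0 on success and non-zero on failure.
    Returns the list of (function name, return code) pairs that were run.
    """
    results = []
    for step in sorted(steps, key=lambda s: s["order"]):
        name = step["function"]
        code = functions[name](**step.get("params", {}))
        results.append((name, code))
        if code != 0:
            # Exception handling inside the process: report and stop.
            if on_error is not None:
                on_error(step, code)
            break
    return results
```

Because the steps are plain data, a fixed process can be loaded from database rows, while an ad hoc sequence can be assembled by an operator and passed to the same interpreter.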
[0064] The above examples mainly describe the integrated platform for IT system disaster recovery of the present invention. Although only some specific embodiments of the present invention have been described, those of ordinary skill in the art will understand that the present invention may be implemented in many other forms without departing from its spirit and scope. The examples and embodiments shown are therefore to be regarded as illustrative rather than restrictive, and the present invention may cover various modifications and substitutions without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An integrated platform for IT system disaster recovery, used for centralized management of a remote distributed IT system, the remote distributed IT system having a plurality of local service hosts, a plurality of remote service hosts, and two management centers that manage the local service hosts and the remote service hosts, the integrated platform comprising:
a system management module (100) for monitoring and managing the remote distributed IT system in real time, so that the management centers can obtain information about each service host in real time;
a system communication module (200), deployed on each local service host, each remote service host, and the management centers, for implementing communication between each local service host and the management centers, communication between each remote service host and the management centers, and communication between the management centers;
a data synchronization module (300) for implementing real-time data synchronization in the remote distributed IT system;
a data comparison analysis module (400) for verifying the consistency of data in the remote distributed IT system;
a data storage module (500) for implementing data storage in the remote distributed IT system;
a business process module (600) for implementing the various business processes in the remote distributed IT system;
a service recovery module (700) for taking over business processes at the remote site when a disaster occurs in the remote distributed IT system;
a security audit module (800) for encrypting and decrypting the messages received and sent between each service host and each management center.
2. The integrated platform for IT system disaster recovery according to claim 1, characterized in that
the system management module (100), the data synchronization module (300), the data comparison analysis module (400), the data storage module (500), the business process module (600), the service recovery module (700), and the security audit module (800) are all associated with the system communication module (200), and
the data synchronization module (300), the data comparison analysis module (400), the data storage module (500), and the business process module (600) are all associated with the service recovery module (700).
3. The integrated platform for IT system disaster recovery according to claim 2, characterized in that
the system communication module is used to implement message sending and receiving, message parsing, command execution, and result feedback between each service host and each management center.
4. The integrated platform for IT system disaster recovery according to claim 3, characterized in that
the management centers and the unit modules in the service hosts are connected to the Tuxedo/Q service through a WTC interface.
5. The integrated platform for IT system disaster recovery according to claim 4, characterized in that
the security audit module encrypts the sending and receiving of messages between the management centers and each unit module in the service hosts by inserting messages through WSL.
6. The integrated platform for IT system disaster recovery according to claim 5, characterized in that
the security audit module, in the course of inserting messages through WSL, uses the -z setting to configure RSA encryption.
7. The integrated platform for IT system disaster recovery according to claim 6, characterized in that
the data comparison analysis module (400) can compare the data consistency of the local and remote business system databases, can locate the differences, and can backfill data to reconcile those differences.
8. The integrated platform for IT system disaster recovery according to claim 7, characterized in that
the service recovery module (700) comprises:
a first submodule for starting the business system applications and databases at the remote site and determining whether the switchover conditions are met;
a second submodule for obtaining the point in time of the disaster switchover;
a third submodule for performing the network switchover.
9. The integrated platform for IT system disaster recovery according to any one of claims 1 to 8, characterized in that
the data synchronization module (300) is used to deploy mirrored storage in the local business system so as to implement synchronous data replication between local business systems.
10. The integrated platform for IT system disaster recovery according to claim 9, characterized in that
the data synchronization module (300) is used to implement asynchronous data replication between the local mirrored storage and the remote storage.
11. The integrated platform for IT system disaster recovery according to claim 10, characterized in that
the data synchronization module (300) is further used to backfill the remote data into the local database after the local system resumes service.
12. The integrated platform for IT system disaster recovery according to claim 11, characterized in that
the two management centers are functionally identical and serve as backups for each other.
13. The integrated platform for IT system disaster recovery according to claim 12, characterized in that
the management centers use LDAP server mirroring for identity authentication.
PCT/CN2014/070331 2013-01-15 2014-01-08 Integrated platform for disaster recovery of it system WO2014110994A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310013623.8A CN103929320A (en) 2013-01-15 2013-01-15 Integration platform for IT system disaster recovery
CN201310013623.8 2013-01-15

Publications (1)

Publication Number Publication Date
WO2014110994A1 true WO2014110994A1 (en) 2014-07-24

Family

ID=51147404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/070331 WO2014110994A1 (en) 2013-01-15 2014-01-08 Integrated platform for disaster recovery of it system

Country Status (2)

Country Link
CN (1) CN103929320A (en)
WO (1) WO2014110994A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601678B (en) * 2014-12-31 2018-10-09 江苏中科梦兰电子科技有限公司 A kind of big concurrent blank remote real-time synchronous method
CN105306272B (en) * 2015-11-10 2019-01-25 中国建设银行股份有限公司 Information system fault scenes formation gathering method and system
CN111124696B (en) * 2019-12-30 2023-06-23 北京三快在线科技有限公司 Unit group creation, data synchronization method, device, unit and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1753373A (en) * 2004-09-23 2006-03-29 华为技术有限公司 Remote disaster allowable system and method
CN101118509A (en) * 2007-09-12 2008-02-06 华为技术有限公司 Process, device and system for EMS memory data-base remote disaster tolerance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8321688B2 (en) * 2009-06-12 2012-11-27 Microsoft Corporation Secure and private backup storage and processing for trusted computing and data services

Also Published As

Publication number Publication date
CN103929320A (en) 2014-07-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14740501

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/11/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 14740501

Country of ref document: EP

Kind code of ref document: A1