CN109901949B

CN109901949B - Application disaster recovery system and method for double-activity data center

Info

Publication number: CN109901949B
Application number: CN201910136324.0A
Authority: CN
Inventors: 朱小珍; 陈雅峰; 梁锦华; 张榕
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2019-02-25
Filing date: 2019-02-25
Publication date: 2021-06-18
Anticipated expiration: 2039-02-25
Also published as: CN109901949A

Abstract

The invention discloses an application disaster recovery system and method for a dual-active data center. The system comprises a first data center deployed at a first site and a second data center deployed at a second site, and the data center at each site comprises a gateway cluster, a service system and a database. The databases of the first and second data centers replicate data bidirectionally. The service system of the first data center is connected to the gateway cluster of the second data center and processes the service requests received by that gateway cluster; the service system of the second data center is connected to the gateway cluster of the first data center and processes the service requests received by that gateway cluster. Each site may further comprise an information bypass device, which bypasses the service data transmitted by the local gateway cluster, and an automatic account-supplementing device, which transmits the bypassed service data to the local service system. By storing service data off-site, the invention achieves zero loss of application data when a site-level disaster occurs.

Description

Application disaster recovery system and method for double-activity data center

Technical Field

The invention relates to the field of application disaster recovery for data centers, and in particular to an application disaster recovery system and method for a dual-active data center.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

With the spread of the internet, enterprises carry out daily production, operation and management electronically, and large enterprises build data centers to run these electronic systems. Electronic data is the basis of an enterprise's normal operation. To keep system faults from disrupting operations, enterprises often build several data centers and run them in a primary/standby mode.

Currently, in the primary/standby data center mode, data at the primary site's data center is synchronized to the standby site's data center by system means (for example, database replication or disk replication). Because database replication has some delay, service data that has not yet been replicated when a site-level disaster occurs never reaches the standby site, so the standby site typically loses tens of seconds to several minutes of service data. To withstand site-level disasters (in which all data at the production site is lost) caused by force majeure events such as earthquakes, tsunamis, fires and terrorist attacks, enterprises need a way to synchronize site-level application data without loss, so that no business data is lost and normal operation is not affected.

Disclosure of Invention

The embodiments of the invention provide an application disaster recovery system for a dual-active data center, to solve the technical problem in the prior art that site data is backed up by database replication and service data is easily lost because of the replication delay. The system comprises a first data center deployed at a first site and a second data center deployed at a second site, and the data center at each site comprises a gateway cluster, a service system, a database, an information bypass device and an automatic account-supplementing device. The databases of the first and second data centers replicate data bidirectionally. The service system of the first data center is connected to the gateway cluster of the second data center and processes the service requests received by that gateway cluster; the service system of the second data center is connected to the gateway cluster of the first data center and processes the service requests received by that gateway cluster;

wherein the database of each site comprises a read-only database and an update database; the processing result produced when the service system of the first site processes a service request received by the gateway cluster of the second site is written into the update database of the first site, and the data in the update database of the first site is replicated into the read-only database of the second site; the processing result produced when the service system of the second site processes a service request received by the gateway cluster of the first site is written into the update database of the second site, and the data in the update database of the second site is replicated into the read-only database of the first site;

the information bypass device is connected to the gateway cluster of the local site and bypasses the service data transmitted by that gateway cluster; and the automatic account-supplementing device is connected to both the information bypass device and the service system of the local site, and transmits the service data bypassed by the local information bypass device to the local service system.

The embodiments of the invention further provide an application disaster recovery method for a dual-active data center, to solve the technical problem in the prior art that site data is backed up by database replication and service data is easily lost because of the replication delay. The method is applied to the first site or the second site of the application disaster recovery system of the dual-active data center and comprises: receiving a service request uploaded by the gateway cluster of the remote site; processing that service request; returning the service processing result to the gateway cluster of the remote site; and storing the service processing result in the database of the local site and replicating it to the database of the remote site.

The embodiments of the invention also provide a computer device, to solve the technical problem in the prior art that site data is backed up by database replication and service data is easily lost because of the replication delay.

The embodiments of the invention further provide a computer-readable storage medium, to solve the same technical problem in the prior art that site data is backed up by database replication and service data is easily lost because of the replication delay.

In the embodiments of the invention, the databases of the first and second sites replicate data bidirectionally, forming a dual-active data center whose sites back each other up. The gateway cluster of the first site is connected to the service system of the second site, and the gateway cluster of the second site is connected to the service system of the first site, so each site's service system processes the service requests received by the other site's gateway cluster. Service data is therefore stored off-site, achieving zero loss of application data when a site-level disaster occurs.

Drawings

To describe the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without creative effort. In the drawings:

Fig. 1 is a schematic diagram of an application disaster recovery system for a primary/standby data center in the prior art;

Fig. 2 is a schematic diagram of an application disaster recovery system for a dual-active data center according to an embodiment of the invention;

Fig. 3 is a schematic diagram of an automatic account-supplementing device according to an embodiment of the invention;

Fig. 4 is a flowchart of an application disaster recovery method for a dual-active data center according to an embodiment of the invention;

Fig. 5 is a flowchart of the application disaster recovery method for a dual-active data center as applied to business transactions of a bank core system, according to an embodiment of the invention.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments clearer, the embodiments of the invention are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions explain the invention and do not limit it.

As noted in the Background, to keep services highly available, enterprises often build a backup data center site so that load can be switched to it when the primary site fails. In the prior art, data between the primary and standby sites is usually synchronized by database replication.

To clarify the technical principle of the invention, an application disaster recovery system for a primary/standby data center in the prior art is first briefly described.

Fig. 1 is a schematic diagram of an application disaster recovery system for a primary/standby data center in the prior art. As shown in Fig. 1, the data centers deployed at the primary site and at the standby site each comprise a service system 1, a gateway cluster 2, a terminal system 3 and a database 4. The terminal system 3 is connected to the gateway cluster 2, and the gateway cluster 2 is connected to the service system 1. The terminal system 3, which comprises a terminal interface and a terminal server, receives service requests that users initiate from user terminals 7 (such as mobile phones, tablets, notebooks and desktop computers), uploads them to the service system 1 through the gateway cluster 2 and, once the service system 1 has processed them, receives the processing results through the gateway cluster 2 and returns them to the users. The gateway cluster 2 is the connection node between the service system and the terminal system: it performs communication protocol conversion and encoding conversion between them and uploads service requests from user terminals to the service system. The service system 1 is the core of the enterprise's service processing and handles its online services; it receives service requests from the user terminals 7 through the gateway cluster 2 and persists the processing results in the database. The database 4 is connected to the service system 1 and stores the enterprise's business data; if the service system is a bank's core transaction system, the database stores important data such as business details, customers, media, agreements and accounts.
The standby site of the data center guarantees high availability of the enterprise's service systems; site-level backup is achieved by replicating the primary site's database in real time.

It should be noted that the terminal system 3 and the gateway cluster 2 at the standby site may also bear a certain load, so as to improve the utilization rate of the device.

In the prior art, the primary site's database is usually synchronized to the standby site's database asynchronously, which preserves the primary site's performance and provides efficient service to users. Under high concurrency and heavy update volume, database replication typically lags by tens of seconds to several minutes, so when an unpredictable site-level failure (such as an earthquake, tsunami, fire or terrorist attack) hits the primary site, some service data is lost at the standby site. This compromises the completeness of the standby site's takeover and affects some users' services.

To solve the loss of site-level application data, the embodiments of the invention provide a zero-loss application disaster recovery system for a dual-active data center, which makes site-level service systems highly available.

Fig. 2 is a schematic diagram of an application disaster recovery system for a dual-active data center provided in an embodiment of the invention. As shown in Fig. 2, the system comprises a first data center deployed at a first site (site A) and a second data center deployed at a second site (site B), and the data center at each site comprises a gateway cluster 2, a service system 1 and a database 4. The databases 4 of the first and second data centers replicate data bidirectionally. The service system 1 of the first data center is connected to the gateway cluster 2 of the second data center and processes the service requests received by that gateway cluster; the service system 1 of the second data center is connected to the gateway cluster 2 of the first data center and processes the service requests received by that gateway cluster.

Note that the application disaster recovery system of the dual-active data center shown in Fig. 2 applies both to active/standby and to multi-active data center disaster recovery scenarios. In an active/standby scenario, either the first site or the second site in Fig. 2 can serve as the primary site and the other as the standby site; in a multi-active scenario, the first and second sites in Fig. 2 can be any two sites in the multi-active data center that back each other up.

In addition, it should be noted that the business system 1 shown in fig. 2 may be any business system that needs to implement disaster recovery, including but not limited to a core business transaction system of a bank.

Taking a core service transaction system as an example, the two data centers at site A and site B serve users at the same time, and the connections between gateway clusters and core service transaction systems cross sites: the gateway cluster of site A is connected to the core service transaction system of site B, and the gateway cluster of site B is connected to the core service transaction system of site A. Every transaction processed by a core service transaction system is routed and forwarded by a gateway cluster. Transaction requests sent by the site A gateway cluster are therefore processed by the site B core service transaction system, which persists the results in the site B database; transaction requests sent by the site B gateway cluster are processed by the site A core service transaction system, which persists the results in the site A database.

As can be seen from Fig. 2, the processing results of service requests received at the first site (site A) are stored directly at the second site (site B), and the processing results of service requests received at site B are stored directly at site A. A disaster at site A therefore does not affect the data of service requests received at site A, and a disaster at site B does not affect the data of service requests received at site B. Note that when site A suffers a disaster, the processing results of service requests received at site B during the last tens of seconds to several minutes (which were processed and stored at site A) may not yet have been replicated to site B because of the replication delay. Since site B is unaffected and those requests were received there, the service system of site B only needs to process those service requests again. The case of a disaster at site B is symmetrical and is not described in detail here.

In summary, in the embodiments of the invention the databases of the first and second sites replicate data bidirectionally, forming a dual-active data center whose sites back each other up. The gateway cluster of each site is connected to the service system of the other site, so each site's service system processes the service requests received by the other site's gateway cluster. Service data is thus stored off-site, and application data suffers zero loss when a site-level disaster occurs.

To recover service data automatically in a disaster recovery scenario and achieve site-level application disaster recovery, the embodiments of the invention replay service requests. In an optional embodiment, the data center of each site may further comprise an information bypass device 5 and an automatic account-supplementing device 6. The information bypass device 5 is connected to the gateway cluster 2 of the local site and bypasses the service data transmitted by that gateway cluster; the automatic account-supplementing device 6 is connected to both the information bypass device 5 and the service system 1 of the local site, and transmits the service data bypassed by the local information bypass device 5 to the local service system 1.

Again taking the core service transaction system as an example, the information bypass device is connected to the gateway cluster of the local site and bypasses all communication packets transmitted by that gateway cluster. Here, bypassing means that the gateway copies the communication data to the information bypass device without altering the content or path of the transaction. The bypassed data includes, but is not limited to, key information such as the service transaction code, message serial number, terminal number, service communication area and transaction processing state. The information bypass device captures both the request message of each transaction and the response message returned by the core service transaction system, and persists both to local disk for account supplementing in a disaster scenario. The automatic account-supplementing device, which is connected to the information bypass device, is idle during normal production. In a disaster scenario, once the enterprise starts the disaster recovery account-supplementing process, the information bypass device is automatically triggered to extract the messages collected so far and upload them to the automatic account-supplementing device. After necessary processing such as reconciliation, the automatic account-supplementing device identifies the lost data and sends the lost transaction messages to the core service transaction system, which redoes the transactions and completes the account supplementing.
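The bypass path just described can be pictured with a minimal Python sketch. It assumes a synchronous gateway and an in-memory store purely for illustration; `BypassRecord`, `InfoBypassDevice`, `gateway_forward` and all field names are hypothetical. A real device would persist records to local disk and copy them asynchronously so that bypassing never delays the transaction.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple


@dataclass
class BypassRecord:
    # Key fields listed in the text; the field names themselves are hypothetical.
    txn_code: str                     # service transaction code
    serial_no: str                    # message serial number
    terminal_no: str                  # terminal number
    comm_area: str                    # service communication area
    request: bytes                    # raw request message
    response: Optional[bytes] = None  # response message from the core system
    status: Optional[str] = None      # transaction processing state, e.g. "OK" / "FAIL"


class InfoBypassDevice:
    """Keeps a private copy of every request together with its response."""

    def __init__(self) -> None:
        self.records: Dict[str, BypassRecord] = {}

    def on_request(self, rec: BypassRecord) -> None:
        # Store a copy so the bypass side never shares state with the live message.
        self.records[rec.serial_no] = replace(rec)

    def on_response(self, serial_no: str, response: bytes, status: str) -> None:
        rec = self.records.get(serial_no)
        if rec is not None:
            rec.response = response
            rec.status = status


def gateway_forward(
    rec: BypassRecord,
    core: Callable[[bytes], Tuple[bytes, str]],
    bypass: InfoBypassDevice,
) -> bytes:
    """Forward a request to the core system, copying request and response to the bypass."""
    bypass.on_request(rec)
    response, status = core(rec.request)  # transaction content and path are unchanged
    bypass.on_response(rec.serial_no, response, status)
    return response
```

Because the bypass only observes copies, the forwarded request and the response returned to the terminal are exactly what they would be without the bypass device.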

Because database replication lags by only tens of seconds to several minutes and little data is actually lost, the information bypass device only needs to persist the most recent few minutes of communication messages. As an optional implementation, the information bypass device of each site therefore stores the service data transmitted by the local gateway cluster by writing files circularly, which saves disk capacity.
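Circular file writing can be sketched as a fixed ring of segment files that are overwritten oldest-first, which bounds disk usage. This is a minimal Python sketch under assumed parameters (segment count, records per segment, one single-line text record per message); the class name `CircularFileLog` and the `bypass_<i>.log` file naming are hypothetical.

```python
import os


class CircularFileLog:
    """A fixed ring of segment files; the oldest segment is truncated and reused.

    Disk usage never exceeds n_segments * records_per_segment records.
    Records must not contain newlines (one record per line).
    """

    def __init__(self, directory: str, n_segments: int = 4, records_per_segment: int = 1000) -> None:
        self.directory = directory
        self.n_segments = n_segments
        self.records_per_segment = records_per_segment
        self.current = 0  # index of the segment being written
        self.count = 0    # records already in the current segment
        os.makedirs(directory, exist_ok=True)
        open(self._path(0), "w").close()  # start with an empty first segment

    def _path(self, i: int) -> str:
        return os.path.join(self.directory, f"bypass_{i}.log")

    def append(self, record: str) -> None:
        if self.count == self.records_per_segment:
            # Move to the next segment and truncate it, overwriting the oldest data.
            self.current = (self.current + 1) % self.n_segments
            open(self._path(self.current), "w").close()
            self.count = 0
        with open(self._path(self.current), "a", encoding="utf-8") as f:
            f.write(record + "\n")
        self.count += 1

    def read_all(self) -> list:
        """Return the surviving records, oldest first."""
        out = []
        for k in range(1, self.n_segments + 1):
            i = (self.current + k) % self.n_segments  # k == n_segments is the current segment
            path = self._path(i)
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    out.extend(line.rstrip("\n") for line in f)
        return out
```

Since a whole segment is discarded at once, the retained history varies between (n_segments - 1) and n_segments segments; one plausible sizing rule is to make that lower bound cover the longest replication delay expected.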

Further, the information bypass device of each site stores the service data transmitted by the local gateway cluster according to a preconfigured service list. The service list may be a blacklist, which records services that need not be bypassed, or a whitelist, which records services that must be bypassed; either can be chosen to suit the application scenario. The purpose is to filter out services that do not need bypassing, which reduces both the volume of bypassed data and the volume of data processed by the automatic account-supplementing device, and so makes account supplementing more efficient.
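The blacklist/whitelist choice reduces to one membership test with opposite meanings. The minimal Python sketch below assumes services are identified by their transaction code; the function names and the example codes are hypothetical.

```python
from typing import Iterable, List, Set


def should_bypass(txn_code: str, service_list: Set[str], mode: str) -> bool:
    """Apply a preconfigured service list to one transaction code.

    mode "whitelist": service_list holds the services that must be bypassed.
    mode "blacklist": service_list holds the services that need not be bypassed.
    """
    if mode == "whitelist":
        return txn_code in service_list
    if mode == "blacklist":
        return txn_code not in service_list
    raise ValueError(f"unknown mode: {mode!r}")


def filter_bypassed(codes: Iterable[str], service_list: Set[str], mode: str) -> List[str]:
    """Keep only the transaction codes the bypass device should store."""
    return [c for c in codes if should_bypass(c, service_list, mode)]
```

A blacklist suits systems where most services change account data (list the few read-only queries); a whitelist suits the opposite case.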

As an optional implementation, the database of each site may comprise a read-only database and an update database. The processing result produced when the service system of the first site processes a service request received by the gateway cluster of the second site is written into the update database of the first site, whose data is replicated into the read-only database of the second site; the processing result produced when the service system of the second site processes a service request received by the gateway cluster of the first site is written into the update database of the second site, whose data is replicated into the read-only database of the first site. As shown in Fig. 2, the update database of site A stores the results of site A processing service requests received at site B and is synchronized to the read-only database of site B by database replication; the update database of site B stores the results of site B processing service requests received at site A and is synchronized to the read-only database of site A by database replication.
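The following minimal Python sketch simulates the update/read-only split with an explicit replication queue, to show which data a surviving site lacks after its peer fails. `DualActivePair`, `handle`, `replicate` and `lost_if_fails` are hypothetical names, and replication is modeled as an in-order queue applied in batches, an assumption rather than a description of any particular database product.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class DualActivePair:
    """Two sites, each with an update DB and a read-only replica of its peer's update DB."""

    def __init__(self) -> None:
        self.update_db: Dict[str, Dict[str, str]] = {"A": {}, "B": {}}
        self.read_only_db: Dict[str, Dict[str, str]] = {"A": {}, "B": {}}
        # Changes committed to a site's update DB but not yet applied at the peer.
        self.pending: Dict[str, Deque[Tuple[str, str]]] = {"A": deque(), "B": deque()}

    @staticmethod
    def peer(site: str) -> str:
        return "B" if site == "A" else "A"

    def handle(self, gateway_site: str, key: str, result: str) -> None:
        # A request received by one site's gateway is processed by the OTHER
        # site's service system, which writes to its own update DB.
        proc = self.peer(gateway_site)
        self.update_db[proc][key] = result
        self.pending[proc].append((key, result))

    def replicate(self, src: str, n: int) -> None:
        # Apply up to n queued changes, in commit order, to the peer's read-only DB.
        dst = self.peer(src)
        for _ in range(min(n, len(self.pending[src]))):
            key, result = self.pending[src].popleft()
            self.read_only_db[dst][key] = result

    def lost_if_fails(self, site: str) -> List[str]:
        # Keys the surviving peer would lack if `site` failed now. Every one of
        # them was received by the peer's own gateway, so it can be replayed there.
        return [key for key, _ in self.pending[site]]
```

For example, after `handle("B", "tx1", "ok")`, `handle("B", "tx2", "ok")` and `replicate("A", 1)`, only `tx2` is missing at site B if site A fails, and `tx2` is exactly a request that site B's gateway (and hence its bypass device) saw.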

For a given core service transaction system, data can be partitioned according to the behavioral characteristics of the services: requests with different characteristics are sent to different gateway clusters and then to the corresponding core service transaction system for data updates. For example, with the user's region as the routing key, a user's transaction requests are routed to a gateway cluster according to that region and then on to the core service transaction system of the corresponding site. As shown in Fig. 2, some transaction requests are routed by the site B gateway cluster to the site A core service transaction system for processing; their service data is written into the site A update database and then replicated to the site B read-only database by database replication. The remaining transaction requests are uploaded through the site A gateway cluster to the site B core service transaction system for processing; their service data is written into the site B update database and then replicated to the site A read-only database by database replication.

In the embodiments of the invention, the database is split by its update and read-only attributes, so that site A and site B can accept service requests at the same time while remaining active and replicating to each other. Although replication from the update database to the read-only database has some delay, the read-only database can still serve query transactions that tolerate transient inconsistency or that query historical data.
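Region-based routing combined with the cross-site wiring can be written as two lookups. This minimal Python sketch assumes a static region table; the region names, `REGION_TO_GATEWAY`, `GATEWAY_TO_CORE` and `route` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical region -> gateway-site assignment; a real system would load this from configuration.
REGION_TO_GATEWAY: Dict[str, str] = {"north": "A", "east": "A", "south": "B", "west": "B"}

# Cross-site wiring described in the text: gateway A feeds core system B and vice versa.
GATEWAY_TO_CORE: Dict[str, str] = {"A": "B", "B": "A"}


def route(region: str) -> Tuple[str, str, str]:
    """Return (gateway site, processing site, read-only replica site) for a region."""
    gateway = REGION_TO_GATEWAY[region]
    core = GATEWAY_TO_CORE[gateway]
    # The result lands in the processing site's update DB and is replicated to
    # the read-only DB at the gateway's site.
    return gateway, core, gateway
```

Keeping the region table stable matters: a given user's updates always land in the same update database, which is what lets the two sites accept writes concurrently without conflicting.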

The core service transaction system is connected to both the gateway cluster and the automatic account-supplementing device. At the physical deployment layer it is cross-deployed with the gateway clusters: the core service transaction system of site A is connected to the gateway cluster of site B, receives that cluster's transaction requests, processes them and writes the transaction logs into the site A update database; the core service transaction system of site B is connected to the gateway cluster of site A, receives that cluster's transaction requests, processes them and writes the transaction logs into the site B update database. The core service transaction system accepts both client transaction requests and automatic account-supplementing transaction requests and runs the same main transaction flow for both, which reuses transaction resources and reduces the enterprise's investment in disaster recovery. An account-supplementing flag is recorded in the application log so that client transactions and account-supplementing transactions can be distinguished. The few processes that differ (such as real-time notifications to the client or to third-party systems) can be handled differently according to this flag. For example, if the client was notified of an account change before the disaster, the client is notified again of the post-disaster supplementary posting, to avoid confusing the client.
If a third-party system was notified before the disaster, it is not notified again after the disaster, so that it does not receive the same linked-account processing notification twice and no account is debited twice.
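The flag-based differentiation of notifications can be sketched as one branch at the end of the shared transaction flow. This is a minimal Python sketch; `TxnContext`, `notifications` and the notification strings are hypothetical, and the `third_party_notified_before` input is an assumption, since the text does not say how the system learns that a third party was already notified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxnContext:
    serial_no: str
    is_supplement: bool  # account-supplementing flag carried by the replayed request
    # Assumption: looked up from pre-disaster records; not specified in the text.
    third_party_notified_before: bool = False


def notifications(ctx: TxnContext) -> List[str]:
    """Return the notifications to send after the shared main transaction flow."""
    if not ctx.is_supplement:
        # Ordinary client transaction: notify both parties as usual.
        return [f"client:{ctx.serial_no}", f"third_party:{ctx.serial_no}"]
    # Client: notified again, marked as a post-disaster supplementary posting.
    out = [f"client:{ctx.serial_no}:supplementary"]
    # Third party: skipped if already notified before the disaster, so the same
    # linked-account notice is never processed twice.
    if not ctx.third_party_notified_before:
        out.append(f"third_party:{ctx.serial_no}")
    return out
```

Everything before this branch (posting, logging) stays identical for both kinds of request, which is the resource reuse the text describes.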

Take a site-level disaster at site A (a major failure that makes the whole site unavailable) as an example. Site B's read-only database then lacks the data still in the replication delay window. That data was updated at site A, and the requests that produced it reached the site A core service transaction system from the site B gateway cluster. Site B can therefore recover the data its read-only database lost to replication delay through its own information bypass device and automatic account-supplementing device, achieving zero data loss in a site-level disaster. In addition, through an automatic switching mechanism the site B gateway cluster redirects its communication connections to the core service transaction system available at site B, so that site B takes over site A's real-time services. Because sites A and B are deployed symmetrically, the disaster recovery mechanism for a disaster at site B is the same as for site A. The embodiments of the invention thus achieve zero data loss in application disaster recovery for a dual-active data center, improve the high availability of enterprise service systems, and improve the continuity and integrity of enterprise services.
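The gateway redirection during takeover can be sketched as a target-selection rule. This minimal Python sketch assumes the set of failed sites is already known (the detection and switching mechanism itself is not described in the text); `NORMAL_CORE` and `core_target` are hypothetical names.

```python
from typing import Dict, Set

# Normal cross-site wiring: each gateway cluster feeds the peer site's core system.
NORMAL_CORE: Dict[str, str] = {"A": "B", "B": "A"}


def core_target(gateway_site: str, failed: Set[str]) -> str:
    """Choose which site's core transaction system a gateway cluster should use."""
    if gateway_site in failed:
        raise RuntimeError(f"site {gateway_site} is down; its gateway cannot serve traffic")
    target = NORMAL_CORE[gateway_site]
    if target in failed:
        # Site-level disaster at the peer: switch to the local core system so
        # the surviving site takes over the real-time service.
        return gateway_site
    return target
```

In normal operation `core_target("B", set())` yields `"A"`; once site A is marked failed, the same call with `{"A"}` yields `"B"`.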

As shown in Fig. 3, the automatic account-supplementing device 6 at each site may specifically comprise: an information transmission unit 61, connected to the information bypass device 5 of the local site, which acquires the service data bypassed by the information bypass device; a disaster recovery reconciliation unit 62, connected to the information transmission unit 61, which analyzes the service data acquired by the information transmission unit 61 and generates a to-be-supplemented account file from the analysis result, the file containing the service requests to be replayed to the service system of the local site; and a replay and account-supplementing unit 63, connected to both the disaster recovery reconciliation unit 62 and the service system 1 of the local site, which replays the service requests to the local service system 1 according to the to-be-supplemented account file generated by the disaster recovery reconciliation unit.

Again taking the core service transaction system as an example, the automatic account-supplementing device performs automatic account supplementing in disaster scenarios. Its information transmission unit is connected to the information bypass device and, following the production disaster recovery process, automatically receives the extracted transaction messages from the information bypass device once disaster recovery has been started. The disaster recovery reconciliation unit receives this data from the information transmission unit, parses the communication messages and extracts each transaction's business information (key information such as the transaction processing state, service transaction code, business serial number and application message). From the transaction processing state it determines whether the transaction executed successfully at the remote site; transactions that failed need no account supplementing.
For each successfully executed transaction, the unit checks, using the key value of its service attributes, whether an application log for it exists at the local site. If it does, the data was not lost, because the transaction processing log has already been replicated from the remote site to the local site. If it does not, the data was lost, and the transaction is scheduled for account supplementing and written to the to-be-supplemented file. The replay and account-supplementing unit then takes the to-be-supplemented file generated by the disaster recovery reconciliation unit, performs the necessary processing (such as removing the message communication timeout and dynamically adding an account-supplementing flag to the transaction message), assembles messages that conform to the access standard of the core service transaction system and, imitating the message communication between the gateway cluster and the core service transaction system, initiates transaction requests to the core service transaction system.
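The reconciliation rule above (skip failures, skip transactions whose log already reached the local site, supplement the rest) and the replay-message assembly can be sketched in Python. This is a minimal sketch under stated assumptions: the status value `"OK"`, the `timeout` and `supplement_flag` field names, and the names `BypassedTxn`, `reconcile` and `build_replay_message` are all hypothetical, and the to-be-supplemented file is modeled as an in-memory list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class BypassedTxn:
    serial_no: str           # business serial number, used as the service-attribute key
    txn_code: str            # service transaction code
    status: str              # processing state returned by the remote site: "OK" or "FAIL"
    message: Dict[str, str]  # parsed request message


def reconcile(bypassed: Iterable[BypassedTxn], local_app_log: Set[str]) -> List[BypassedTxn]:
    """Disaster recovery reconciliation: return the to-be-supplemented transactions."""
    to_supplement = []
    for txn in bypassed:
        if txn.status != "OK":
            continue  # failed at the remote site: nothing to supplement
        if txn.serial_no in local_app_log:
            continue  # log already replicated to this site: data not lost
        to_supplement.append(txn)  # succeeded remotely but missing locally: lost
    return to_supplement


def build_replay_message(txn: BypassedTxn) -> Dict[str, str]:
    """Assemble a replay request: drop the original timeout and add a supplement flag."""
    msg = {k: v for k, v in txn.message.items() if k != "timeout"}
    msg["supplement_flag"] = "Y"
    return msg
```

Reconciliation works only on data the surviving site already holds (its own bypass records and its own application log), so it needs nothing from the failed site.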

When the disaster recovery flag is set, the core business transaction system accepts transaction requests initiated by the automatic account-supplementing device and executes them through the same transaction flow as customer requests. This reuses existing transaction resources and reduces the enterprise's investment in them. For each transaction request replayed by the replay account-supplementing unit, the core business transaction system records the account-supplementing flag in its application log, which supports management tasks such as compiling statistics on disaster loss recovery. While replaying concurrently, the replay account-supplementing unit keeps the execution order of replayed transactions consistent with their execution order before the disaster. This avoids redo failures caused by out-of-order transactions and raises the account-supplementing success rate. The unit also ensures that no transaction is executed more than once during replay. For every transaction it redoes, it records a replay log that includes the execution state returned by the core business transaction system, for later use by the account-supplementing report summarizing unit.
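The replay guarantees listed above are original ordering, no duplicate execution, and one replay log entry per redone transaction. Below is a minimal sequential sketch of them. It assumes each request carries a hypothetical `seq` field holding its original production order and a `serial_no` field. `send` is a hypothetical stand-in for submitting a request to the core business transaction system. The concurrent scheduling mentioned in the text is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReplayLogEntry:
    serial_no: str
    status: str  # execution state returned by the core system: "OK" or "FAILED"


@dataclass
class ReplayUnit:
    # Hypothetical: send(request) -> True if the core system executed it successfully
    send: Callable[[dict], bool]
    replayed: set = field(default_factory=set)
    replay_log: list = field(default_factory=list)

    def replay(self, requests: list[dict]) -> list[ReplayLogEntry]:
        # Sort by the original production sequence so replay follows pre-disaster order
        for req in sorted(requests, key=lambda r: r["seq"]):
            if req["serial_no"] in self.replayed:
                continue  # never execute the same transaction twice
            self.replayed.add(req["serial_no"])
            ok = self.send(req)
            self.replay_log.append(ReplayLogEntry(req["serial_no"], "OK" if ok else "FAILED"))
        return self.replay_log


sent = []
unit = ReplayUnit(send=lambda r: sent.append(r["serial_no"]) or True)
unit.replay([{"seq": 2, "serial_no": "B"},
             {"seq": 1, "serial_no": "A"},
             {"seq": 2, "serial_no": "B"}])  # duplicate of B is skipped
print(sent)  # ['A', 'B']
```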

Optionally, the automatic account-supplementing device at each site may further include a security check shielding unit 64. This unit is connected to the service system 1 of the local site and shields the security check function of the local service system, so that the local service system processes the service requests replayed by the replay account-supplementing unit through the normal service request processing flow.

To guard against the risk of transaction redo and so control the enterprise's posting risk, the core business transaction system performs many rule checks before a transaction is posted. Examples include a validity check on the business execution time, a counter or terminal serial number check, and a message tamper-resistance check. Account-supplementing transactions and customer transactions run through the same flow, so these security checks would make the requests sent by the replay account-supplementing unit fail outright. Therefore, in a disaster account-supplementing scenario, the core business transaction system enables the security check shielding function. When an account-supplementing request reaches a security check, the system automatically calls the security check shielding unit, which shields the check so that the account-supplementing transaction can continue to execute. The security check shielding unit 64 is the core unit of the automatic account-supplementing device 6. It lets the core business transaction system run the same flow for customer transactions and account-supplementing transactions, so that lost transaction data can be compensated by redoing transactions, and it saves the enterprise's investment in disaster recovery resources.
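The shielding rule described above can be sketched as follows. A failed check is turned into a pass only when three conditions hold: shielding is enabled, the request carries the account-supplementing flag, and the check is one of the shieldable security checks. Business checks still run normally. The check names, the `supplement_flag` field and the check-function signature are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical names for the security checks the text lists (date, teller serial, integrity)
SHIELDABLE_CHECKS = {"date_validity", "teller_serial_no", "message_integrity"}


def run_security_checks(request: dict,
                        checks: dict[str, Callable[[dict], bool]],
                        shielding_enabled: bool) -> bool:
    """Return True if the request may proceed to the normal posting flow."""
    is_supplement = bool(request.get("supplement_flag", False))
    for name, check in checks.items():
        if check(request):
            continue
        # Shield the failure only for account-supplementing requests, only while
        # shielding is switched on, and only for the listed security checks
        if shielding_enabled and is_supplement and name in SHIELDABLE_CHECKS:
            continue
        return False
    return True


checks = {
    "date_validity": lambda r: False,         # a replayed message is always "stale"
    "amount_positive": lambda r: r["amount"] > 0,  # an ordinary business rule
}
print(run_security_checks({"supplement_flag": True, "amount": 10}, checks, True))  # True
print(run_security_checks({"amount": 10}, checks, True))                           # False
```

Customer requests never benefit from shielding in this sketch, which matches the text's intent that only account-supplementing transactions bypass the checks.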

Optionally, the automatic account-supplementing device at each site may further include a replay failure processing unit 65. This unit is connected to the replay account-supplementing unit 63 and the service system 1 of the local site, and it identifies the service requests whose replay failed by comparing the replay log of the replay account-supplementing unit with the application log of the local service system. For the core business transaction system, the replay failure processing unit 65 analyzes the replay results. For the small number of transactions whose replay failed, it arranges for the debit accounts involved to be locked so as to avoid fund risk to the enterprise. It also generates transaction details so that business staff can supplement the accounts manually.
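The comparison of the replay log with the application log can be sketched as follows. As an assumption, a replay counts as failed if the replay log says so or if the local application log has no matching entry. The record layout and the comma-separated detail lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplayRecord:
    serial_no: str
    debit_account: Optional[str]  # None if the transaction moves no funds
    status: str                   # "OK" or "FAILED", as recorded by the replay unit


def handle_replay_failures(replay_log: list[ReplayRecord],
                           app_log_serials: set[str]) -> tuple[list[str], list[str]]:
    """Return (debit accounts to lock, detail lines for manual account supplementing)."""
    failed = [r for r in replay_log
              if r.status != "OK" or r.serial_no not in app_log_serials]
    accounts_to_lock = sorted({r.debit_account for r in failed if r.debit_account})
    details = []
    for r in failed:
        reason = r.status if r.status != "OK" else "NO_APP_LOG"
        details.append(f"{r.serial_no},{r.debit_account or ''},{reason}")
    return accounts_to_lock, details


log = [
    ReplayRecord("S1", "ACC1", "OK"),
    ReplayRecord("S2", "ACC2", "FAILED"),
    ReplayRecord("S3", None, "FAILED"),
    ReplayRecord("S4", "ACC4", "OK"),  # reported OK but absent from the application log
]
locked, details = handle_replay_failures(log, {"S1"})
print(locked)   # ['ACC2', 'ACC4']
print(details)  # ['S2,ACC2,FAILED', 'S3,,FAILED', 'S4,ACC4,NO_APP_LOG']
```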

Optionally, the automatic account-supplementing device at each site may further include an account-supplementing report summarizing unit 66. This unit is connected to the replay account-supplementing unit 63 and the service system 1 of the local site, and it generates account-supplementing report statistics from the replay log of the replay account-supplementing unit and the application log of the local service system. The report statistics may include, but are not limited to, the data lost to database replication delay at the time of the disaster, automatic account-supplementing successes and failures, the number of customers affected, and the funds involved.
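The report categories listed above can be computed as in the following sketch. It assumes the replay log and the application log have already been joined into one hypothetical `ReplayResult` per replayed transaction, and that every replayed transaction represents data lost to replication delay. Field names and report keys are illustrative. `Decimal` is used so that amounts are summed exactly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ReplayResult:
    serial_no: str
    customer_id: str
    amount: Decimal
    ok: bool  # replay log reports success and the application log confirms it


def supplement_report(results: list[ReplayResult]) -> dict:
    ok = [r for r in results if r.ok]
    failed = [r for r in results if not r.ok]
    return {
        "lost_due_to_replication_delay": len(results),
        "auto_supplement_succeeded": len(ok),
        "auto_supplement_failed": len(failed),
        "customers_affected": len({r.customer_id for r in results}),
        "amount_recovered": sum((r.amount for r in ok), Decimal("0")),
        "amount_pending_manual": sum((r.amount for r in failed), Decimal("0")),
    }


report = supplement_report([
    ReplayResult("S1", "C1", Decimal("100.00"), True),
    ReplayResult("S2", "C1", Decimal("50.50"), False),
    ReplayResult("S3", "C2", Decimal("20"), True),
])
print(report["amount_recovered"])  # 120.00
```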

An embodiment of the present invention further provides an application disaster recovery method for a dual-active data center, applied to the first site or the second site of the application disaster recovery system in the above system embodiment, as described in the following embodiments. The method solves the problem on the same principle as the application disaster recovery system of the dual-active data center. Its implementation can therefore refer to the implementation of the system, and repeated details are not described again.

Fig. 4 is a flowchart of an application disaster recovery method for a dual-active data center according to an embodiment of the present invention. As shown in Fig. 4, the method includes the following steps:

S401: receive a service request uploaded by the gateway cluster of the remote site;

S402: process the service request uploaded by the gateway cluster of the remote site;

S403: return the service processing result to the gateway cluster of the remote site;

S404: store the service processing result in the database of the local site and replicate it to the database of the remote site.

It should be noted that when S401 to S404 are applied to the first site (site A) shown in Fig. 2, the local site is the first site (site A) and the remote site is the second site (site B). When they are applied to the second site (site B) shown in Fig. 2, the local site is the second site (site B) and the remote site is the first site (site A).
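Steps S401 to S404 and the bypass on the local gateway combine to produce the zero-loss property of the cross deployment. The following toy simulation sketches how. A request entering site A's gateway is processed by site B's service system and committed to site B's database, which replicates asynchronously back to site A. Site A's bypass keeps its own copy of the traffic. If site B fails before replication completes, site A's database lacks the transaction, but site A's bypass still has it. Everything here, including the in-memory "databases" and the explicit `replicate` call, is a hypothetical model rather than the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    name: str
    db: dict = field(default_factory=dict)       # results committed at this site
    bypass: list = field(default_factory=list)   # traffic copied off the local gateway
    pending: list = field(default_factory=list)  # local writes not yet replicated out
    peer: Optional["Site"] = field(default=None, repr=False)

    def process_remote_request(self, serial_no: str, body: str) -> str:
        """S401-S404: handle a request that arrived on the peer's gateway cluster."""
        result = f"done:{body}"                   # S402: process
        self.db[serial_no] = result               # S404: store locally ...
        self.pending.append((serial_no, result))  # ... and replicate asynchronously
        return result                             # S403: reply to the peer's gateway

    def replicate(self) -> None:
        for key, value in self.pending:
            self.peer.db[key] = value
        self.pending.clear()

    def gateway_receive(self, serial_no: str, body: str) -> str:
        """Local gateway: forward to the peer's service system and bypass the traffic locally."""
        result = self.peer.process_remote_request(serial_no, body)
        self.bypass.append((serial_no, body, result))
        return result


site_a, site_b = Site("A"), Site("B")
site_a.peer, site_b.peer = site_b, site_a

site_a.gateway_receive("S1", "pay 10")
site_b.replicate()                      # S1 reaches site A's database
site_a.gateway_receive("S2", "pay 20")  # site B fails before replicating S2

# Site A's database lacks S2, but its bypass still holds the S2 request and response
missing = [serial for serial, _, _ in site_a.bypass if serial not in site_a.db]
print(missing)  # ['S2']
```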

In an optional embodiment, the method may further include two operations. First, the service data transmitted by the gateway cluster of the local site is bypassed to the local site; this data includes the service request data received by the local gateway cluster and/or the service processing results returned by the service system of the remote site. Second, when a disaster is detected at the remote site, the service requests are replayed according to the service data bypassed to the local site. Optionally, the bypassed service data may be filtered before it is stored, so that only the filtered service data is stored at the local site.

Replaying the service requests according to the service data bypassed to the local site may specifically include the following: acquiring the service data bypassed to the local site; parsing that service data; generating a to-be-supplemented account file from the parsing result, where the file contains the service requests that need to be replayed; and replaying the service requests according to the to-be-supplemented account file.

Optionally, before the service requests are replayed according to the service data bypassed to the local site, the method may further include shielding the security check function of the local service system, so that the local service system processes the replayed service requests through the normal service request processing flow.

Optionally, after the service requests are replayed according to the service data bypassed to the local site, the method may further include determining which service requests failed to replay at the local site according to the local site's replay log and application log. The replay log records the service requests replayed to the local site, and the application log is the log of the local service system.

Optionally, after the service requests are replayed according to the service data bypassed to the local site, the method may further include generating the local site's account-supplementing report statistics according to its replay log and application log. The replay log records the service requests replayed to the local site, and the application log is the log of the local service system.

Fig. 5 is a flowchart of an application disaster recovery method for a dual-active data center, applied to business transactions of a bank core system, according to an embodiment of the present invention. As shown in Fig. 5, the method includes the following steps:

S501: The transaction messages transmitted on each site's gateway cluster are bypassed (copied) to that site's information bypass device.

S502: According to the requirements of disaster recovery account supplementing, the information bypass device uses a configured blacklist to filter out transactions that do not need account supplementing.

S503: The information bypass device writes the content of the bypassed service data (for example, the business transaction code, transaction time and complete communication packet) to local files. Account supplementing only needs the data covered by the database replication delay, not a full day's data, so the files can be written cyclically to reduce the amount of data stored. This step can also write transactions to different files according to their business attributes, so that they can be processed differently in S506.
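S502 and S503 together describe blacklist filtering followed by cyclic, per-attribute storage. The following sketch models them. Each business attribute gets a fixed ring of segments, a stand-in for a fixed set of reusable files. When the ring is full, the oldest segment is overwritten. The segment count and size, the record layout and the class name are hypothetical, and a real device would write to disk rather than keep data in memory.

```python
from collections import defaultdict, deque


class BypassStore:
    """In-memory model of the information bypass device's cyclically written files."""

    def __init__(self, blacklist: set[str], segments: int = 4, segment_size: int = 1000):
        self.blacklist = set(blacklist)  # transaction codes that never need supplementing
        self.segment_size = segment_size
        # One ring per business attribute; deque(maxlen=...) discards the oldest segment
        self.rings = defaultdict(lambda: deque([[]], maxlen=segments))

    def write(self, txn_code: str, attribute: str, txn_time: str, packet: bytes) -> bool:
        if txn_code in self.blacklist:
            return False  # S502: filtered out, not stored
        ring = self.rings[attribute]
        if len(ring[-1]) >= self.segment_size:
            ring.append([])  # start a new segment, overwriting the oldest if the ring is full
        ring[-1].append((txn_code, txn_time, packet))
        return True

    def records(self, attribute: str) -> list:
        return [rec for seg in self.rings.get(attribute, ()) for rec in seg]


store = BypassStore({"Q001"}, segments=2, segment_size=2)
store.write("Q001", "deposit", "t0", b"query")  # blacklisted balance inquiry
for i in range(1, 6):
    store.write(f"T{i}", "deposit", f"t{i}", b"...")
print([r[0] for r in store.records("deposit")])  # ['T3', 'T4', 'T5']
```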

It should be noted that S501, S502 and S503 are part of daily production operation, and every transaction message goes through these steps.

S504: When a site-level disaster occurs, the fault detection mechanism of the dual-active data center automatically performs an unplanned site switchover and activates the automatic account-supplementing device. This includes the following actions: starting the disaster system recovery process; starting disaster recovery account supplementing; setting the disaster recovery flag of the core business transaction system at the current site to the disaster recovery state, since account-supplementing transaction requests may be sent to the core business transaction system only while this flag is in that state; and enabling the security check shielding function for account-supplementing transactions in the core business transaction system.
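The flag handling in S504 can be sketched as a small state object plus an admission rule. Account-supplementing requests are admitted only while the disaster recovery flag is set, while customer requests are unaffected. Site switchover and the recovery process itself are out of scope. The names `CoreSystemState`, `activate_disaster_recovery`, `admit` and `supplement_flag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreSystemState:
    disaster_recovery_flag: bool = False  # the "disaster recovery flag" of S504
    shielding_enabled: bool = False       # security check shielding for supplement requests


def activate_disaster_recovery(state: CoreSystemState) -> None:
    """Flag changes made by S504 when the automatic account-supplementing device is activated."""
    state.disaster_recovery_flag = True
    state.shielding_enabled = True


def admit(request: dict, state: CoreSystemState) -> bool:
    """Customer requests always pass; account-supplementing requests need the disaster flag."""
    if request.get("supplement_flag"):
        return state.disaster_recovery_flag
    return True


state = CoreSystemState()
print(admit({"supplement_flag": True}, state))  # False: normal production state
activate_disaster_recovery(state)
print(admit({"supplement_flag": True}, state))  # True
```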

S505: The information transmission unit of the automatic account-supplementing device triggers the account-supplementing file receiving process, receives the account-supplementing file from the information bypass device, and passes it to the disaster recovery reconciliation unit of the automatic account-supplementing device.

S506: The disaster recovery reconciliation unit of the automatic account-supplementing device checks each message in the account-supplementing file to determine whether the transaction succeeded in the original production run. Failed transactions need no account supplementing. For a successful transaction, the unit uses the key value of its business attributes to check whether the transaction's application log exists at the local site. If it does, no data was lost, because the transaction processing log has already been replicated from the remote site. If it does not, the data was lost, and the transaction is scheduled for account supplementing and written to the to-be-supplemented account file.

S507: The replay account-supplementing unit of the automatic account-supplementing device performs the necessary processing on each message, removing factors such as timeouts that would lower the transaction success rate. It then assembles the message according to the standard of the core business transaction system and initiates a transaction request to that system.
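The message preparation in S507 can be sketched as removing the original timeout, adding the account-supplementing flag, and serializing the result. The field names (`timeout_ms`, `deadline`, `supplement_flag`) and the JSON wire format are hypothetical. A real core business transaction system defines its own access standard.

```python
import json


def prepare_replay_message(original: dict) -> str:
    """Turn one to-be-supplemented record into a request for the core system (illustrative)."""
    msg = dict(original)           # copy, so the bypassed record itself is left untouched
    msg.pop("timeout_ms", None)    # drop the original communication timeout ...
    msg.pop("deadline", None)      # ... and any absolute deadline that has long expired
    msg["supplement_flag"] = True  # mark the request as account supplementing
    return json.dumps(msg, sort_keys=True, separators=(",", ":"))


original = {"serial_no": "S3", "txn_code": "T200",
            "timeout_ms": 3000, "deadline": "2019-02-25T10:00:00"}
print(prepare_replay_message(original))
# {"serial_no":"S3","supplement_flag":true,"txn_code":"T200"}
```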

S508: The replay must not execute transactions out of order. For example, if an account originally received a transfer in and then made a transfer out, replaying in reverse order would attempt the transfer out first, which may fail because of insufficient balance. The replay account-supplementing unit therefore controls the timing while processing replays concurrently, sending transaction requests to the core business transaction system in the original production processing order. This timing control is the key to raising the success rate of transaction redo.
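The following worked example reproduces the failure described in S508. One account is modeled with integer balances and overdraft rejection. The original order of postings succeeds, while the reversed order fails on the transfer out. The `post` helper is hypothetical and only illustrates why ordering matters. It does not implement the concurrent timing control itself.

```python
def post(balance: int, ops: list[tuple[str, int]]) -> tuple[int, bool]:
    """Apply (direction, amount) postings to one account; reject any overdraft."""
    for direction, amount in ops:
        if direction == "out":
            if balance < amount:
                return balance, False  # insufficient balance: the replay fails here
            balance -= amount
        else:
            balance += amount
    return balance, True


original_order = [("in", 100), ("out", 80)]  # as executed before the disaster
print(post(0, original_order))        # (20, True)
print(post(0, original_order[::-1]))  # (0, False): "out" runs before "in"
```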

S509: The core business transaction system executes the account-supplementing requests initiated by the automatic account-supplementing device. In this embodiment, customer business requests and account-supplementing requests go through the same business flow, so existing transaction resources are reused as much as possible. When a security check is reached, the security check shielding function is triggered automatically, and all other business processing runs normally.

S510: The security check shielding unit of the automatic account-supplementing device captures the security check results of the core business transaction system and shields failed checks. Failures of the date check, the teller serial number check and the message integrity check are automatically turned into normal returns, so that the transaction can continue processing in the account-supplementing scenario. This step is the core of successful account supplementing by transaction replay.

S511: The replay account-supplementing unit of the automatic account-supplementing device receives the processing result from the core business transaction system, analyzes the replay state of the transaction, and records a replay log.

S512: For transactions that failed to replay and involve account changes, the replay failure processing unit of the automatic account-supplementing device locks the accounts to avoid fund risk.

S513: The replay failure processing unit of the automatic account-supplementing device generates a detail file of the transactions that failed to replay, and the file is handed over to business staff for manual account supplementing.

S514: The account-supplementing report summarizing unit of the automatic account-supplementing device generates report statistics for the disaster from the replay log and the application log. These include the data lost to database replication delay at the time of the disaster, automatic account-supplementing successes and failures, the number of customers affected, and the funds involved.

An embodiment of the present invention further provides a computer device. It addresses a technical problem in the prior art: when site data is backed up by database replication, service data is easily lost because of replication delay.

In summary, the embodiments of the present invention provide a zero-loss application disaster recovery scheme for a dual-active data center. Bidirectional database replication makes the two sites active at the same time, with each backing up the other. Cross deployment of the gateway cluster and the core business transaction system solves the problem of data being lost to database replication delay when a disaster strikes. Replay-based automatic account supplementing reuses existing transaction resources to recover application account data. This avoids developing and maintaining a large number of disaster-specific recovery transactions and greatly reduces the enterprise's disaster recovery maintenance cost. Data integrity is achieved with low investment, and the enterprise can quickly restore business processing after a site-level disaster. Its business continuity level improves, and it achieves site-level high availability with zero loss of business data.

The embodiments of the present invention thus achieve dual-active sites and zero loss of service data, and they use replay to achieve automatic account supplementing at low cost.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An application disaster recovery system of a double-activity data center, characterized by comprising: a first data center deployed at a first site and a second data center deployed at a second site, wherein the data center of each site comprises a gateway cluster, a service system, a database, an information bypass device and an automatic account-supplementing device;

wherein the database of the first data center and the database of the second data center replicate data bidirectionally; the service system of the first data center is connected to the gateway cluster of the second data center and is used for processing service requests received by the gateway cluster of the second data center; and the service system of the second data center is connected to the gateway cluster of the first data center and is used for processing service requests received by the gateway cluster of the first data center;

the information bypass device is connected to the gateway cluster of the local site and is used for bypassing the service data transmitted by the gateway cluster of the local site; the automatic account-supplementing device is connected to both the information bypass device of the local site and the service system of the local site and is used for transmitting the service data bypassed by the information bypass device of the local site to the service system of the local site;

wherein the automatic account-supplementing device of each site comprises: an information transmission unit, connected to the information bypass device of the local site and used for acquiring the service data bypassed by the information bypass device; a disaster recovery reconciliation unit, connected to the information transmission unit and used for parsing the service data acquired by the information transmission unit and generating a to-be-supplemented account file according to the parsing result, wherein the to-be-supplemented account file contains the service requests that need to be replayed to the service system of the local site; and a replay account-supplementing unit, connected to both the disaster recovery reconciliation unit and the service system of the local site and used for replaying the service requests to the service system of the local site according to the to-be-supplemented account file generated by the disaster recovery reconciliation unit.

2. The system of claim 1, wherein the automatic account-supplementing device of each site further comprises:

a security check shielding unit, connected to the service system of the local site and used for shielding the security check function of the service system of the local site, so that the service system of the local site processes the service requests replayed by the replay account-supplementing unit according to the normal service request processing flow.

3. The system of claim 1, wherein the automatic account-supplementing device of each site further comprises:

a replay failure processing unit, connected to the replay account-supplementing unit and the service system of the local site and used for determining the service requests whose replay failed according to the replay log of the replay account-supplementing unit and the application log of the service system of the local site.

4. The system of claim 1, wherein the automatic account-supplementing device of each site further comprises:

an account-supplementing report summarizing unit, connected to both the replay account-supplementing unit and the service system of the local site and used for generating account-supplementing report statistics according to the replay log of the replay account-supplementing unit and the application log of the service system of the local site.

5. The system of claim 1, wherein the information bypass device of each site stores the service data transmitted by the gateway cluster of the local site by cyclically writing files.

6. The system of claim 1, wherein the information bypass device of each site stores the service data transmitted by the gateway cluster of the local site based on a preconfigured service list.

7. An application disaster recovery method for a double-activity data center, applied to the first site or the second site of the application disaster recovery system for a double-activity data center according to any one of claims 1 to 6, the method comprising the following steps:

receiving a service request uploaded by a gateway cluster from a remote site;

processing the service request uploaded by the gateway cluster of the remote site;

returning a service processing result to the gateway cluster of the remote site;

and storing the service processing result in a database of a local site, and copying the service processing result to a database of a remote site.

8. The method of claim 7, wherein the method further comprises:

bypassing, to the local site, the service data transmitted by the gateway cluster of the local site, wherein the service data comprises: service request data received by the gateway cluster of the local site and/or service processing results returned by the service system of the remote site;

and under the condition that a disaster happens to the remote site is detected, the service request is replayed according to the service data which is bypassed to the local site.

9. The method of claim 8, wherein replaying the service request according to the service data bypassed to the local site comprises:

acquiring the service data bypassed to the local site;

parsing the service data bypassed to the local site;

generating a to-be-supplemented account file according to the parsing result, wherein the to-be-supplemented account file contains the service requests that need to be replayed;

and replaying the service requests according to the to-be-supplemented account file.

10. The method of claim 8, wherein prior to replaying the service request based on the service data bypassed to the local site, the method further comprises:

shielding the security check function of the service system of the local site, so that the service system of the local site processes the replayed service requests according to the normal service request processing flow.

11. The method of claim 8, wherein after replaying the service request based on the service data bypassed to the local site, the method further comprises:

determining the service requests that failed to be replayed at the local site according to a replay log and an application log of the local site, wherein the replay log is a log of the service requests replayed to the local site, and the application log is a log of the service system of the local site.

12. The method of claim 8, wherein after replaying the service request based on the service data bypassed to the local site, the method further comprises:

generating account-supplementing report statistics of the local site according to the replay log and the application log of the local site, wherein the replay log is a log of the service requests replayed to the local site, and the application log is a log of the service system of the local site.

13. The method of any of claims 8 to 12, further comprising:

filtering the service data bypassed to the local site, and storing the filtered service data at the local site.

14. A computer apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the application disaster recovery method for a double-activity data center according to any one of claims 7 to 13.

15. A computer-readable storage medium storing a computer program for executing the application disaster recovery method for a double-activity data center according to any one of claims 7 to 13.