CN117082069A - Mixed cloud multi-activity disaster recovery system - Google Patents

Publication number
CN117082069A
Authority
CN
China
Prior art keywords
cloud
database
disaster recovery
data
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310927945.7A
Other languages
Chinese (zh)
Inventor
冯东煜
李炯锋
王竞争
李鹏
张广庆
张英朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Diankeyun Beijing Technology Co ltd
Original Assignee
Diankeyun Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Diankeyun Beijing Technology Co ltd
Priority to CN202310927945.7A
Publication of CN117082069A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a hybrid cloud multi-activity disaster recovery system comprising: a multi-cloud adaptation module, which adapts the application programming interfaces of a plurality of cloud platforms so that the platforms can be scheduled in a unified manner; a container cluster management module, which manages the K8S clusters deployed on the plurality of cloud platforms; a DevOps module; a domain name access resolution module; a load balancing module; a plurality of service gateway modules; and a plurality of basic resource modules. When one or more cloud platforms enter a disaster state, the load balancing module responds by switching the service traffic to a normally operating cloud platform that takes over the service. Data is synchronized across platforms through the cloud databases of the cloud platforms. The disaster recovery system uses multi-cloud resource adaptation and containerization technology to schedule and manage the applications and cloud databases on a plurality of cloud platforms in a unified manner, so that switchover during a disaster is imperceptible to users; it costs less than the prior art and can adapt to multiple architectures such as X86 and ARM.

Description

Mixed cloud multi-activity disaster recovery system
Technical Field
The application relates to the technical field of cloud computing, in particular to a hybrid cloud multi-activity disaster recovery system.
Background
At this stage, traditional local storage and servers can no longer keep up with the explosive growth of data. Cloud computing platforms provide elastically expandable storage capacity, so that users can easily store and manage massive amounts of data. Cloud computing platforms typically host a large number of critical services and applications, so ensuring high availability of the cloud platform is very important.
Disaster recovery schemes in the prior art generally depend on an in-cloud disaster recovery system provided by a single cloud vendor, and lack technologies and platforms for centralized, unified management of a plurality of cloud platforms. In existing disaster recovery technology, the disaster recovery center in the backup machine room usually provides no service in normal times and resumes operation only when a disaster occurs. This causes several problems: first, a smooth switchover to the disaster recovery center at the critical moment cannot be guaranteed; second, the disaster recovery resources sit idle in normal times, which wastes considerable cost; third, disaster recovery relies on the stability of a single cloud vendor's platform, so service interruptions caused by the failure of that single cloud platform cannot be solved. In addition, existing disaster recovery solutions depend on the x86 architecture, which is an obvious limitation.
Disclosure of Invention
In view of this, the embodiment of the application provides a hybrid cloud multi-activity disaster recovery system, so as to solve or mitigate the high construction cost and slow service traffic switching of disaster recovery systems in the prior art.
The application provides a hybrid cloud multi-activity disaster recovery system, which comprises:
the multi-cloud adaptation module is used for adapting the application programming interfaces of the plurality of cloud platforms so that the plurality of cloud platforms can be scheduled in a unified manner;
the container cluster management module is used for providing container cluster management, so as to deploy K8S clusters on the plurality of cloud platforms and to schedule and manage the K8S clusters in a unified manner;
the DevOps module is used for publishing a container application to each cloud platform based on the K8S cluster deployed on that cloud platform, and for providing continuous integration and deployment management, so that the container application can provide services on all cloud platforms at the same time;
the domain name access resolution module is used for providing a domain name resolution service through which user access requests are admitted;
the load balancing module is used for distributing service flow generated by user access to realize load balancing and detecting abnormal states of all cloud platforms;
the service gateway modules are respectively deployed in each cloud platform and are used for receiving the service traffic forwarded by the load balancing module and dispatching the service traffic to the container application at the rear end of the cloud platform;
the plurality of basic resource modules are respectively deployed in each cloud platform and provide computing resources, network resources and cloud databases for each container application through virtualization;
when one or more cloud platforms enter a disaster state, the load balancing module responds to the disaster state and switches the service traffic to a normally operating cloud platform that takes over the service (the service receiving cloud platform);
and the data of the container application is synchronized through the cloud databases of all cloud platforms.
In some embodiments of the present application, the multi-cloud adaptation module is further configured to manage, in a unified manner, the accounts of the container application on the plurality of cloud platforms, so as to enable migration of service data in the cloud databases between the cloud platforms and network interconnection between the cloud platforms; the container cluster management module provides container cluster management based on the ARM or X86 architecture.
In some embodiments of the present application, the load balancing module detects abnormal states of each cloud platform in real time according to a set rule, where the set rule is that detection is performed according to a set interval time or according to a preset time point;
the abnormal state includes hardware fault, network interruption, natural disaster and human error.
In some embodiments of the present application, the cloud databases of the container application in each cloud platform are deployed in a multi-cluster mode, the container application fixedly accesses specified database clusters, and the clusters synchronize data in an asynchronous manner, so as to realize redundancy backup of data across clusters.
In some embodiments of the present application, when one or more cloud platforms are in a disaster state and the service traffic is being switched, an update prohibition policy is applied during the traffic cutover to the cloud database corresponding to the container application in the service receiving cloud platform, so that data update operations are prohibited and dirty writes, as well as the overwriting of newly written data by data synchronized afterwards, are prevented.
In some embodiments of the present application, when the cloud database of each cloud platform is in a normal state, the service traffic generated by user access passes through the load balancing module and accesses the cloud database of a designated cloud platform along a preset preferred path;
and when the cloud database of the designated cloud platform fails, the preset preferred path is abandoned, and the database access traffic is smoothly switched to the service receiving cloud database through a dynamic connection mechanism and a dynamic token algorithm of an arbitration scheduler.
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms: when the cloud databases are MySQL databases, a MySQL multi-master synchronization mechanism is adopted as the database disaster recovery scheme;
when the cloud databases are SQL Server databases, the AlwaysOn scheme is adopted as the database disaster recovery scheme; when the cloud databases are Oracle databases, either the Oracle Extended Cluster scheme is adopted as the database disaster recovery scheme, or the Oracle RAC scheme and the Data Guard scheme are combined as the database disaster recovery scheme.
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms, the data transmission protocol is optimized: the write command and the write data are combined into a single transmission, and the write-allocation-complete interaction step is eliminated, so as to reduce the number of interactions.
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms, a data synchronization scheduler is used to introduce a data synchronization routing table and a routing scheduling algorithm, and the synchronization strategy is dynamically updated according to the cloud database selected by arbitration under the preferred access policy.
In some embodiments of the application, when the increase or decrease in the service traffic load of one or more cloud platforms reaches a threshold, the container application is scaled out or in by means of Docker, K8S and micro-service technology.
The application has the advantages that:
the application provides a hybrid cloud multi-activity disaster recovery system that uses multi-cloud resource adaptation, containerization technology and database synchronization to deploy the same application to a plurality of cloud platforms, each deployment calling its corresponding cloud database. In the normal state, the deployments are scheduled and managed in a unified manner across the cloud platforms, run in parallel, and the cloud databases back each other up. When a disaster occurs, the system can rapidly switch the service traffic of the failed cloud platform to the service receiving cloud platform, ensuring the stability and continuity of the service. Compared with a traditional disaster recovery system, this system costs less and is more efficient, and it is suitable for multiple hardware architectures such as X86 and ARM.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application. In the drawings:
Fig. 1 is a schematic diagram of a container application deployment of a hybrid cloud multi-activity disaster recovery system according to an embodiment of the present application.
Fig. 2 is a block diagram of a hybrid cloud multi-activity disaster recovery system according to another embodiment of the present application.
Fig. 3 is a schematic diagram illustrating multi-cluster data synchronization among cloud databases of the hybrid cloud multi-activity disaster recovery system according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating multi-cluster data synchronization among cloud databases of a hybrid cloud multi-activity disaster recovery system according to another embodiment of the present application.
Fig. 5 is a schematic diagram of disaster recovery switching of the hybrid cloud multi-activity disaster recovery system according to an embodiment of the application.
Fig. 6 is a block diagram of a hybrid cloud multi-activity disaster recovery system according to another embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent. The exemplary embodiments of the present application and the descriptions thereof are used herein to explain the present application, but are not intended to limit the application.
It should be noted here that, in order to avoid obscuring the present application due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present application are shown in the drawings, while other details not greatly related to the present application are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
In the prior art, disaster recovery is built on data-level disaster recovery: an identical application system is set up in a backup machine room, and when a disaster occurs, operation is restored within an agreed recovery time objective (RTO). This approach has the following problems: 1) The disaster recovery center provides no service in normal times, so it cannot be determined in advance whether a switchover to the disaster recovery center will succeed at the critical moment. 2) The disaster recovery center provides no service in normal times, so all of the disaster recovery resources sit idle and the cost waste is high. 3) Disaster recovery relies on the stability of a single cloud vendor's platform, so this approach cannot solve service interruptions caused by the failure of that single cloud platform. In addition, in existing cloud disaster recovery environments the cloud platform layer only supports deployment on the x86 architecture, which limits the disaster recovery solutions that can be studied.
The application provides a hybrid cloud multi-activity disaster recovery system, which is a disaster recovery solution based on a plurality of cloud platforms. The system uses the resources and capabilities of a plurality of cloud platforms at the same time, ensures high-availability deployment and data backup of applications and data across a plurality of geographic locations, and provides stronger disaster recovery and fault recovery capabilities.
The embodiment of the application provides a hybrid cloud multi-activity disaster recovery system, which comprises: a multi-cloud adaptation module, a container cluster management module, a DevOps module, a domain name access resolution module, a load balancing module, service gateway modules and basic resource modules.
The multi-cloud adaptation module is used for adapting the application programming interfaces of the plurality of cloud platforms so that the plurality of cloud platforms can be scheduled in a unified manner.
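The adaptation idea described above can be illustrated with a minimal adapter-pattern sketch. The patent does not disclose the module's interface, so the class names, the `create_instance` method and both vendor request shapes below are hypothetical and exist only to show how differing cloud APIs could sit behind one scheduling interface.

```python
from abc import ABC, abstractmethod


class CloudAdapter(ABC):
    """Hypothetical uniform interface that every vendor adapter implements."""

    @abstractmethod
    def create_instance(self, name: str, cpu: int, memory_gb: int) -> dict:
        ...


class CloudAAdapter(CloudAdapter):
    # Assumed vendor A request shape: a single flavor string such as "c2m4".
    def create_instance(self, name, cpu, memory_gb):
        return {"vendor": "cloud-a", "name": name, "flavor": f"c{cpu}m{memory_gb}"}


class CloudBAdapter(CloudAdapter):
    # Assumed vendor B request shape: separate vcpus and ram_mb fields.
    def create_instance(self, name, cpu, memory_gb):
        return {"vendor": "cloud-b", "name": name, "vcpus": cpu, "ram_mb": memory_gb * 1024}


class MultiCloudScheduler:
    """Issues one logical request to every registered cloud platform."""

    def __init__(self, adapters):
        self.adapters = dict(adapters)

    def create_everywhere(self, name, cpu, memory_gb):
        return {
            cloud: adapter.create_instance(name, cpu, memory_gb)
            for cloud, adapter in self.adapters.items()
        }
```

A caller would register one adapter per platform and call `create_everywhere("web", 2, 4)`; each adapter translates the same logical request into its own vendor format.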
The container cluster management module is used for providing container cluster management, so as to deploy K8S clusters on the plurality of cloud platforms and to schedule and manage them in a unified manner. Using containerization technology, the module packages the application that carries the service, together with its supporting resources, into an independent runtime environment on each cloud platform. Containerization is a method for developing and deploying software that packages an application and all of its dependencies into a single executable environment, so that the application can run quickly and portably in different computing environments. Each container contains the application's code, runtime environment, system tools, library files, configuration and so on. At the heart of containerization technology is the container engine, the best known and most widely used of which is Docker. The container engine is responsible for creating, running and managing containers, and provides a set of interfaces and tools to build, deploy and manage containerized applications. K8S refers to Kubernetes, an open-source container orchestration platform whose role is to simplify the management of containerized applications by providing a highly scalable, highly available container orchestration solution.
The DevOps module publishes the container application to each cloud platform based on the K8S cluster deployed on that cloud platform, and provides continuous integration and deployment management so that the container application can provide services on all cloud platforms simultaneously. The requirements of business operation and maintenance for automation, customization and flexible workflow orchestration can be met through DevOps.
The domain name access resolution module provides the domain name resolution service (DNS), which converts human-readable domain names into IP addresses that computers can identify, so that user access requests can be admitted and network communication established. When a user enters a domain name in a browser, the browser sends a query to a local DNS server; the local DNS server obtains the corresponding IP address from the globally distributed DNS servers and returns the result to the browser, completing domain name resolution.
The load balancing module is used for distributing the service traffic generated by user access, so as to achieve load balancing, and for detecting abnormal states of the cloud platforms. The module supports the TCP, UDP, HTTP and HTTPS protocols. Load balancing is a computer networking technique for distributing workload among multiple servers or computing resources to ensure the high availability, performance and scalability of a system. Its goal is to let all servers process requests effectively, to avoid the performance degradation caused by overloading particular servers, and to keep resource utilization balanced.
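As a rough illustration of the distribution behaviour described above, the sketch below rotates requests across the cloud platforms currently marked healthy. The patent does not name a balancing algorithm, so round-robin is an assumption, and the `LoadBalancer` class and its methods are hypothetical.

```python
from itertools import count


class LoadBalancer:
    """Round-robin over healthy platforms (the algorithm choice is an assumption)."""

    def __init__(self, platforms):
        self.platforms = list(platforms)
        self.healthy = set(self.platforms)
        self._counter = count()

    def mark(self, platform, ok):
        # Record the result of an abnormal-state check for one platform.
        if ok:
            self.healthy.add(platform)
        else:
            self.healthy.discard(platform)

    def route(self):
        # Only platforms that passed their last check receive traffic.
        candidates = [p for p in self.platforms if p in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy cloud platform available")
        return candidates[next(self._counter) % len(candidates)]
```

Once a platform is marked unhealthy it simply drops out of the rotation, which is the traffic switchover to the remaining platforms in its simplest form.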
The service gateway modules are respectively deployed in each cloud platform and are used for receiving the service traffic forwarded by the load balancing module and dispatching the service traffic to the container application at the rear end of the cloud platform.
The plurality of basic resource modules are respectively deployed in each cloud platform and provide computing resources, network resources and cloud databases for each container application through virtualization.
User requests are admitted through the domain name access resolution module and then distributed by the load balancing module to the plurality of cloud platforms that carry the container application; the load balancing module distributes the service traffic evenly and detects abnormal states of the cloud platforms. On each cloud platform, the container application and its related resources are packaged into an independent executable environment using containerization technology. Each cloud platform includes: a service gateway module, which receives the service traffic forwarded by the load balancing module and distributes it to the back-end container application; a DevOps module, which publishes the container application to the cloud platform; a container cluster management module, which deploys the K8S clusters and schedules and manages them in a unified manner; and a basic resource module, which provides computing resources, network resources and cloud databases for each container application through virtualization. The databases corresponding to the same application on each platform keep their data synchronized. Meanwhile, the multi-cloud adaptation module adapts the application programming interfaces of the plurality of cloud platforms and schedules the container applications on the plurality of cloud platforms in a unified manner.
When one or more cloud platforms are in a disaster state, the load balancing module responds to the disaster state and switches the service flow to the service receiving cloud platform which operates normally.
The data of the container application is synchronized through the cloud databases of the cloud platforms.
Specifically, the cloud databases of the container application in each cloud platform are deployed in a multi-cluster mode. The container application always accesses its designated database cluster, and the clusters synchronize data with one another asynchronously, which provides redundant data backup across clusters.
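The asynchronous synchronization described above can be sketched as a local write followed by a deferred replication step. This is a toy in-memory model rather than any real database's replication protocol; `DatabaseCluster`, `outbox` and `replicate` are hypothetical names.

```python
from collections import deque


class DatabaseCluster:
    """Toy cluster: writes land locally first and reach peers later."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []
        self.outbox = deque()  # changes not yet shipped to the peers

    def write(self, key, value):
        # The local write returns immediately; replication is deferred.
        self.data[key] = value
        self.outbox.append((key, value))

    def replicate(self):
        """Ship queued changes to every peer and return how many were sent."""
        shipped = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value
            shipped += 1
        return shipped
```

Between `write` and `replicate` the peers lag behind the writer. That window is the synchronization delay that a switchover has to take into account.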
Further, multi-cluster data synchronization employs the following strategies:
Database preferred access strategy: in a multi-data-center scenario, the multi-cloud multi-activity console provides a preferred mode in which I/O is issued, with load balancing, only on the preferred path set by the user, so that no cross-database I/O access occurs. To ensure that the application service is not interrupted abnormally, database access traffic is switched smoothly through a dynamic connection mechanism and a dynamic token algorithm of an arbitration scheduler. When the preferred database fails, I/O is issued to a non-preferred database instead. The preferred database mode reduces the number of interactions across data centers and thereby improves I/O performance.
Data link and data synchronization optimization: the I/O interaction process is optimized at the protocol level of data transmission. The write command and the write data are combined into a single transmission and the write-allocation-complete interaction step is eliminated, which halves the number of cross-data-center I/O interactions. A data synchronization scheduler is designed that introduces a data synchronization routing table and a routing scheduling algorithm and dynamically updates the synchronization strategy according to the database selected by arbitration under the preferred access strategy, ensuring accurate and timely synchronization among the multi-active databases.
Database disaster recovery management: a heartbeat strategy is used to judge node state in real time. When the main database fails and the node failure is detected, an arbitration mechanism switches to the standby database, achieving database switchover within seconds.
Data protection mechanism: during traffic switching, an update prohibition strategy is applied throughout the data synchronization delay period to the application in the destination cloud of the switchover. Data update operations are prohibited, which avoids dirty writes and the overwriting of newly written data by data synchronized afterwards.
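The data protection strategy above can be sketched as a write gate on the destination database that stays closed until the replication backlog has drained. The patent does not specify how the synchronization delay is measured, so the `pending_sync` counter and the `TakeoverDatabase` class are hypothetical.

```python
class TakeoverDatabase:
    """Hypothetical model of the database on the switchover destination cloud."""

    def __init__(self):
        self.rows = {}
        self.pending_sync = 0       # replicated changes still in flight
        self.updates_blocked = False

    def begin_switchover(self, pending_sync):
        # Block updates only while replicated changes are still outstanding.
        self.pending_sync = pending_sync
        self.updates_blocked = pending_sync > 0

    def apply_synced_change(self, key, value):
        self.rows[key] = value
        self.pending_sync -= 1
        if self.pending_sync == 0:
            self.updates_blocked = False  # backlog drained, reopen writes

    def update(self, key, value):
        if self.updates_blocked:
            raise PermissionError("updates are prohibited until synchronization catches up")
        self.rows[key] = value

    def read(self, key):
        # Reads stay available for the whole switchover.
        return self.rows.get(key)
```

Because application writes are rejected while late replicated changes are still arriving, a fresh write can never be overwritten by an older synchronized value.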
In some embodiments of the present application, the multi-cloud adaptation module is further configured to manage, in a unified manner, the accounts of the container application on the plurality of cloud platforms, so as to enable migration of service data in the cloud databases between the cloud platforms and network interconnection between the cloud platforms; the container cluster management module provides container cluster management based on the ARM or X86 architecture.
In some embodiments of the present application, the load balancing module detects abnormal states of each cloud platform in real time according to a set rule, where the set rule is to detect according to a set interval time or according to a preset time point; abnormal states include hardware failures, network outages, natural disasters, and human errors.
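The interval-based detection rule can be sketched as follows. The probe is injected as a plain callable and time is passed in explicitly so that the example stays deterministic. The consecutive-failure threshold is an assumption added for illustration and is not stated in the source, and `PlatformMonitor` is a hypothetical name.

```python
class PlatformMonitor:
    """Checks each platform at a set interval (the threshold is an assumption)."""

    def __init__(self, probe, interval_s, failures_to_trip=3):
        self.probe = probe                  # callable: platform -> bool (True = healthy)
        self.interval_s = interval_s
        self.failures_to_trip = failures_to_trip
        self.last_check = {}
        self.failures = {}

    def tick(self, platform, now):
        """Probe if the interval has elapsed; return True if the platform is abnormal."""
        last = self.last_check.get(platform)
        if last is None or now - last >= self.interval_s:
            self.last_check[platform] = now
            if self.probe(platform):
                self.failures[platform] = 0
            else:
                self.failures[platform] = self.failures.get(platform, 0) + 1
        return self.failures.get(platform, 0) >= self.failures_to_trip
```

Calls to `tick` that arrive before the interval has elapsed do not probe again, so the check frequency is governed by `interval_s` rather than by how often the caller polls.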
In some embodiments of the present application, when the cloud database of each cloud platform is in a normal state, the service traffic generated by user access passes through the load balancing module and accesses the cloud database of a designated cloud platform along a preset preferred path; when the cloud database of the designated cloud platform fails, the preset preferred path is abandoned, and the database access traffic is smoothly switched to the service receiving cloud database through a dynamic connection mechanism and a dynamic token algorithm of the arbitration scheduler.
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms: when the cloud databases are MySQL databases, a MySQL multi-master synchronization mechanism is adopted as the database disaster recovery scheme;
when the cloud databases are SQL Server databases, the AlwaysOn scheme is adopted as the database disaster recovery scheme; when the cloud databases are Oracle databases, either the Oracle Extended Cluster scheme is adopted as the database disaster recovery scheme, or the Oracle RAC scheme and the Data Guard scheme are combined as the database disaster recovery scheme.
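The preferred-path behaviour can be sketched as a scheduler that returns the preferred database while it is healthy and falls back otherwise. The source does not describe the dynamic connection mechanism or the dynamic token algorithm, so neither is modelled here, and `ArbitrationScheduler` and its methods are hypothetical names.

```python
class ArbitrationScheduler:
    """Chooses the database that should receive access traffic (illustrative only)."""

    def __init__(self, preferred, fallbacks):
        self.preferred = preferred
        self.fallbacks = list(fallbacks)
        self.failed = set()

    def report_failure(self, db):
        self.failed.add(db)

    def report_recovery(self, db):
        self.failed.discard(db)

    def target(self):
        # Stay on the preferred path while the preferred database is healthy.
        if self.preferred not in self.failed:
            return self.preferred
        # Otherwise hand traffic to the first healthy takeover database.
        for db in self.fallbacks:
            if db not in self.failed:
                return db
        raise RuntimeError("no cloud database available")
```

In this sketch traffic returns to the preferred database as soon as it is reported recovered; whether the patented system fails back automatically is not stated.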
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms, the data transmission protocol is optimized: the write command and the write data are combined into a single transmission, and the write-allocation-complete interaction step is eliminated, so as to reduce the number of interactions.
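The effect of merging the write command with the write data can be shown by counting round trips on a simulated link. The two-step exchange below is one reading of the unoptimized flow (command, then data after an allocation-complete reply); the message names and the `Link` class are hypothetical.

```python
class Link:
    """Simulated cross-data-center link that counts request/response exchanges."""

    def __init__(self):
        self.round_trips = 0
        self.log = []

    def round_trip(self, message):
        self.round_trips += 1
        self.log.append(message)
        return "ACK"


def classic_write(link, payload):
    # Exchange 1: send the write command, wait for the allocation-complete reply.
    link.round_trip("WRITE_CMD")
    # Exchange 2: send the data, wait for the completion status.
    link.round_trip(payload)


def merged_write(link, payload):
    # Command and data travel together, so one exchange is enough.
    link.round_trip(("WRITE_CMD", payload))
```

Under these assumptions each write needs one exchange instead of two, which is consistent with the halving of cross-data-center interactions claimed in the description.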
In some embodiments of the present application, when each cloud platform is in a normal state and multi-cluster data synchronization is performed on the data of the container application through the cloud databases of the cloud platforms, a data synchronization scheduler is used to introduce a data synchronization routing table and a routing scheduling algorithm, and the synchronization strategy is dynamically updated according to the cloud database selected by arbitration under the preferred access policy.
In some embodiments of the present application, when the increase or decrease in the service traffic load of one or more cloud platforms reaches a threshold, the container application is scaled out or in by means of Docker, K8S and micro-service technology.
Docker is an open-source containerization platform that provides a lightweight, portable and self-contained way of packaging software. Using Docker, a developer can package an application and all of its dependencies, including code, runtime environment and system tools, into a single container. The container can run on different operating systems and ensures that the application behaves consistently in different environments.
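One possible shape for the data synchronization routing table is a mapping from each database currently taking writes to the databases it must replicate to, rebuilt whenever arbitration changes the write targets. The routing scheduling algorithm itself is not disclosed, so this is only a sketch; `SyncScheduler` and `on_arbitration` are hypothetical names.

```python
class SyncScheduler:
    """Keeps a routing table: source database -> databases it replicates to."""

    def __init__(self, databases):
        self.databases = list(databases)
        self.routes = {}

    def on_arbitration(self, active_sources):
        """Rebuild the table after arbitration picks the databases taking writes."""
        unknown = set(active_sources) - set(self.databases)
        if unknown:
            raise ValueError(f"unknown databases: {sorted(unknown)}")
        self.routes = {
            src: [db for db in self.databases if db != src]
            for src in active_sources
        }

    def targets(self, source):
        # A database that takes no writes has nothing to replicate.
        return self.routes.get(source, [])
```

For example, if arbitration moves all writes from `db-a` to `db-b`, a call to `on_arbitration(["db-b"])` removes the routes out of `db-a` and makes `db-b` the only replication source.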
Wherein, the micro service is a software architecture style, which divides the application program into a group of small and independent services, each of which is focused on completing a specific business function and interacting through a lightweight communication mechanism. Each micro-service may be independently developed, deployed, and extended while different programming languages and technology stacks may be used.
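The threshold-triggered scaling described above can be sketched as a proportional rule, similar in spirit to the one used by the Kubernetes Horizontal Pod Autoscaler. The function name, the tolerance band, and the replica limits are assumptions for illustration; the application does not state a formula.

```python
import math


def desired_replicas(current: int, load_per_replica: float, target: float,
                     tolerance: float = 0.1, min_r: int = 1, max_r: int = 50) -> int:
    """Scale the replica count in proportion to load once it leaves the tolerance band."""
    ratio = load_per_replica / target
    if abs(ratio - 1.0) <= tolerance:
        # Load change has not reached the threshold: keep the current size.
        return current
    return max(min_r, min(max_r, math.ceil(current * ratio)))
```

With 4 replicas and a target of 100 requests per replica, a measured load of 150 yields 6 replicas, and a load of 50 yields 2.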
The present application provides a hybrid cloud multi-activity disaster recovery system for centralized, unified disaster recovery management of multiple clouds. The specific scheme is as follows:
To address service disaster recovery in heterogeneous multi-cloud scenarios, an embodiment of the present application provides a hybrid cloud multi-activity disaster recovery system that supports both X86 and domestic ARM architectures. As shown in fig. 1, the system deploys applications of equal capacity on a plurality of cloud service platforms in a multi-active deployment mode, so that the platforms back each other up. A multi-active access gateway product receives all service traffic and schedules it to the back-end applications on the different clouds according to proportional or exact routing rules. The applications deployed on the clouds provide services externally at the same time, which achieves application multi-activity and load sharing. Data is synchronized among the centers by replication, and when a disaster occurs, a normally operating center takes over all requests for the designated applications as required. The network layer uses cloud load balancing and DNS technology to switch access traffic automatically, so that users do not perceive the disaster recovery switchover. The data layer, depending on the data type, uses cloud databases with multi-site deployment and synchronization to ensure data consistency. The application layer uses a multi-cloud, multi-active deployment mode to ensure the availability and continuity of application services. Multi-cloud access adaptation provides unified management of resources in the heterogeneous multi-cloud environment. Container resource scheduling provides unified management and scheduling of the K8S clusters deployed on different clouds; when a single cloud goes down, container scheduling automatically completes the elastic migration of application instances within seconds, which greatly improves service reliability.
The system thereby fills the gap in service application disaster recovery for multi-cloud, multi-active environments under the domestic ARM architecture.
The hybrid cloud multi-activity disaster recovery system in this embodiment includes:
Multi-cloud adaptation module: adapts to the API interfaces of different cloud platforms to achieve unified management of data in heterogeneous multi-cloud resource scenarios, including unified management of multi-cloud accounts, network interconnection, and free migration of service data between different clouds.
Container cluster management module: provides a container cluster management service that supports the domestic ARM architecture and achieves unified management and scheduling of the K8S clusters deployed on different clouds; the managed clusters may include a Diankeyun domestic ARM architecture K8S cluster, an Alibaba Cloud domestic ARM architecture K8S cluster, and domestic ARM architecture K8S clusters on other clouds.
DevOps module: provides continuous integration and deployment management for container applications, completes one-click release and change from code and application packages, and completes automated multi-cloud pipeline deployment of applications and their automatic registration with load balancing.
Service gateway module: receives all service traffic and schedules it to the back-end applications on different clouds according to proportional or exact routing rules.
Load balancing module: a basic high-availability building block of the cloud-native era. It supports protocols such as TCP, UDP, HTTP, and HTTPS, and extends the service throughput of the application system by distributing traffic to different back-end services. It periodically checks the running state of the back-end cloud servers; once a cloud server is detected to be abnormal, traffic is no longer forwarded to the abnormal instance. This eliminates single points of failure, improves the availability of the application system, and keeps the disaster recovery switchover imperceptible to users.
Domain name access resolution module: provides a domain name resolution service as a one-stop addressing and scheduling service.
Basic resource module: virtualizes computing, storage, and network resources through the virtualization technologies of Diankeyun, Alibaba Cloud, and other clouds, and provides running resources for migrating service applications to the cloud.
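The service gateway's two scheduling modes, exact routing rules and proportional distribution, can be sketched as follows. This is an illustrative assumption, not the gateway product's implementation: the request fields `id` and `tenant`, the rule table, and the function `pick_backend` are all hypothetical.

```python
import hashlib


def pick_backend(request: dict, exact_rules: dict, weights: dict) -> str:
    """Route by exact rule if one matches, otherwise by weighted proportion."""
    tenant = request.get("tenant")
    if tenant in exact_rules:
        return exact_rules[tenant]
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must be positive integers")
    # Hash the request id into [0, total) so a given request always maps to the same cloud.
    digest = hashlib.sha256(request["id"].encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    for cloud, weight in sorted(weights.items()):
        if point < weight:
            return cloud
        point -= weight
    raise RuntimeError("unreachable for positive integer weights")
```

Hashing the request identifier rather than drawing a random number keeps the routing decision repeatable, which makes the proportional split easier to reason about during a switchover.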
The technical architecture of this embodiment is shown in fig. 2, and specifically includes:
1) All service application instances in the system architecture are deployed peer-to-peer across multiple clouds in a multi-cloud, multi-active mode and back each other up, which avoids single points of failure. In addition, based on the failover and elastic scaling capabilities of cloud-native applications, the elastic migration of application instances on a failed node is completed automatically within seconds. Compared with traditional disaster recovery schemes, the second-level elasticity of container technology means that users need not maintain redundant disaster recovery resources over the long term, which saves the construction cost and the operation and maintenance cost of basic resources.
2) The multi-active access gateway product receives all service traffic and schedules it to the back-end applications on different clouds according to proportional or exact routing rules; the applications deployed on the clouds provide services externally at the same time, achieving application multi-activity and load sharing.
3) When the monitoring management module (composed of the global load balancer and the gateway router) discovers an application system failure in a data center, load balancing switches front-end service requests to the systems on other clouds so that access continues.
4) Multi-cloud adaptation module: adapts to the API interfaces of different cloud platforms to achieve unified management of data in heterogeneous multi-cloud resource scenarios, and provides resource services and monitoring of cloud resource running state for multi-cloud disaster recovery.
5) Disaster recovery scheme for the core database accessed by applications: if the database is MySQL, a MySQL multi-master synchronization mechanism is adopted; if it is SQL Server, AlwaysOn is adopted; if it is Oracle, Oracle Extended Cluster or Oracle RAC plus Data Guard is adopted.
6) At the data layer, the databases on the clouds synchronize data in real time in primary/standby mode through database software. When the primary database fails, the database disaster recovery management software switches to the standby database, achieving second-level database switchover. During traffic switching, an update-forbidden policy is applied to the applications in the cloud that receives the traffic for the duration of the data synchronization delay: data update operations are prohibited, which avoids dirty writes and the overwriting of newly written data by late-arriving synchronized data.
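The update-forbidden policy in item 6) can be sketched as a small guard that rejects writes on the takeover cloud until replication has caught up. The class `WriteGuard` and its lag-reporting interface are hypothetical; the application does not describe how the synchronization delay is measured.

```python
class WriteGuard:
    """Rejects updates on the takeover cloud until replication has caught up."""

    def __init__(self):
        self.switching = False

    def begin_switch(self, lag_seconds: float) -> None:
        """Traffic switch starts; writes are blocked while any sync lag remains."""
        self.switching = lag_seconds > 0

    def report_lag(self, lag_seconds: float) -> None:
        """Called by the (assumed) replication monitor with the current lag."""
        if lag_seconds <= 0:
            self.switching = False

    def allow(self, operation: str) -> bool:
        """Reads are always served; updates only outside the delay window."""
        if operation == "read":
            return True
        return not self.switching
```

Reads stay available throughout, so the switchover degrades the service to read-only for the length of the delay instead of risking dirty writes.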
Specifically, as shown in fig. 3, the method for synchronizing multi-cluster data includes:
In the technical architecture, the multi-active component of the data layer is designed as follows. In a same-city, short-distance scenario, network latency between machine rooms is low, so a single database cluster can be used for strongly consistent reads and writes. In a remote hybrid cloud scenario, multiple database clusters are used: data is sharded along a chosen service dimension, the applications in each machine room always access the one database cluster assigned to that machine room, and data is synchronized among the clusters asynchronously to provide cross-cluster redundant backup. For any service request, after upstream route correction, a database write is allowed only in the correct machine room, and misrouted traffic is forbidden to write. When a machine room fails, only the upstream traffic of the failed machine room needs to be cut to zero. During traffic switching, writes must be forbidden both while the routing rules are changing and during the data synchronization delay, so as to avoid dirty writes caused by the switch.
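Sharding by a service dimension and allowing writes only in the correct machine room can be sketched as follows. The room names, the use of a CRC32 hash as the sharding function, and all three function names are assumptions made for illustration.

```python
import zlib

ROOMS = ["room-a", "room-b"]  # hypothetical machine rooms, one database cluster each


def home_room(shard_key: str) -> str:
    """Map a service dimension value (e.g. a user id) to its single writable room."""
    return ROOMS[zlib.crc32(shard_key.encode()) % len(ROOMS)]


def correct_route(shard_key: str) -> str:
    """Upstream route correction: send the request to its home room."""
    return home_room(shard_key)


def db_accepts_write(shard_key: str, room: str) -> bool:
    """Database-side guard: misrouted traffic is forbidden to write."""
    return room == home_room(shard_key)
```

The database-side guard duplicates the upstream check on purpose: even if a request slips past route correction, it cannot write in the wrong room.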
Further, as shown in fig. 4, the method for synchronizing multi-cluster data includes:
In the technical architecture, the data layer is designed using the multi-active component provided by the multi-cloud multi-active console. In a same-city, short-distance scenario, network latency between machine rooms is low, so a single database cluster can be used for strongly consistent reads and writes. In a remote hybrid cloud scenario, multiple database clusters are used: data is sharded along a chosen service dimension, and the applications serve external requests under the preferred access policy and always access the corresponding database cluster.
For data synchronization, data is synchronized among the clusters asynchronously to provide cross-cluster redundant backup. For any service request, after upstream route correction, a database write is allowed only in the correct machine room, and misrouted traffic is forbidden to write. Data link optimization is used for synchronization, which improves data synchronization efficiency in the remote hybrid cloud scenario.
When the primary database fails, the data arbitration service automatically switches to the standby database, achieving second-level database switchover. During traffic switching, the upstream traffic of the failed cloud must be cut to zero, and writes must be forbidden both while the routing rules are changing and during the data synchronization delay, so as to avoid dirty writes caused by the switch.
The method comprises the following strategies:
1) Database preferred access policy: the preferred mode is provided by the multi-cloud multi-active console for multi-data-center scenarios. I/O is issued, with load balancing, only on the preferred path set by the user, so no cross-database I/O access is generated. To ensure that application services are not interrupted abnormally, database access traffic is switched smoothly through the dynamic connection mechanism and the dynamic token algorithm of the arbitration scheduler. When the preferred database fails, I/O is switched to the non-preferred database. The preferred-database mode reduces the number of cross-data-center interactions and thereby improves I/O performance.
2) Data link optimization: for data synchronization, the I/O interaction process is optimized at the protocol level. The write command and the write data are combined into a single transmission and the separate "write allocation complete" interaction is eliminated, which halves the number of cross-data-center I/O interactions. A data synchronization scheduler is designed that introduces a data synchronization routing table and a route scheduling algorithm and dynamically updates the synchronization policy according to the database selected by arbitration under the preferred access policy, ensuring the accuracy and timeliness of synchronization among the multi-active databases.
3) Primary/standby switching mechanism: database disaster recovery management uses a heartbeat strategy to judge node state in real time; when a failure of the primary database node is detected, an arbitration mechanism switches to the standby database, achieving second-level database switchover.
4) Data protection mechanism: during traffic switching, an update-forbidden policy is applied to the applications in the destination cloud for the duration of the data synchronization delay; data update operations are prohibited, which avoids dirty writes and the overwriting of newly written data by late-arriving synchronized data.
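The heartbeat-plus-arbitration switching in strategy 3) can be sketched as below. The majority-of-observers rule, the timeout value, and the class `FailoverArbiter` are assumptions; the application states only that a heartbeat strategy and an arbitration mechanism are used.

```python
class FailoverArbiter:
    """Promotes the standby when a majority of observers miss the primary's heartbeat."""

    def __init__(self, primary: str, standby: str, observers: int, timeout: float = 3.0):
        self.primary, self.standby = primary, standby
        self.observers = observers
        self.timeout = timeout
        self.last_seen = {}  # observer id -> time of the last heartbeat it received

    def heartbeat(self, observer: int, now: float) -> None:
        """Record that `observer` received a heartbeat from the primary at `now`."""
        self.last_seen[observer] = now

    def check(self, now: float) -> str:
        """Return the node that should serve writes at time `now`."""
        missed = sum(
            1 for o in range(self.observers)
            if now - self.last_seen.get(o, float("-inf")) > self.timeout
        )
        if missed * 2 > self.observers:
            # Majority agrees the primary is down: swap roles and restart the clock.
            self.primary, self.standby = self.standby, self.primary
            self.last_seen = {o: now for o in range(self.observers)}
        return self.primary
```

Requiring a majority prevents a single observer with a broken network link from triggering an unnecessary switchover.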
In some embodiments, the method by which the system performs disaster recovery switching when a disaster occurs is shown in fig. 5 and is as follows:
When the service is accessible normally, an access request passes through the DMZ, firewall, and other network zones, is resolved by DNS, and is then distributed by cloud load balancing. Users can set a traffic policy in cloud load balancing, under which the applications on each cloud platform share the traffic load, avoiding system paralysis caused by traffic surges.
When an abnormal switchover is needed, that is, after the service on one cloud has failed as a whole, access requests still pass through the DMZ, firewall, and other network zones; load balancing equipment and domain name switching then cause the other data centers to take on all service requests immediately. Users are switched to a normally working production center without perceiving the disaster recovery switchover, which ensures service continuity and improves system reliability.
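The two cases above, normal sharing and abnormal switchover, can be sketched as one function that recomputes traffic shares over the healthy clouds only. The function `redistribute` and the cloud names in the usage note are hypothetical, and health detection itself is assumed to happen elsewhere.

```python
def redistribute(weights: dict, healthy: set) -> dict:
    """Return traffic shares over healthy clouds only, scaled to sum to 1."""
    alive = {c: w for c, w in weights.items() if c in healthy and w > 0}
    total = sum(alive.values())
    if total == 0:
        raise RuntimeError("no healthy cloud platform available")
    return {c: w / total for c, w in alive.items()}
```

With weights `{"cloud-a": 3, "cloud-b": 1}`, both clouds healthy gives shares of 0.75 and 0.25; if `cloud-a` fails, `cloud-b` receives the full share.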
In some embodiments, the structure and functions of the multi-cloud multi-active disaster recovery system shown in fig. 6 are as follows:
1) In the multi-cloud multi-active disaster recovery system, multi-cloud resource adaptation and containerization technology shield applications from the underlying virtualization platforms, enabling multi-cloud multi-active disaster recovery deployment of applications. The multi-cloud multi-active disaster recovery console provides real-time backup and fast switchover of active-active disaster recovery data, so that if one production center fails, the other production center continues to operate normally, and users of the failed production center are switched to a normally working production center without perceiving it. This ensures service continuity and improves system reliability. DevOps is used to orchestrate operational tasks flexibly in remote hybrid cloud scenarios, improving the efficiency of service deployment, upgrade, and capacity expansion.
2) Elastic scaling: through the combination of Docker, K8S, and microservice technologies, application instances are scaled out and in automatically within seconds when the service traffic load surges, which greatly improves service reliability and simplifies operations such as service expansion. In addition, compared with traditional single-cloud disaster recovery solutions, processing capacity can be expanded from a single cloud platform to a plurality of cloud platforms.
3) Using DevOps cloud-native technology, cloud services, microservices, jobs, functions, and the like are orchestrated flexibly on the cloud, which meets the need for automated, customized, and flexible workflow orchestration in service operation and maintenance; continuous delivery practice steadily improves research and development efficiency.
4) Through domestically developed, independently controllable technology, the system adapts to multiple hardware architectures such as X86 and ARM, supports operating systems such as Kylin and UnionTech UOS, and can run on domestic hardware platforms such as Haiyin and Phytium (Feiteng). It thus meets both the requirement for efficient and stable data disaster recovery management in domestic environments and the current requirements for domestic substitution and independent controllability.
In summary, in the hybrid cloud multi-activity disaster recovery system, the multi-cloud adaptation module adapts the application programming interfaces of a plurality of cloud platforms to achieve unified scheduling of those platforms; the container cluster management module manages the K8S clusters deployed on the cloud platforms; and when one or more cloud platforms are in a disaster state, the load balancing module responds by switching the service traffic to a normally operating cloud platform that takes over the service. The data of the cloud platforms is synchronized through their cloud databases. The disaster recovery system uses multi-cloud resource adaptation and containerization technology to schedule and manage the applications and cloud databases on the cloud platforms uniformly, and achieves a switchover that users do not perceive when a disaster occurs. Compared with the prior art, the system has lower cost and can adapt to multiple architectures such as X86 and ARM.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular implementation is in hardware or software depends on the specific application of the solution and on the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A hybrid cloud multi-activity disaster recovery system, comprising:
a multi-cloud adaptation module, configured to adapt the application programming interfaces of a plurality of cloud platforms so as to schedule the plurality of cloud platforms uniformly;
a container cluster management module, configured to provide container cluster management so as to deploy K8S clusters on the plurality of cloud platforms and to schedule and manage the K8S clusters uniformly;
a DevOps module, configured to publish a container application to each cloud platform based on the K8S cluster deployed on that cloud platform and to provide continuous integration and deployment management, so that the container application provides services on every cloud platform at the same time;
a domain name access resolution module, configured to provide a domain name resolution service for admitting user access;
a load balancing module, configured to distribute the service traffic generated by user access so as to achieve load balancing, and to detect abnormal states of each cloud platform;
a plurality of service gateway modules, deployed respectively in each cloud platform and configured to receive the service traffic forwarded by the load balancing module and to schedule the service traffic to the container application at the back end of that cloud platform;
a plurality of basic resource modules, deployed respectively in each cloud platform and providing computing resources, network resources, and a cloud database for each container application through virtualization;
wherein, when one or more cloud platforms enter a disaster state, the load balancing module responds to the disaster state and switches the service traffic to a normally operating takeover cloud platform;
and the data of the container application is synchronized through the cloud databases of the cloud platforms.
2. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein the multi-cloud adaptation module is further configured to manage uniformly the accounts of the container application on the plurality of cloud platforms, so as to achieve migration of the service data in the cloud databases between the cloud platforms and network interconnection between the cloud platforms;
the container cluster management module provides container cluster management based on the ARM or X86 architecture.
3. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein the load balancing module detects abnormal states of each cloud platform in real time according to a set rule, the set rule being detection at a set time interval or at preset time points;
the abnormal states include hardware faults, network interruptions, natural disasters, and human errors.
4. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein the cloud databases of the container application in the cloud platforms are deployed in a multi-cluster mode, the container application always accesses a designated database cluster, and data is synchronized among the clusters asynchronously to achieve cross-cluster redundant data backup.
5. The hybrid cloud multi-activity disaster recovery system of claim 4, wherein, when one or more cloud platforms are in a disaster state and the service traffic is switched, during the traffic-switching process an update-forbidden policy is applied to the cloud database corresponding to the container application in the takeover cloud platform, so that data update operations are prohibited and dirty writes and overwriting by synchronized data are prevented.
6. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein, when the cloud databases of the cloud platforms are in a normal state, the service traffic generated by user access is directed by the load balancing module along a preset preferred path to the cloud database of a designated cloud platform;
and when the cloud database of the designated cloud platform fails, the preset preferred path is abandoned, and the database access traffic is switched smoothly to a takeover cloud database through the dynamic connection mechanism and the dynamic token algorithm of an arbitration scheduler.
7. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein, when each cloud platform is in a normal state and the data of the container application is synchronized across multiple clusters through the cloud databases of the cloud platforms, a MySQL multi-master synchronization mechanism is adopted as the database disaster recovery scheme when each cloud database is a MySQL database;
the AlwaysOn scheme is adopted as the database disaster recovery scheme when each cloud database is a SQL Server database; and the Oracle Extended Cluster scheme, or the combination of the Oracle RAC scheme and the Data Guard scheme, is adopted as the database disaster recovery scheme when each cloud database is an Oracle database.
8. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein, when each cloud platform is in a normal state and the data of the container application is synchronized across multiple clusters through the cloud databases of the cloud platforms, the data transmission protocol is optimized: the write command and the write data are combined into a single transmission, and the "write allocation complete" interaction is eliminated, so as to reduce the number of interactions.
9. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein, when each cloud platform is in a normal state and the data of the container application is synchronized across multiple clusters through the cloud databases of the cloud platforms, a data synchronization scheduler is used, which introduces a data synchronization routing table and a route scheduling algorithm and dynamically updates the synchronization policy according to the cloud database selected by arbitration under the preferred access policy.
10. The hybrid cloud multi-activity disaster recovery system of claim 1, wherein the container application is scaled out or in by means of Docker, K8S, and microservice technologies when the increase or decrease in the traffic load of one or more cloud platforms reaches a threshold.
CN202310927945.7A 2023-07-26 2023-07-26 Mixed cloud multi-activity disaster recovery system Pending CN117082069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310927945.7A CN117082069A (en) 2023-07-26 2023-07-26 Mixed cloud multi-activity disaster recovery system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310927945.7A CN117082069A (en) 2023-07-26 2023-07-26 Mixed cloud multi-activity disaster recovery system

Publications (1)

Publication Number Publication Date
CN117082069A true CN117082069A (en) 2023-11-17

Family

ID=88706999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310927945.7A Pending CN117082069A (en) 2023-07-26 2023-07-26 Mixed cloud multi-activity disaster recovery system

Country Status (1)

Country Link
CN (1) CN117082069A (en)

Similar Documents

Publication Publication Date Title
CN112000448B (en) Application management method based on micro-service architecture
CN107707393B (en) Multi-active system based on Openstack O version characteristics
US9703608B2 (en) Variable configurations for workload distribution across multiple sites
Kakivaya et al. Service fabric: a distributed platform for building microservices in the cloud
EP2171593B1 (en) Shared data center disaster recovery systems and methods
US7225356B2 (en) System for managing operational failure occurrences in processing devices
CN102404390A (en) Intelligent dynamic load balancing method for high-speed real-time database
CN108270726B (en) Application instance deployment method and device
CN113641511B (en) Message communication method and device
EP2643771B1 (en) Real time database system
US20080244552A1 (en) Upgrading services associated with high availability systems
EP2224341B1 (en) Node system, server switching method, server device, and data transfer method
CN111460039A (en) Relational database processing system, client, server and method
EP3915224A1 (en) State controller running in a kubernetes system and method for operating same
CN114338670B (en) Edge cloud platform and network-connected traffic three-level cloud control platform with same
CN105959145B (en) A kind of method and system for the concurrent management server being applicable in high availability cluster
Mitrović et al. Improving fault-tolerance of distributed multi-agent systems with mobile network-management agents
Venâncio et al. NHAM: an NFV high availability architecture for building fault-tolerant stateful virtual functions and services
CN117082069A (en) Mixed cloud multi-activity disaster recovery system
CN116723077A (en) Distributed IT automatic operation and maintenance system
Ooi et al. Dynamic service placement and redundancy to ensure service availability during resource failures
Thanakornworakij et al. High availability on cloud with HA-OSCAR
US7558858B1 (en) High availability infrastructure with active-active designs
CN116684261A (en) Cluster architecture control method and device, storage medium and electronic equipment
CN115328651A (en) Lightweight micro-cloud system based on domestic VPX server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination