CN116302352A - Cluster disaster recovery processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116302352A
Authority
CN
China
Prior art keywords
cluster
domain name
component
backup
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310271288.5A
Other languages
Chinese (zh)
Inventor
沈德胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202310271288.5A priority Critical patent/CN116302352A/en
Publication of CN116302352A publication Critical patent/CN116302352A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

The disclosure provides a cluster disaster recovery processing method and device, an electronic device, and a storage medium, which can be applied to the field of computer technology and the field of cloud computing. The method comprises: in response to a main cluster fault event being triggered, acquiring a target snapshot file from network storage; performing data recovery processing on a local storage unit of the backup cluster based on the target snapshot file to obtain a processing result; starting a service component of the backup cluster when the processing result indicates that the data recovery succeeded; and, when the health detection component determines that the service component has started normally, using the domain name resolution component to set the back-end address of a preset domain name to the address of the backup cluster, so that a user can access the service component of the backup cluster through the preset domain name.

Description

Cluster disaster recovery processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology and the field of cloud computing, and more particularly, to a cluster disaster recovery processing method, apparatus, electronic device, storage medium, and program product.
Background
Kubernetes is an open-source container cluster management system that provides a management platform covering the full life cycle of container operation, including scheduling, deployment, service discovery, container scale-out and scale-in, and resource reclamation. As Kubernetes-related applications grow, the number of Kubernetes clusters is also rising, and Kubernetes multi-cluster management platforms have emerged accordingly. A Kubernetes cluster management platform needs strong disaster recovery backup capability; in the related art, the risk of running a single cluster is offset by deploying applications across multiple clusters, which improves the disaster recovery backup capability.
In the course of implementing the concept of the present disclosure, the inventor found that the related art has at least the following problem: cluster disaster recovery processing methods in the related art have poor timeliness.
Disclosure of Invention
In view of this, the present disclosure provides a cluster disaster backup processing method, apparatus, electronic device, readable storage medium and computer program product.
One aspect of the present disclosure provides a cluster disaster recovery processing method, including: in response to a main cluster fault event being triggered, acquiring a target snapshot file from network storage; performing data recovery processing on a local storage unit of the backup cluster based on the target snapshot file to obtain a processing result; starting a service component of the backup cluster when the processing result indicates that the data recovery succeeded; and, when the health detection component determines that the service component has started normally, using the domain name resolution component to set the back-end address of a preset domain name to the address of the backup cluster, so that a user can access the service component of the backup cluster through the preset domain name.
According to an embodiment of the present disclosure, the cluster disaster recovery processing method further includes: in response to a first timing task being triggered, generating a first snapshot file based on the local storage unit of the backup cluster; and writing the first snapshot file to the network storage.
According to an embodiment of the present disclosure, acquiring the target snapshot file from the network storage in response to the main cluster fault event being triggered includes: in response to the main cluster fault event being triggered, accessing the network storage through a preset directory of a backup recovery component to acquire a plurality of second snapshot files from the network storage, wherein each second snapshot file is configured with timestamp information; and determining the target snapshot file from the plurality of second snapshot files based on current system time information and the timestamp information of each of the plurality of second snapshot files.
According to an embodiment of the present disclosure, setting the back-end address of the preset domain name to the address of the backup cluster by the domain name resolution component, when the health detection component determines that the service component has started normally, includes: when the health detection component determines that the service component has started normally, using the domain name resolution component to change the weight associated with the address of the backup cluster to a preset value, wherein the preset value is greater than the weight associated with the address of the main cluster; and setting the back-end address of the preset domain name to the address of the backup cluster in response to the weight associated with the address of the backup cluster being greater than the weight associated with the address of the main cluster.
According to an embodiment of the present disclosure, the cluster disaster recovery processing method further includes: responding to triggering a second timing task, and detecting the health state of the main cluster by utilizing the health detection component to obtain a detection result; and triggering the main cluster fault event under the condition that the detection result indicates the main cluster fault.
According to an embodiment of the disclosure, in response to triggering the second timing task, detecting, by the health detection component, a health state of the primary cluster, to obtain a detection result, including: accessing the service component of the main cluster through the preset domain name to obtain a first detection result; transmitting a test data packet to the main cluster by using an Internet protocol to obtain a second detection result; and determining the detection result based on the first detection result and the second detection result.
Another aspect of the present disclosure provides a cluster disaster recovery processing device, including: an acquisition module for acquiring a target snapshot file from network storage in response to a main cluster fault event being triggered; a processing module for performing data recovery processing on a local storage unit of the backup cluster based on the target snapshot file to obtain a processing result; a starting module for starting a service component of the backup cluster when the processing result indicates that the data recovery succeeded; and a setting module for using the domain name resolution component to set the back-end address of a preset domain name to the address of the backup cluster when the health detection component determines that the service component has started normally, so that a user can access the service component of the backup cluster through the preset domain name.
Another aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more instructions that, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of the present disclosure provides a computer program product comprising computer executable instructions which, when executed, are adapted to implement the method as described above.
According to the embodiments of the present disclosure, a target snapshot file is acquired from network storage, data recovery processing is performed on the local storage unit of the backup cluster based on the target snapshot file to obtain a processing result, the service component of the backup cluster is started, and the domain name resolution component sets the back-end address of the preset domain name to the address of the backup cluster. By these technical means the cluster is quickly restored to use, and users can keep accessing the service component of the cluster without any change of domain name. The technical problem of poor timeliness of cluster disaster recovery processing methods is thereby at least partially overcome, and the availability of the cluster is effectively improved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture to which cluster disaster recovery processing methods and apparatus may be applied, in accordance with embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a cluster disaster backup processing method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a technical architecture diagram of the main and backup clusters in a cluster disaster recovery processing method according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flowchart of service switching between the main and backup clusters in a cluster disaster recovery processing method according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a block diagram of a clustered disaster recovery processing device in accordance with an embodiment of the present disclosure; and
FIG. 6 schematically illustrates a block diagram of a computer system suitable for implementing a robot in accordance with an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together). Where an expression such as "at least one of A, B or C, etc." is used, it should likewise be interpreted in accordance with the ordinary understanding of those skilled in the art (e.g., "a system having at least one of A, B or C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together).
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the data involved (including but not limited to personal information of users) all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the related art, the Kubernetes cluster management platform layer is itself generally deployed on a Kubernetes cluster, and the master nodes (the control nodes of the Kubernetes cluster) and the etcd (an open-source distributed key-value store) nodes in the cluster all have a certain degree of high availability. In a production environment, the IaaS (infrastructure as a service) layer defines physical fault domains by machine room or floor, and different fault domains use different IP network segments. Each Kubernetes cluster is deployed within one fault domain, and network access is opened through the IaaS layer so that the Kubernetes cluster management platform layer can manage Kubernetes clusters in different network segments.
Like the business clusters, the Kubernetes cluster management platform requires strong disaster recovery backup capability. In current applications of Kubernetes clusters, however, backup and disaster recovery capability at the scale of a whole cluster is not available. When the fault domain hosting the Kubernetes cluster management platform layer suffers a power failure, network disconnection, or similar problem and the platform layer becomes unavailable, a way is needed to quickly restore its functions.
In view of this, embodiments of the present disclosure provide a cluster disaster recovery processing method, a cluster disaster recovery processing device, an electronic device, a readable storage medium, and a computer program product. The cluster disaster recovery processing method comprises: in response to a main cluster fault event being triggered, acquiring a target snapshot file from network storage; performing data recovery processing on a local storage unit of the backup cluster based on the target snapshot file to obtain a processing result; starting a service component of the backup cluster when the processing result indicates that the data recovery succeeded; and, when the health detection component determines that the service component has started normally, using the domain name resolution component to set the back-end address of a preset domain name to the address of the backup cluster, so that a user can access the service component of the backup cluster through the preset domain name.
FIG. 1 schematically illustrates an exemplary system architecture to which cluster disaster recovery processing methods and apparatus may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, to name a few.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the cluster disaster recovery processing method provided in the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the cluster disaster recovery processing device provided in the embodiments of the present disclosure may be generally disposed in the server 105. The cluster disaster recovery processing method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the cluster disaster recovery processing device provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the cluster disaster recovery processing method provided by the embodiment of the present disclosure may be performed by the terminal device 101, 102, or 103, or may be performed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the cluster disaster recovery processing device provided by the embodiment of the present disclosure may also be provided in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of a cluster disaster backup processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S204.
In operation S201, a target snapshot file is acquired from network storage in response to a main cluster fault event being triggered.
In operation S202, data recovery processing is performed on the local storage unit of the backup cluster based on the target snapshot file to obtain a processing result.
In operation S203, when the processing result indicates that the data recovery succeeded, the service component of the backup cluster is started.
In operation S204, when the health detection component determines that the service component has started normally, the domain name resolution component sets the back-end address of the preset domain name to the address of the backup cluster, so that the user accesses the service component of the backup cluster through the preset domain name.
According to the embodiments of the present disclosure, when the fault domain hosting the Kubernetes cluster management platform layer suffers a power failure, network disconnection, or similar problem, the Kubernetes cluster management platform layer becomes unavailable. In response to the main cluster fault event, a target snapshot file is acquired from network storage. The network storage may be NAS storage, and the target snapshot file is the snapshot file in the NAS storage whose timestamp is closest to the current system time; it includes the etcd data of the main cluster and the database data of the platform layer application.
According to the embodiments of the present disclosure, the main and backup Kubernetes cluster management platform layers are deployed in two different fault domains, and the node names, intra-cluster domain names, cluster configurations, and the like of the main and backup Kubernetes management clusters are kept consistent. After the main cluster fails, to ensure that the data of the backup cluster is up to date, data recovery processing is performed on the local storage unit of the backup cluster based on the data in the target snapshot file, where the local storage unit includes the etcd database and the database of the platform layer application.
According to the embodiments of the present disclosure, when the processing result indicates that the data recovery succeeded, the service component of the backup cluster is started so as to start each service of the backup cluster. A health detection component is provided in both the main and backup clusters. A domain name resolution component is also provided in both clusters to supply the domain name resolution service through which the Kubernetes cluster management platform layer is accessed. When the health detection component determines that the service component has started normally, the domain name resolution component sets the back-end address of the preset domain name to the address of the backup cluster; the domain name resolution component of the backup cluster then becomes externally accessible, and the backup cluster provides services externally.
According to the embodiments of the present disclosure, a target snapshot file is acquired from network storage, data recovery processing is performed on the local storage unit of the backup cluster based on the target snapshot file to obtain a processing result, the service component of the backup cluster is started, and the domain name resolution component sets the back-end address of the preset domain name to the address of the backup cluster. By these technical means the cluster is quickly restored to use, and users can keep accessing the service component of the cluster without any change of domain name. The technical problem of poor timeliness of cluster disaster recovery processing methods is thereby at least partially overcome, and the availability of the cluster is effectively improved.
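By way of illustration only, the ordering of operations S201 to S204 can be sketched as below. This is a minimal sketch, not the implementation of the disclosure: the `FailoverSteps` hooks, the function name `handle_primary_failure`, and the returned status strings are all hypothetical names introduced here, and each hook merely stands in for a component named in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailoverSteps:
    """Hypothetical hooks; each stands in for a component named in the text."""
    fetch_target_snapshot: Callable[[], Optional[str]]  # S201: read from network storage
    restore_local_storage: Callable[[str], bool]        # S202: True when recovery succeeded
    start_services: Callable[[], None]                  # S203: start backup-cluster services
    services_healthy: Callable[[], bool]                # check by the health detection component
    point_domain_to_backup: Callable[[], None]          # S204: domain name resolution component


def handle_primary_failure(steps: FailoverSteps) -> str:
    """Run S201 to S204 in order and report where the flow stopped."""
    snapshot = steps.fetch_target_snapshot()
    if snapshot is None:
        return "no-snapshot"
    if not steps.restore_local_storage(snapshot):
        # Services are only started after a successful data recovery.
        return "restore-failed"
    steps.start_services()
    if not steps.services_healthy():
        # The domain name is left untouched until the services are confirmed healthy.
        return "services-unhealthy"
    steps.point_domain_to_backup()
    return "switched"
```

The point of the sketch is the gating: the domain name is redirected only after both the restore and the health check have succeeded, so users are never pointed at a backup cluster that is not ready.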
According to an embodiment of the present disclosure, the method further comprises the following operations.
In response to a first timing task being triggered, generating a first snapshot file based on the local storage unit of the backup cluster; and writing the first snapshot file to the network storage.
According to the embodiments of the present disclosure, based on the first timing task, the backup cluster periodically synchronizes snapshots to the NAS storage, so that the main cluster can resume service promptly after it is recovered. The first timing task may be set according to how frequently the platform is used; for example, it may be set to synchronize information once every 5 minutes. Data exchange across fault domains is achieved through NAS storage that both the main and backup clusters can access, which avoids the cluster access pressure that would be caused by updating the data through API calls.
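As a hedged illustration of the timed write described above, the following sketch writes one snapshot file into a shared directory and encodes its timestamp in the file name. The function name `write_snapshot`, the `snapshot-<UTC time>.bin` naming scheme, and the temporary-file-then-rename step are assumptions made for this example and are not taken from the disclosure; how the snapshot bytes are produced from etcd or the platform database is outside the sketch.

```python
import os
from datetime import datetime, timezone
from pathlib import Path

# The text's example interval: one synchronization every 5 minutes.
SNAPSHOT_INTERVAL_SECONDS = 5 * 60


def write_snapshot(data: bytes, nas_dir: Path, now: datetime) -> Path:
    """Write one snapshot into the shared directory, with its time in the file name."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    final_path = nas_dir / f"snapshot-{stamp}.bin"
    tmp_path = nas_dir / f".snapshot-{stamp}.tmp"
    tmp_path.write_bytes(data)
    # Rename into place so that a reader listing the directory is less likely
    # to pick up a half-written file (an assumption; rename semantics depend
    # on the file system or NAS protocol in use).
    os.replace(tmp_path, final_path)
    return final_path
```

A scheduler of any kind could call `write_snapshot` every `SNAPSHOT_INTERVAL_SECONDS`; the sketch deliberately leaves the scheduling mechanism open.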
According to embodiments of the present disclosure, in response to triggering a primary cluster failure event, obtaining a target snapshot file from network storage may include the following operations.
In response to a main cluster fault event being triggered, accessing the network storage through a preset directory of a backup recovery component to acquire a plurality of second snapshot files from the network storage, wherein each second snapshot file is configured with timestamp information; and determining a target snapshot file from the plurality of second snapshot files based on current system time information and the timestamp information of each of the plurality of second snapshot files.
According to the embodiments of the present disclosure, the NAS storage is mounted to the backup recovery component in the form of a preset directory, and in response to a main cluster fault event being triggered, the NAS storage is accessed through the preset directory of the backup recovery component. The backup recovery component synchronizes the etcd data of the main cluster, together with periodic snapshots and incremental information of the platform layer application's database data, to the NAS storage, and the backup cluster accesses the NAS storage through the backup recovery component to acquire the plurality of second snapshot files.
According to the embodiment of the disclosure, each second snapshot file is configured with timestamp information, and based on the current system time information and the timestamp information of each of the plurality of second snapshot files, a target snapshot file is determined from the plurality of second snapshot files, namely, the snapshot file with the timestamp information closest to the current system time information is selected as the target snapshot file, so that the integrity of data recovery and the timeliness of the backup cluster data are ensured.
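The selection rule above, choosing the snapshot whose timestamp is closest to the current system time, can be sketched as follows. The `Snapshot` type and the function `pick_target_snapshot` are hypothetical names for illustration. The sketch additionally ignores snapshots stamped later than the current time; that filter is an assumption of this example (for instance, to guard against clock skew) and is not stated in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    name: str
    timestamp: datetime


def pick_target_snapshot(snapshots: Sequence[Snapshot], now: datetime) -> Optional[Snapshot]:
    """Return the snapshot whose timestamp is closest to, and not after, `now`."""
    # Assumption of this sketch: snapshots stamped in the future are skipped.
    candidates = [s for s in snapshots if s.timestamp <= now]
    if not candidates:
        return None
    # The smallest age (now - timestamp) is the most recent snapshot.
    return min(candidates, key=lambda s: now - s.timestamp)
```

With a 5-minute snapshot interval, the snapshot chosen this way was taken at most about one interval before the failure was handled, provided the last scheduled snapshot was written successfully.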
Fig. 3 schematically illustrates a technical architecture diagram of a cluster disaster recovery processing method master-slave cluster according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, in a case where it is determined by the health detection component that the service component is started normally, setting the back-end address of the preset domain name as the address of the backup cluster by the domain name resolution component may include the following operations.
In a case where it is determined by the health detection component that the service component is started normally, the weight associated with the address of the backup cluster is modified by the domain name resolution component to a preset value, wherein the preset value is greater than the weight associated with the address of the primary cluster; and the back-end address of the preset domain name is set as the address of the backup cluster in response to the weight associated with the address of the backup cluster being greater than the weight associated with the address of the primary cluster.
According to an embodiment of the present disclosure, as shown in fig. 3, a domain name resolution component is provided in both the primary cluster 301 and the backup cluster 302. By setting different weights, the domain name resolution component configures the primary cluster 301 as the primary service and the backup cluster 302 as the emergency backup service. Automatic switching of the platform layer service is controlled by the health monitoring component. The domain name through which the user accesses the Kubernetes cluster management platform layer remains unchanged. Under normal conditions, the domain name resolutions of both the primary cluster 301 and the backup cluster 302 point to the primary cluster 301, the health monitoring component keeps the domain name resolution service of the primary cluster 301 externally accessible, and the primary cluster 301 provides service externally.
According to an embodiment of the present disclosure, when the primary cluster 301 fails, the health monitoring component of the backup cluster 302 finds that the primary cluster 301 is unavailable. In a case where the health detection component determines that the service component is started normally, the domain name resolution component modifies the weight associated with the address of the backup cluster 302 to a preset value, wherein the preset value is greater than the weight associated with the address of the primary cluster 301, so that the domain name resolution component of the backup cluster 302 becomes externally accessible. The back-end address of the preset domain name is set as the address of the backup cluster 302, the backup cluster 302 provides service externally, and the backup recovery component converts the backup cluster 302 into the data source end and synchronizes incremental data to the NAS storage. In this way, the Kubernetes cluster management platform layer can continue to meet the access requirements of users: after the primary cluster 301 fails, the backup cluster 302 is started automatically, the failed cluster is isolated, and the management platform layer is recovered quickly.
Fig. 4 schematically illustrates a flowchart of primary and backup cluster service switching in the cluster disaster recovery processing method according to an embodiment of the disclosure.
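The weight-based switching described above might be modeled as in the following sketch. It is a simplified in-memory model rather than the API of any real DNS server; `DomainRecord`, `promote_backup`, the example addresses and the default of "primary weight plus one" for the preset value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DomainRecord:
    domain: str
    # Maps a cluster address to the weight associated with it.
    weights: Dict[str, int] = field(default_factory=dict)

    def backend(self) -> str:
        """The back-end address is the one with the greatest weight."""
        return max(self.weights, key=lambda addr: self.weights[addr])


def promote_backup(record: DomainRecord, primary_addr: str, backup_addr: str,
                   service_started_ok: bool,
                   preset: Optional[int] = None) -> str:
    """Raise the backup weight above the primary weight once the backup's
    service component is confirmed to have started normally."""
    if not service_started_ok:
        return record.backend()  # nothing changes, primary stays the back end
    primary_weight = record.weights[primary_addr]
    if preset is None:
        preset = primary_weight + 1  # assumed default for this sketch
    if preset <= primary_weight:
        raise ValueError("preset value must exceed the primary cluster weight")
    record.weights[backup_addr] = preset
    return record.backend()
```

Because the domain name itself never changes, callers keep resolving the same name and only the selected back-end address differs after promotion.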
According to an embodiment of the present disclosure, the method further comprises the following operations.
In response to the second timing task being triggered, the health state of the primary cluster is detected by the health detection component to obtain a detection result; and a primary cluster failure event is triggered in a case where the detection result indicates a primary cluster failure.
According to an embodiment of the present disclosure, as shown in fig. 4, the health detection component is used to detect the health state of the primary cluster to obtain a detection result. After the primary and backup clusters are switched, the conditions of both clusters continue to be monitored in real time according to the health monitoring strategy of the health detection component. The health detection process may include the following operations.
In operation S401, in a case where the primary cluster is abnormal, a cluster switching program is started.
In operation S402, the data of the backup cluster is restored.
In operation S403, in a case where the data of the backup cluster is restored successfully, the domain name resolution component of the backup cluster provides service externally.
In operation S404, in a case where the primary cluster is normal, the cluster switching program is not started.
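Operations S401 to S404 can be read as a small control flow, sketched below. The function `run_switch_flow` and its callable parameters are hypothetical stand-ins for the restore step and the domain name resolution step; the sketch only records which operations are reached.

```python
from typing import Callable, List


def run_switch_flow(primary_abnormal: bool,
                    restore_backup_data: Callable[[], bool],
                    expose_backup_dns: Callable[[], None]) -> List[str]:
    """Walk through operations S401-S404 and return the ones that ran."""
    if not primary_abnormal:
        return ["S404"]           # primary is normal: do not start switching
    steps = ["S401", "S402"]      # start the switching program, then restore
    if restore_backup_data():     # True means the restore succeeded
        expose_backup_dns()       # backup DNS component serves externally
        steps.append("S403")
    return steps
```

Passing the side-effecting steps in as callables keeps the ordering logic testable without touching a real cluster.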
According to an embodiment of the present disclosure, the health of the primary and backup clusters is detected based on the second timing task, which may be set to run a detection once every 2 minutes. The health detection component performs two checks: whether access through the external domain name is healthy, and whether the backup cluster can access the primary cluster node IP normally. The health monitoring results mainly fall into the following categories:
i. Healthy, no primary/backup switching: health monitoring of external domain name access is normal, and access from the backup cluster to the primary cluster node IP is normal;
ii. Unhealthy, primary/backup switching: health monitoring of external domain name access is abnormal, and access from the backup cluster to the primary cluster node IP is abnormal;
iii. Unhealthy, primary/backup switching: health monitoring of external domain name access is abnormal, and access from the backup cluster to the primary cluster node IP is normal.
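The three categories above amount to a small decision table over the two checks, which might be encoded as follows. The names `Decision` and `classify` are hypothetical. The text does not describe the fourth combination (domain access normal, node IP access abnormal), so this sketch reports it as unspecified rather than guessing a behavior.

```python
from enum import Enum


class Decision(Enum):
    NO_SWITCH = "healthy, no switching"
    SWITCH = "unhealthy, switching"
    UNSPECIFIED = "combination not covered by the listed categories"


def classify(domain_access_ok: bool, node_ip_access_ok: bool) -> Decision:
    """Map the two detection results to one of the listed categories."""
    if domain_access_ok and node_ip_access_ok:
        return Decision.NO_SWITCH
    if not domain_access_ok:
        # Both listed unhealthy categories switch, whether or not the
        # primary node IP is still reachable from the backup cluster.
        return Decision.SWITCH
    return Decision.UNSPECIFIED
```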
According to an embodiment of the present disclosure, in response to triggering the second timing task, detecting the health status of the primary cluster with the health detection component, resulting in a detection result, may include the following operations.
The service component of the primary cluster is accessed through the preset domain name to obtain a first detection result; a test data packet is transmitted to the primary cluster by using the Internet Protocol to obtain a second detection result; and the detection result is determined based on the first detection result and the second detection result.
According to the embodiment of the disclosure, the service component of the primary cluster is accessed through the preset domain name to determine whether external domain name access is normal. Whether the backup cluster can access the primary cluster node IP normally may be determined by using ping (Packet Internet Groper) to send a test data packet to the primary cluster over the Internet Protocol. With the detection performed by the health detection component, technicians can troubleshoot and recover the failed cluster without interrupting user access, which minimizes the impact of cluster failures on the availability of the management platform.
It should be noted that, unless an execution order between different operations is explicitly stated or is required by the technical implementation, the operations in the embodiments of the disclosure may be executed in a different order, and multiple operations may also be executed simultaneously.
Fig. 5 schematically illustrates a block diagram of a cluster disaster backup processing device according to an embodiment of the present disclosure.
As shown in fig. 5, the cluster disaster recovery processing device 500 includes an acquisition module 510, a processing module 520, a starting module 530, and a setting module 540.
The acquisition module 510 is configured to acquire a target snapshot file from a network storage in response to a primary cluster failure event being triggered;
the processing module 520 is configured to perform data recovery processing on the local storage unit of the backup cluster based on the target snapshot file, to obtain a processing result;
the starting module 530 is configured to start a service component of the backup cluster in a case where the processing result indicates that the data recovery is successful;
the setting module 540 is configured to set, in a case where the health detection component determines that the service component is started normally, the back-end address of the preset domain name as the address of the backup cluster by using the domain name resolution component, so that the user accesses the service component of the backup cluster through the preset domain name.
According to the embodiment of the disclosure, the target snapshot file is acquired from the network storage, data recovery processing is performed on the local storage unit of the backup cluster based on the target snapshot file to obtain a processing result, the service component of the backup cluster is started, and the back-end address of the preset domain name is set as the address of the backup cluster by using the domain name resolution component. With these technical means, the cluster is quickly restored to use and the user can continue to access the service component of the cluster without any change to the domain name. The technical problem of poor timeliness in cluster disaster recovery processing methods is thereby at least partially overcome, and the availability of the cluster is effectively improved.
According to an embodiment of the present disclosure, the cluster disaster recovery processing device 500 further includes a generating module and a writing module.
The generation module is configured to generate a first snapshot file based on the local storage unit of the backup cluster in response to the first timing task being triggered.
The writing module is configured to write the first snapshot file into the network storage.
According to an embodiment of the present disclosure, the acquisition module 510 includes an acquisition sub-module and a first determination sub-module.
The acquisition sub-module is configured to access the network storage based on the preset directory of the backup recovery component, in response to the primary cluster failure event being triggered, to acquire a plurality of second snapshot files from the network storage, wherein the second snapshot files are configured with timestamp information.
The first determining sub-module is configured to determine a target snapshot file from the plurality of second snapshot files based on the current system time information and the timestamp information of each of the plurality of second snapshot files.
According to an embodiment of the present disclosure, the setting module 540 includes a modification sub-module and a setting sub-module.
The modification sub-module is configured to modify, by using the domain name resolution component, the weight associated with the address of the backup cluster to a preset value in a case where the health detection component determines that the service component is started normally, wherein the preset value is greater than the weight associated with the address of the primary cluster.
The setting sub-module is configured to set the back-end address of the preset domain name as the address of the backup cluster in response to the weight associated with the address of the backup cluster being greater than the weight associated with the address of the primary cluster.
According to an embodiment of the present disclosure, the cluster disaster recovery processing device 500 further includes a detection module and a trigger module.
The detection module is configured to detect the health state of the primary cluster by using the health detection component, in response to the second timing task being triggered, to obtain a detection result.
The triggering module is configured to trigger the primary cluster failure event in a case where the detection result indicates a primary cluster failure.
According to an embodiment of the present disclosure, the detection module includes an access sub-module, a transmission sub-module, and a second determination sub-module.
The access sub-module is configured to access the service component of the primary cluster through the preset domain name to obtain a first detection result.
The transmitting sub-module is configured to transmit a test data packet to the primary cluster by using the Internet Protocol to obtain a second detection result.
The second determining sub-module is configured to determine the detection result based on the first detection result and the second detection result.
According to embodiments of the present disclosure, any number of the modules and sub-modules, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules and sub-modules may be split into multiple modules for implementation. Any one or more of the modules and sub-modules may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-a-substrate, a system-on-a-package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, one or more of the modules and sub-modules may be at least partially implemented as computer program modules that, when executed, perform the corresponding functions.
For example, any number of the acquisition module 510, the processing module 520, the starting module 530, and the setting module 540 may be combined in one module/unit/sub-unit, or any one of them may be split into multiple modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to embodiments of the present disclosure, at least one of the acquisition module 510, the processing module 520, the starting module 530, and the setting module 540 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-a-substrate, a system-on-a-package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the acquisition module 510, the processing module 520, the starting module 530, and the setting module 540 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
It should be noted that, in the embodiment of the present disclosure, the cluster disaster recovery processing device portion corresponds to the cluster disaster recovery processing method portion in the embodiment of the present disclosure, and the description of the cluster disaster recovery processing device portion specifically refers to the cluster disaster recovery processing method portion, which is not described herein again.
Fig. 6 schematically illustrates a block diagram of an electronic device suitable for implementing the cluster disaster recovery processing method according to an embodiment of the disclosure. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 601 may also include on-board memory for caching purposes. The processor 601 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM602, and the RAM 603 are connected to each other through a bus 604. The processor 601 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM602 and/or the RAM 603. Note that the program may be stored in one or more memories other than the ROM602 and the RAM 603. The processor 601 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 600 may also include an input/output (I/O) interface 605, the input/output (I/O) interface 605 also being connected to the bus 604. The electronic device 600 may also include one or more of the following components connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 602 and/or RAM 603 and/or one or more memories other than ROM 602 and RAM 603 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program comprising program code for performing the methods provided by the embodiments of the present disclosure, the program code for causing an electronic device to implement the cluster disaster recovery processing methods provided by the embodiments of the present disclosure when the computer program product is run on the electronic device.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of signals over a network medium, and downloaded and installed via the communication section 609, and/or installed from the removable medium 611. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A cluster disaster recovery processing method, comprising:
in response to a primary cluster failure event being triggered, acquiring a target snapshot file from a network storage;
performing, based on the target snapshot file, data recovery processing on a local storage unit of a backup cluster to obtain a processing result;
starting a service component of the backup cluster in a case where the processing result indicates that the data recovery is successful; and
in a case where it is determined by a health detection component that the service component is started normally, setting, by a domain name resolution component, a back-end address of a preset domain name as an address of the backup cluster, so that a user accesses the service component of the backup cluster through the preset domain name.
2. The method of claim 1, further comprising:
in response to a first timing task being triggered, generating a first snapshot file based on the local storage unit of the backup cluster; and
writing the first snapshot file into the network storage.
3. The method of claim 1, wherein the acquiring a target snapshot file from a network storage in response to a primary cluster failure event being triggered comprises:
in response to the primary cluster failure event being triggered, accessing the network storage based on a preset directory of a backup recovery component to acquire a plurality of second snapshot files from the network storage, wherein the second snapshot files are configured with timestamp information; and
determining the target snapshot file from the plurality of second snapshot files based on current system time information and the timestamp information of each of the plurality of second snapshot files.
4. The method of claim 1, wherein the setting, by a domain name resolution component, a back-end address of a preset domain name as an address of the backup cluster in a case where it is determined by a health detection component that the service component is started normally comprises:
in a case where it is determined by the health detection component that the service component is started normally, modifying, by the domain name resolution component, a weight associated with the address of the backup cluster to a preset value, wherein the preset value is greater than a weight associated with an address of the primary cluster; and
setting the back-end address of the preset domain name as the address of the backup cluster in response to the weight associated with the address of the backup cluster being greater than the weight associated with the address of the primary cluster.
5. The method of claim 1, further comprising:
in response to a second timing task being triggered, detecting a health state of the primary cluster by using the health detection component to obtain a detection result; and
triggering the primary cluster failure event in a case where the detection result indicates a primary cluster failure.
6. The method of claim 5, wherein the detecting a health state of the primary cluster by using the health detection component to obtain a detection result in response to a second timing task being triggered comprises:
accessing a service component of the primary cluster through the preset domain name to obtain a first detection result;
transmitting a test data packet to the primary cluster by using the Internet Protocol to obtain a second detection result; and
determining the detection result based on the first detection result and the second detection result.
7. A cluster disaster recovery processing device, comprising:
an acquisition module, configured to acquire a target snapshot file from a network storage in response to a primary cluster failure event being triggered;
a processing module, configured to perform, based on the target snapshot file, data recovery processing on a local storage unit of a backup cluster to obtain a processing result;
a starting module, configured to start a service component of the backup cluster in a case where the processing result indicates that the data recovery is successful; and
a setting module, configured to set, by using a domain name resolution component, a back-end address of a preset domain name as an address of the backup cluster in a case where it is determined by a health detection component that the service component is started normally, so that a user accesses the service component of the backup cluster through the preset domain name.
8. An electronic device, comprising:
one or more processors;
a memory for storing one or more instructions,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 6.
9. A computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to implement the method of any of claims 1 to 6.
10. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 6 when executed.
CN202310271288.5A 2023-03-16 2023-03-16 Cluster disaster recovery processing method and device, electronic equipment and storage medium Pending CN116302352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310271288.5A CN116302352A (en) 2023-03-16 2023-03-16 Cluster disaster recovery processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310271288.5A CN116302352A (en) 2023-03-16 2023-03-16 Cluster disaster recovery processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116302352A true CN116302352A (en) 2023-06-23

Family

ID=86801000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310271288.5A Pending CN116302352A (en) 2023-03-16 2023-03-16 Cluster disaster recovery processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116302352A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116760835A (en) * 2023-08-15 2023-09-15 深圳华锐分布式技术股份有限公司 Distributed storage method, device and medium
CN116760835B (en) * 2023-08-15 2023-10-20 深圳华锐分布式技术股份有限公司 Distributed storage method, device and medium
CN116996369A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Containerized management server, main and standby management method and device thereof, and storage medium
CN116996369B (en) * 2023-09-26 2024-02-09 苏州元脑智能科技有限公司 Containerized management server, main and standby management method and device thereof, and storage medium

Similar Documents

Publication Publication Date Title
US10152382B2 (en) Method and system for monitoring virtual machine cluster
US8910172B2 (en) Application resource switchover systems and methods
US10055300B2 (en) Disk group based backup
CN116302352A (en) Cluster disaster recovery processing method and device, electronic equipment and storage medium
US9450700B1 (en) Efficient network fleet monitoring
US8856592B2 (en) Mechanism to provide assured recovery for distributed application
US11509505B2 (en) Method and apparatus for operating smart network interface card
US9098439B2 (en) Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs
CN107480014B (en) High-availability equipment switching method and device
US10394670B2 (en) High availability and disaster recovery system architecture
WO2016183967A1 (en) Failure alarm method and apparatus for key component, and big data management system
CN111949444A (en) Data backup and recovery system and method based on distributed service cluster
CN106941420B (en) Cluster application environment upgrading method and device
CN116996369B (en) Containerized management server, main and standby management method and device thereof, and storage medium
CN110633046A (en) Storage method and device of distributed system, storage equipment and storage medium
CN111338834B (en) Data storage method and device
US8977595B1 (en) Message-recovery file log locating and monitoring
CN114338684B (en) Energy management system and method
CN116302716A (en) Cluster deployment method and device, electronic equipment and computer readable medium
WO2019241199A1 (en) System and method for predictive maintenance of networked devices
CN112685486B (en) Data management method and device for database cluster, electronic equipment and storage medium
JP6394212B2 (en) Information processing system, storage device, and program
US10666724B1 (en) Geo-replicated IoT hub
CN108920164A (en) The management method and device of host in cloud computing system
US8799926B1 (en) Active node detection in a failover computing environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination