CN113472556A - Backup method, backup device and server cluster system - Google Patents

Info

Publication number
CN113472556A
Authority
CN
China
Prior art keywords
node
container
state
standby
container application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010242187.1A
Other languages
Chinese (zh)
Inventor
董铎
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010242187.1A
Publication of CN113472556A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application provides a backup method, a backup device and a server cluster system. The server cluster system comprises a first node and a second node, wherein the first node is an active node, the second node is a standby node, and one or more container applications are deployed in each of the first node and the second node. The method comprises: synchronizing the data of a service to the second node while a first container application in the first node runs the service; and, when a set switching condition is met, switching the first node to a standby node and switching the second node to an active node, so that a second container application in the second node continues to run the service according to the synchronized data. Therefore, in the present application, the server cluster system can implement data backup at the container application level.

Description

Backup method, backup device and server cluster system
Technical Field
The present application relates to the field of communications technologies, and in particular, to a backup method, an apparatus, and a server cluster system.
Background
A server cluster system is a system for ensuring service continuity and generally has two or more nodes. A server cluster system with two nodes may be called a dual-computer cluster system, and its two nodes are divided into an active node and a standby node. The node that is running a service is generally referred to as the active node (or primary node), and the node that backs up the active node is referred to as the standby node. When a problem on the active node prevents the running service from running normally, the standby node is switched to the active node and continues to run the service, so that the service is not interrupted or is interrupted only briefly.
Current dual-computer cluster systems support backup only at the node level or the operating system level and cannot provide finer backup granularity.
Disclosure of Invention
The application provides a backup method, a backup device and a server cluster system, which can realize backup of a container application level in the server cluster system.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, a backup method is provided, where the method is applied to a server cluster system, the server cluster system includes a first node and a second node, the first node is a primary node, the second node is a standby node, and one or more container applications are respectively deployed in the first node and the second node, and the method includes: synchronizing the data of the service to the second node in the process of operating the service by a first container application in the first node; and when the set switching condition is met, switching the first node into a standby node and switching the second node into an active node so that the second container application in the second node continues to operate the service according to the synchronized data.
In the embodiment of the application, the data backup at the container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined relative to the backup at the node level or the operating system level between the two nodes.
In one possible design, the switching condition may include, but is not limited to, at least one of the following conditions:
the network where the first node is located is disconnected;
the first node is down;
a service in at least one container application in the first node runs abnormally;
a service in a critical container application in the first node runs abnormally;
the state of the second container application in the second node is better than the state of the first container application in the first node;
a container application that operates normally exists in the second node.
It should be noted that the foregoing illustrates several switching conditions, and may also be other switching conditions, which are not limited in the embodiments of the present application. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
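The switching conditions listed above can be sketched as simple predicates over a snapshot of each node. The following is a minimal illustration only: the `NodeView` fields, the condition names, and the choice to model "better state" as a count of healthy containers are assumptions made for this sketch and are not defined by the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NodeView:
    """Hypothetical snapshot of one node as seen by the switching logic."""
    network_up: bool = True
    node_up: bool = True
    # container name -> True if its service is running normally
    container_ok: Dict[str, bool] = field(default_factory=dict)
    # names of containers treated as critical
    critical: Set[str] = field(default_factory=set)


def switching_reasons(active: NodeView, standby: NodeView) -> List[str]:
    """Return the names of the listed switching conditions that currently hold."""
    reasons = []
    if not active.network_up:
        reasons.append("network_disconnected")
    if not active.node_up:
        reasons.append("node_down")
    if any(not ok for ok in active.container_ok.values()):
        reasons.append("container_abnormal")
    if any(not active.container_ok.get(name, False) for name in active.critical):
        reasons.append("critical_container_abnormal")
    # Assumption: "better state" is modelled as having more healthy containers.
    if sum(standby.container_ok.values()) > sum(active.container_ok.values()):
        reasons.append("standby_state_better")
    if any(standby.container_ok.values()):
        reasons.append("standby_has_healthy_container")
    return reasons


def should_switch(active: NodeView, standby: NodeView, enabled: Set[str]) -> bool:
    """Switch when at least one of the conditions configured in `enabled` holds."""
    return any(reason in enabled for reason in switching_reasons(active, standby))
```

Passing the set of configured conditions as `enabled` reflects the phrase "set switching condition": a deployment that only reacts to critical containers would pass `{"critical_container_abnormal"}` and ignore failures of non-critical containers.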
In one possible design, the type of container application in the first node or the second node may include at least one of: cold-standby containers, hot-standby containers, critical containers, or non-critical containers.
It should be noted that the above illustrates several types of container applications, and may be other types, which are not limited in the embodiments of the present application. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
In one possible design, the first container application and the second container application may be the same type of container application. For example, after the first node is switched from the active node to the standby node, the service in the hot-standby container in the first node is taken over by the hot-standby container in the second node. A service in a hot-standby container is usually one with strict latency requirements, and the hot-standby container in the second node is already running while the second node is in the standby state. Therefore, when the second node is switched from the standby state to the active state, its hot-standby container does not need to be started, which makes the switch more efficient. For another example, after the first node is switched from the active node to the standby node, the service in the cold-standby container in the first node is taken over by the cold-standby container in the second node. A service in a cold-standby container is usually one with loose latency requirements, and the cold-standby container in the second node is shut down while the second node is in the standby state. When the second node is switched from the standby state to the active state, the cold-standby container is started and then runs the service, which reduces power consumption.
In one possible design, after the first node is switched to the standby node, the cold standby container in the first node may also be shut down. Therefore, when the first node is used as the active node, the cold-standby container therein can be in an operating state, and after the first node is switched to a standby state, the cold-standby container can be stopped from operating, so as to save power consumption.
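The hot-standby and cold-standby behavior described above can be sketched as a function that brings the containers of a node in line with the node's role. This is an illustrative sketch: the `Container` class and the `"hot"`/`"cold"` labels are hypothetical names, not part of the application.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    kind: str              # "hot" or "cold" (hypothetical labels)
    running: bool = False


def apply_node_role(containers, role):
    """Start or stop containers to match the node role ("active" or "standby")."""
    for c in containers:
        if role == "active":
            c.running = True    # every container runs its service on the active node
        elif c.kind == "hot":
            c.running = True    # hot standby stays up, so no start-up delay on takeover
        else:
            c.running = False   # cold standby is shut down to save power
    return containers
```

With this model, switching a node to standby leaves its hot-standby containers running and stops its cold-standby containers; switching it back to active starts the cold-standby containers before they run the service.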
In a possible design, switching the first node to the standby node may specifically include switching first high availability (HA) software in the first node to the standby state, and controlling the first HA software to notify, through a first container mailbox in the first node, the first container application to switch to the standby state.
In this embodiment of the present application, a first container application is deployed in a first node, the first container application communicates with a first HA software through a first container mailbox, and after the first HA software is switched to a standby state, the first container application is notified through the first container mailbox to be switched to the standby state. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
In a possible design, controlling the first HA software to notify, through the first container mailbox in the first node, the first container application to switch to the standby state specifically includes: controlling the first HA software to write a first state change message to the first container mailbox, where the first state change message is used to indicate that the first HA software is switched from the active state to the standby state; and controlling the first container application to switch itself to the standby state after it detects that the first state change message has been written to the first container mailbox.
In this embodiment of the present application, after monitoring that the first container mailbox is written in the first state change message, the first container application in the first node switches its own state to the standby state, and completes the process of switching the first node from the active node to the standby node. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
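The mailbox-based notification can be sketched as follows. The application does not specify the mailbox's storage, message format, or monitoring mechanism, so the in-memory queue, the dictionary-shaped message, and the polling method below are all assumptions of this sketch.

```python
from collections import deque


class ContainerMailbox:
    """In-memory stand-in for a container mailbox (Mbox)."""

    def __init__(self):
        self._messages = deque()

    def write(self, message):
        self._messages.append(message)

    def read_all(self):
        """Return all pending messages and empty the mailbox."""
        items = list(self._messages)
        self._messages.clear()
        return items


class HASoftware:
    def __init__(self, mailbox, state="active"):
        self.mailbox = mailbox
        self.state = state

    def switch_to(self, new_state):
        old, self.state = self.state, new_state
        # The state change message records the transition of the HA software.
        self.mailbox.write({"type": "state_change", "from": old, "to": new_state})


class ContainerApp:
    def __init__(self, mailbox, state="active"):
        self.mailbox = mailbox
        self.state = state

    def poll(self):
        """Check the mailbox and follow any state change written by the HA software."""
        for msg in self.mailbox.read_all():
            if msg["type"] == "state_change":
                self.state = msg["to"]
```

In this model the container application keeps its old state until it next checks the mailbox, which mirrors the two steps in the text: the HA software writes the message first, and the container application switches only after it observes the write.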
In a possible design, switching the second node to the active node may specifically include switching second high availability (HA) software in the second node to the active state, and controlling the second HA software to notify, through a second container mailbox in the second node, the second container application to switch to the active state.
In this embodiment of the present application, a second container application is deployed in a second node, the second container application communicates with a second HA software through a second container mailbox, and after the second HA software is switched to an active state, the second container application is notified through the second container mailbox that the second container application is switched to the active state. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
In one possible design, controlling the second HA software to notify, through the second container mailbox in the second node, the second container application to switch to the active state includes: controlling the second HA software to write a second state change message to the second container mailbox, where the second state change message is used to indicate that the second HA software is switched from the standby state to the active state; and controlling the second container application to switch itself to the active state after it detects that the second state change message has been written to the second container mailbox.
In this embodiment of the present application, after detecting that the second state change message has been written to the second container mailbox, the second container application in the second node switches its own state to the active state, which completes the process of switching the second node from the standby node to the active node. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
In a possible design, during the process of running a service by a first container application in the first node, before synchronizing data of the service into the second node, it may be further determined that a state of the first container application in the first node is more optimal relative to a state of a second container application in the second node; or determining that the time length of the first node as the main node is longer than the time length of the second node as the main node; or determining that the length of a character string of first identification information for uniquely identifying the first node is smaller than the length of a character string of second identification information for uniquely identifying the second node.
That is, the first node and the second node determine which of the first node and the second node is the active node according to the state of the container application, the duration of the active node, or the length of the character string of the identification information. Therefore, in the embodiment of the application, data backup at a container application level can be realized between two nodes in the server cluster system, and the backup granularity is refined.
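The arbitration criteria above can be sketched as a comparison between two candidate nodes. The text presents the three criteria as alternatives; applying them in sequence as tie-breakers is an assumption of this sketch, as are the `Candidate` fields and the use of a healthy-container count to stand in for "state of the container application".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str              # identification information that uniquely identifies the node
    healthy_containers: int   # stand-in for the state of the container applications
    seconds_as_active: float  # how long the node has served as the active node


def elect_active(a: Candidate, b: Candidate) -> Candidate:
    """Pick the active node; the tie-break order is an assumption of this sketch."""
    if a.healthy_containers != b.healthy_containers:
        return a if a.healthy_containers > b.healthy_containers else b
    if a.seconds_as_active != b.seconds_as_active:
        return a if a.seconds_as_active > b.seconds_as_active else b
    # The node with the shorter identifier string wins; equal lengths fall back to `a`.
    return a if len(a.node_id) <= len(b.node_id) else b
```

Comparing identifier lengths is deterministic only when the lengths differ, so a real deployment would need some further rule for identifiers of equal length; the sketch simply returns its first argument in that case.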
In a second aspect, a backup apparatus is further provided, where the backup apparatus is applied to a server cluster system, the server cluster system includes a first node and a second node, the first node is an active node, the second node is a standby node, and one or more container applications are deployed in each of the first node and the second node. The backup apparatus includes a communication unit and a processing unit. The communication unit is configured to synchronize the data of a service to the second node while a first container application in the first node in the server cluster system runs the service. The processing unit is configured to, when a set switching condition is met, switch the first node to a standby node and switch the second node to an active node, so that a second container application in the second node continues to run the service according to the synchronized data.
In one possible design, the switching condition includes at least one of the following conditions:
the network where the first node is located is disconnected;
the first node is down;
a service in at least one container application in the first node runs abnormally;
a service in a critical container application in the first node runs abnormally;
the state of the second container application in the second node is better than the state of the first container application in the first node;
a container application that operates normally exists in the second node.
In one possible design, the type of container application in the first node or the second node may include at least one of: cold-standby containers, hot-standby containers, critical containers, or non-critical containers. For example, the first container application and the second container application may be the same type of container application.
In one possible design, after switching the first node to the standby node, the processing unit is further configured to: and stopping running the cold-standby container in the first node.
In a possible design, when the processing unit is configured to switch the first node to the standby node, the processing unit may specifically be configured to: switch the first high availability (HA) software in the first node to the standby state, and control the first HA software to notify, through a first container mailbox in the first node, the first container application to switch to the standby state.
In a possible design, when the processing unit is configured to control the first HA software to notify, through the first container mailbox in the first node, the first container application to switch to the standby state, the processing unit may be specifically configured to: control the first HA software to write a first state change message to the first container mailbox, where the first state change message is used to indicate that the first HA software is switched from the active state to the standby state; and control the first container application to switch its own state to the standby state after it detects that the first state change message has been written to the first container mailbox.
In a possible design, when the processing unit is configured to switch the second node to the active node, the processing unit may be specifically configured to: switch the second high availability (HA) software in the second node to the active state, and control the second HA software to notify, through a second container mailbox in the second node, the second container application to switch to the active state.
In a possible design, when the processing unit is configured to control the second HA software to notify, through the second container mailbox in the second node, the second container application to switch to the active state, the processing unit may be specifically configured to: control the second HA software to write a second state change message to the second container mailbox, where the second state change message is used to indicate that the second HA software is switched from the standby state to the active state; and control the second container application to switch its own state to the active state after it detects that the second state change message has been written to the second container mailbox.
In a possible design, the processing unit is further configured to determine that a state of a first container application in a first node in the server cluster system is more optimal relative to a state of a second container application in a second node before synchronizing data of a service into the second node in a process in which the first container application runs the service; or determining that the time length of the first node as the main node is longer than that of the second node as the main node; or, determining that the length of a character string of first identification information for uniquely identifying the first node is smaller than the length of a character string of second identification information for uniquely identifying the second node.
In a third aspect, a backup apparatus is further provided, where the backup apparatus is applied to a server cluster system, the server cluster system includes a first node and a second node, the first node is a primary node, the second node is a standby node, and one or more container applications are respectively deployed in the first node and the second node. The backup device comprises one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the backup apparatus, cause the backup apparatus to perform the method as provided in the first aspect above.
It should be noted that, the backup apparatus provided in the second aspect or the third aspect may be the first node in the server cluster system or a component in the first node; alternatively, the backup apparatus may also be the second node in the server cluster system or a component within the second node; alternatively, the backup apparatus may also be another node or device in the server cluster system except for the first node and the second node, for example, the backup apparatus may be a master control device in the server cluster system, which is not limited in this embodiment of the present application.
In a fourth aspect, a server cluster system is further provided, which includes at least two server nodes, where each server node is respectively deployed with one or more container applications; and the backup apparatus provided in the second aspect or the third aspect described above.
It should be noted that, it is assumed that the at least two server nodes include a first node and a second node; the backup device may be the first node or a component within the first node; or, it may also be the second node or a component within the second node; alternatively, the node or the device in the server cluster system may be other nodes or devices except for the first node and the second node, for example, the node or the device may be a master control device, which is not limited in this embodiment of the present application.
In a fifth aspect, a computer-readable storage medium is also provided, which comprises a computer program, which, when run on a computer, causes the computer to perform the method as provided in any one of the possible implementations of the first aspect.
In a sixth aspect, a computer program product is also provided, the computer program product storing a computer program, the computer program comprising program instructions that, when executed by a computer, enable the computer to perform the method as provided in any one of the possible implementations of the first aspect.
For the beneficial effects of the second aspect to the sixth aspect, please refer to the description of the beneficial effects of the first aspect, and the description is not repeated here.
Drawings
FIG. 1 is a diagram of a prior-art dual-computer cluster system;
FIG. 2 is a schematic diagram of a dual-computer cluster system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a backup method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another backup method according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating how a dual-computer cluster system according to an embodiment of the present application implements active/standby switching;
FIG. 7 is a schematic diagram of a backup device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a backup device according to an embodiment of the present application.
Detailed Description
For convenience of understanding, terms related to the embodiments of the present application are explained below.
(1) A dual-computer cluster system generally includes two or more nodes, which are divided into an active node (also called a primary node) and a standby node. The node that is running a service is usually called the active node, and the node that backs up the active node is called the standby node. When a problem on the active node prevents the running service from running normally, the standby node is switched to the active node and takes over the service from the original active node, so that the service is not interrupted or is interrupted only briefly. Generally, a dual-computer cluster system implements data backup, active/standby switching, and the like through high availability (HA) software to ensure service continuity.
FIG. 1 illustrates the architecture of a prior-art dual-computer cluster system, taking an architecture that includes two nodes, a first node and a second node, as an example. When the first node is the active node, the second node is the standby node; when the first node is the standby node, the second node is the active node. The first node includes HA software, which includes a dual-machine arbitration module (HAARB), a dual-machine resource management module (HARM), and a dual-machine synchronization module (HASYNC). The HAARB arbitrates whether the first node is the active node or the standby node; the HARM manages resources in the first node; the HASYNC synchronizes the service data in the first node to the second node when the first node is the active node. The second node also includes HA software, which likewise includes an HAARB, an HARM, and an HASYNC. The HAARB arbitrates whether the second node is the active node or the standby node; the HARM manages resources in the second node; the HASYNC synchronizes the service data in the second node to the first node when the second node is the active node.
(2) A container application, which may be called a container for short, is a lightweight, portable, and self-contained software packaging technology. An application program (APP for short) can run in the container, which enables distributed running and management of application programs. One or more application programs may run in one container application. Types of container applications may include cold-standby containers, hot-standby containers, critical containers, non-critical containers, and the like. In the embodiments of the present application, for convenience of description, container applications are referred to simply as containers, for example, container 1, container 2, and the like.
(3) In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. In addition, it is to be understood that the terms first, second, etc. in the description of the present application are used to distinguish between the items described and not necessarily to describe a sequential or chronological order.
The embodiment of the application provides a dual-computer cluster system, which includes at least two server nodes, each server node is deployed with one or more container applications, and the dual-computer cluster system can realize data backup at the container application level in different nodes, and has smaller granularity compared with data backup at the operating system level or the node level.
Fig. 2 shows a schematic architecture diagram of a dual-computer cluster system according to an embodiment of the present application. The architecture includes two nodes, a first node and a second node. When the first node is used as the main node, the second node is a standby node, and when the second node is used as the main node, the first node is a standby node. The first node may include therein first HA software, a first container mailbox (Mbox), and one or more container applications. Wherein, the first HA software comprises HAARB, HARM and SYNC. Furthermore, a container mailbox service (Mbox-server) is also deployed in the first HA software. A first container mailbox (Mbox) is used to enable communication between first HA software and one or more container applications in the first node. An Mbox-client is deployed in each container application. For example, an Mbox-client1 is deployed in container 1, an Mbox-client2 is deployed in container 2, and an Mbox-client3 is deployed in container 3. The Mbox-server is deployed in the first HA software, and the Mbox-client is deployed in the container application, so that the Mbox-server and the Mbox-client realize communication through the first Mbox, that is, the container application in the first node and the first HA software communicate through the first Mbox.
The second node includes second HA software, a second container mailbox (Mbox), and one or more container applications. The second HA software comprises HAARB, HARM and SYNC, and the second HA software also deploys container mailbox service (Mbox-server). A second container mailbox (Mbox) is used to enable communication between second HA software and one or more container applications in the second node. An Mbox-client is deployed in each container application. The Mbox-server is deployed in the second HA software, and the Mbox-client is deployed in the container application, so that the Mbox-server and the Mbox-client realize communication through the second Mbox, that is, the container application in the second node communicates with the second HA software through the second Mbox.
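The relationship between the Mbox-server in the HA software and the Mbox-client in each container can be sketched as a fan-out: one publish on the server side reaches every registered client. The application does not describe the transport behind the mailbox, so the per-client `queue.Queue` inbox and the `register`, `publish`, and `receive` methods below are hypothetical names chosen for illustration.

```python
import queue


class MboxClient:
    """Hypothetical Mbox-client deployed inside one container application."""

    def __init__(self, container_name, inbox):
        self.container_name = container_name
        self._inbox = inbox

    def receive(self):
        """Return the next pending message, or None if the inbox is empty."""
        try:
            return self._inbox.get_nowait()
        except queue.Empty:
            return None


class MboxServer:
    """Hypothetical Mbox-server deployed inside the HA software."""

    def __init__(self):
        self._clients = {}

    def register(self, container_name):
        """Create an inbox for a container and hand back its client endpoint."""
        inbox = queue.Queue()
        self._clients[container_name] = inbox
        return MboxClient(container_name, inbox)

    def publish(self, message):
        """Deliver the same message to every registered container."""
        for inbox in self._clients.values():
            inbox.put(message)
```

Registering container 1, container 2, and container 3 and then publishing a single state change gives each container its own copy of the message, so one container consuming it does not affect the others.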
Assume that the first node is the active node, the second node is the standby node, and one or more application programs run in a first container application in the first node. When a service in the first container application runs abnormally, a second container application in the second node can take over from the first container application in the first node and continue to run the services of the one or more application programs.
In some embodiments, the dual-computer cluster system further includes a main control device, where the main control device may be configured to manage the first node and the second node, and for example, the main control device is configured to switch the first node from the active node to the standby node and switch the second node from the standby node to the active node when a set switching condition is met. In other embodiments, the dual-computer cluster system may not include the main control device, and the first node switches from the active node to the standby node when meeting the set switching condition, and notifies the second node to switch from the standby node to the active node. In other embodiments, the dual-computer cluster system may not include the main control device, and the second node switches from the standby node to the active node when the set switching condition is satisfied, and notifies the first node to switch from the active node to the standby node. Hereinafter, the dual-computer cluster system is described as an example without the main control device.
Therefore, in the embodiment of the application, the first node can back up data by taking the container application as a unit, and all data in the node does not need to be backed up each time, so that the efficiency can be improved.
Fig. 3 is a schematic diagram illustrating an application scenario (or networking deployment) provided in an embodiment of the present application. As shown in fig. 3, the first node and the second node may use the HA software to guarantee service continuity. A container application is deployed in each node and provides a service externally through the container; for example, the container application is connected to the outside through a switch. For example, a service IP may be set, and the dual-computer cluster system specifies the node on which the service IP provides the service. If the first node is arbitrated as the active node, the system designates the first node to occupy the service IP, and the container application running on the first node provides the service externally through the service IP. If the first node meets the switching condition, an active/standby switchover is triggered; after the switchover, the original standby node, namely the second node, is arbitrated as the active node and occupies the service IP, and the container application in the second node ensures the continuity of the external service.
In conjunction with the above description, the following describes specific implementations of embodiments of the present application.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes the association relationship of the associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Example 1
Fig. 4 is a flowchart illustrating a backup method according to an embodiment of the present application. The method may be applied to the architecture shown in fig. 2. The method comprises the following steps:
S400, the first container application in the active node runs a service.
One or more container applications are deployed in the active node, and the types of the container applications may include a cold standby container, a hot standby container, a critical container, a non-critical container, and the like. Different types of container applications in the active node have different states or functions. For example, the cold standby container is in a running state with its service enabled, the hot standby container is in a running state with its service enabled, the critical container is used to run a critical application, and the non-critical container is used to run a non-critical application. Which applications are critical and which are non-critical may be specified by a user or set by default.
S401, the data of the active node and the standby node are synchronized.
In some embodiments, the active node synchronizes data to the standby node in units of container applications, which may be understood as that the active node backs up service data in one container application to the standby node as a group of data, data backups of different container applications in the active node may be independent of each other, and backup periods or frequencies may be the same or different.
Illustratively, the active node synchronizes the service data in the first container application to the standby node and identifies the service data as belonging to the first container application in the active node, so that the standby node can find the second container application in itself to continue running the service of the first container application. Optionally, the first container application and the second container application may be the same type of container application. For example, the active node sends the service data and the type of the first container application running the service to the standby node, and the standby node determines the second container application of the same type in the standby node according to the type of the first container application. The type of the container application may include a cold standby container, a hot standby container, a critical container, a non-critical container, and the like, which is not limited in the embodiments of the present application.
Optionally, the container applications in the active node and the standby node may be in a one-to-one relationship or a many-to-one relationship, which is not limited in this embodiment of the present application. The active node or the standby node may establish a correspondence between container applications in the active node and the standby node. For example, container 1 in the active node corresponds to container 1 in the standby node, container 2 in the active node corresponds to container 2 in the standby node, and container 3 in the active node corresponds to container 3 in the standby node. In this way, when the service in container 1 in the active node is abnormal, container 1 in the standby node may take over from container 1 in the active node and continue to run the service.
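The per-container synchronization and the container correspondence described above can be illustrated with a minimal sketch. All names here (`SyncRecord`, `StandbyNode`, the container identifiers and the sample data) are hypothetical and are not taken from the application; the sketch assumes a one-to-one correspondence.

```python
from dataclasses import dataclass, field


@dataclass
class SyncRecord:
    """One group of service data, backed up per container application."""
    container_id: str      # hypothetical identifier of the source container
    container_type: str    # e.g. "cold", "hot", "critical", "non-critical"
    data: dict


@dataclass
class StandbyNode:
    # Correspondence between active-node and standby-node containers
    # (one-to-one here; many-to-one would map several keys to one value).
    mapping: dict
    store: dict = field(default_factory=dict)

    def receive(self, record: SyncRecord) -> str:
        """Store the record under the corresponding standby container."""
        target = self.mapping[record.container_id]
        self.store.setdefault(target, {}).update(record.data)
        return target


standby = StandbyNode(mapping={"container1": "container1",
                               "container2": "container2"})
# Each container is synchronized independently, possibly at its own period.
standby.receive(SyncRecord("container1", "hot", {"sessions": 3}))
standby.receive(SyncRecord("container2", "cold", {"queue": ["a", "b"]}))
```

Because each record carries its source container, the standby node only ever updates the data of the corresponding container rather than a whole-node image.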
S402, the active node determines that a switching condition is satisfied. Illustratively, the switching condition may include, but is not limited to, at least one of:
the network of the main node is disconnected;
the main node is down;
the service in at least one container application in the main node is abnormally operated;
the service in the key container application in the main node is abnormally operated;
the state of the container application in the standby node is better than that of the container application in the main node;
there is a container application in the standby node that is running normally.
That the state of the container application in the standby node is better than the state of the container application in the active node may include: the number of container applications in the standby node is greater than that in the active node; or the number of critical container applications in the standby node is greater than that in the active node; or the number of started critical container applications in the standby node is greater than that in the active node; or the number of hot standby container applications in the standby node is greater than that in the active node; and the like. For example, the active node or the standby node may periodically send an inquiry message to the other side to query the container status of the other side, for example, the number of containers, the number of critical containers, the number of started critical containers, the number of hot standby containers, and so on, so as to determine whether a switch is needed. Alternatively, that the state of the container application in the standby node is better than the state of the container application in the active node may include: the state of the second container application in the standby node is better than the state of the first container application in the active node. For example, the second container application may perform better than the first container application.
The above conditions may be used alone or in combination. For example, after determining that the service in the key container application in the active node is abnormal, the active node may also determine whether the container application in the standby node is normally operating, if the container application in the standby node is abnormal, for example, suspended, the active-standby switching is not performed, and if the container application in the standby node is normally operating, the active-standby role switching is performed.
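As an illustration of how several of these conditions might be combined, the following sketch evaluates them in one function. The `NodeStatus` fields and the choice of "more started critical containers" as the measure of a better state are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    network_up: bool = True
    node_up: bool = True
    critical_abnormal: bool = False     # a critical container is abnormal
    started_critical: int = 0           # number of started critical containers
    has_normal_container: bool = True   # at least one container runs normally


def should_switch(active: NodeStatus, standby: NodeStatus) -> bool:
    """Combine several switching conditions, as in the example in the text."""
    # A standby node with no normally running container cannot take over.
    if not standby.has_normal_container:
        return False
    if not active.network_up or not active.node_up:
        return True
    if active.critical_abnormal:
        return True
    # "Better state" is modeled here as more started critical containers.
    return standby.started_critical > active.started_critical
```

Checking the standby node first reflects the combined example: an abnormal critical container on the active node triggers a switch only if the standby node can actually take over.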
S403, the active node sends a switching request to the standby node, where the switching request is used to request active-standby switching.
S404, the standby node switches to the active node, and the second container application in the standby node continues to run the service of the first container application in the original active node. As previously described, the first container application is of the same type as the second container application.
It should be noted that the standby node may also include different types of container applications, such as cold standby containers, hot standby containers, critical containers, or non-critical containers, and so on. The state of the container application in the standby node may be different from the state of the container application in the active node, e.g., the cold-standby container in the standby node is in an off state (not running) while the cold-standby container in the active node is in a running state and where the service is enabled. For another example, the hot standby container in the standby node is in a running state but the service is not enabled, and the hot standby container in the active node is in a running state and the service is enabled. Therefore, after the original standby node is switched to the active node, the state of each container application in the original standby node may be modified, for example, assuming that the second container application is a cold standby container, the second container application is activated first, and then the service in the first container is run through the second container application; assuming that the second container application is a hot standby container, the service in the first container may be run directly through the second container application without activation (because the hot standby container is always in a running state).
S405, the standby node sends the switching completion indication to the main node.
S406, the active node is switched to the standby node.
After the original active node is switched to the standby node, the state of the container applications in the original active node may be modified. For example, the cold standby container in the original active node is deactivated, that is, switched to the non-running state, and the hot standby container in the original active node remains in the running state but the service therein is no longer enabled. The critical container in the original active node may remain in the running state, but the service therein is no longer enabled, and the non-critical container application in the original active node may or may not remain in the running state.
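The container state changes that accompany a role switch can be summarized in a short sketch. The `Container` class and the two helper functions are hypothetical; only the cold standby and hot standby behaviors described in the text are modeled.

```python
from dataclasses import dataclass


@dataclass
class Container:
    kind: str                      # "cold" or "hot" standby container
    running: bool = False
    service_enabled: bool = False


def to_active(c: Container) -> None:
    """Adjust a container after its node becomes the active node."""
    # A cold standby container is activated here; a hot one already runs.
    c.running = True
    c.service_enabled = True


def to_standby(c: Container) -> None:
    """Adjust a container after its node becomes the standby node."""
    c.service_enabled = False
    if c.kind == "cold":
        c.running = False          # deactivate: switch to the non-running state
    # A hot standby container keeps running with its service disabled.


cold = Container("cold")
hot = Container("hot", running=True)
for c in (cold, hot):
    to_active(c)       # both now run with the service enabled
for c in (cold, hot):
    to_standby(c)      # cold is stopped; hot keeps running, service disabled
```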
In the above embodiment 1, the active node initiates the switching. Another embodiment is described below in which a standby node initiates a switch.
Example 2
Fig. 5 shows a flowchart of another container application backup method provided in an embodiment of the present application. The method comprises the following steps:
S500, the first container application in the active node runs a service.
S501, the service data of the first container application in the active node is synchronized to the standby node.
S502, the standby node determines that a handover condition is satisfied, where the handover condition may include at least one of:
the network of the main node is disconnected;
the main node is down;
the service in at least one container application in the main node is abnormally operated;
the service in the key container application in the main node is abnormally operated;
the state of the container application in the standby node is better than that of the container application in the main node;
there is a container application in the standby node that is running normally.
In some embodiments, the standby node may periodically send a probe message to the primary node, and if the feedback message of the primary node is not received within a preset time, it is determined that the network of the primary node is disconnected or the primary node is down; or, the standby node periodically sends inquiry information to the main node, the inquiry information is used for inquiring whether the container of the main node is abnormal or not, and if the information that the container is abnormal is fed back by the main node, the switching condition is determined to be met.
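A minimal sketch of the probe-and-timeout check on the standby side is given below. The `PeerMonitor` class, its method names, and the injectable `clock` parameter are assumptions for illustration; the actual probe transport is not specified in the text and is left out.

```python
import time


class PeerMonitor:
    """Tracks probe replies from the active node (hypothetical helper)."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_reply = clock()
        self.container_abnormal = False

    def on_reply(self, container_abnormal: bool = False) -> None:
        """Record a feedback message and the container status it reports."""
        self.last_reply = self.clock()
        self.container_abnormal = container_abnormal

    def switch_needed(self) -> bool:
        """True if the peer timed out or reported an abnormal container."""
        timed_out = self.clock() - self.last_reply > self.timeout_s
        return timed_out or self.container_abnormal
```

A monotonic clock is used so that wall-clock adjustments on the standby node do not cause a false timeout.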
S503, the standby node sends a switching instruction to the main node, for instructing the main/standby switching.
S504, the active node sends a switching agreement indication to the standby node. Alternatively, when the standby node determines that the switching condition is satisfied, the standby node may automatically switch to the active node, so step S503 and step S504 are optional steps and are indicated by dotted lines in the figure.
S505, the standby node switches to the active node, and the second container application in the standby node continues to run the service of the first container application in the original active node.
S506, the standby node sends a switching completion indication to the main node.
S507, the main node is switched to a standby node.
Before step S400 or step S500 in the above embodiments, other steps may also be included. For example, the two nodes establish a heartbeat connection and then make an initial active/standby decision. The manner in which the two nodes make the initial active/standby decision includes, but is not limited to, at least one of the following manners 1 to 3:
In manner 1, the first node determines that the container state of the container application in the first node is better than the container state of the container application in the second node, determines that the first node is the active node, and notifies the second node that it is the standby node. The container state of the first node being better may include: the number of container applications in the first node is greater than that in the second node; or the number of critical container applications in the first node is greater than that in the second node; or the number of started critical container applications in the first node is greater than that in the second node; or the number of hot standby container applications in the first node is greater than that in the second node; and the like.
In manner 2, when the first node determines that the duration for which the first node has served as the active node is longer than the duration for which the second node has served as the active node, it determines that the first node is the active node and the second node is the standby node.
In manner 3, when the first node determines that the number of bytes (or the size of a character string) occupied by first identification information (e.g., an IP address) uniquely identifying the first node is smaller than the number of bytes (or the size of a character string) occupied by second identification information (e.g., an IP address) uniquely identifying the second node, it determines that the first node is the active node and the second node is the standby node.
Any two or more of manners 1 to 3 may be used in combination. For example, when the first node determines that the container state of the container application in the first node is consistent with that in the second node, it determines whether the duration of the first node as the active node is consistent with the duration of the second node as the active node. If the durations are consistent, the number of bytes occupied by the first identification information of the first node is compared with the number of bytes occupied by the second identification information of the second node; if the former is smaller, the first node is determined to be the active node and the second node is determined to be the standby node.
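The combined use of the three rules above can be sketched as a chain of tie-breakers. The `Candidate` fields are hypothetical: the count of started critical containers stands in for "container state", and the identifier comparison uses byte length first and the string value second, which is one possible reading of "number of bytes (or size of a character string)".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str              # identification information, e.g. an IP address
    started_critical: int     # stands in for "container state" (assumption)
    active_seconds: float     # how long this node has served as active node


def _id_key(c: Candidate):
    # Byte count is compared first, then the string itself, so two
    # distinct identifiers never tie.
    return (len(c.node_id.encode()), c.node_id)


def elect_active(a: Candidate, b: Candidate) -> Candidate:
    """Apply the three rules in order, falling through on ties."""
    if a.started_critical != b.started_critical:        # better container state
        return a if a.started_critical > b.started_critical else b
    if a.active_seconds != b.active_seconds:            # longer time as active
        return a if a.active_seconds > b.active_seconds else b
    return a if _id_key(a) <= _id_key(b) else b         # smaller identifier
```

Since both nodes evaluate the same deterministic function on the same inputs, they reach the same decision independently, whichever node runs the comparison.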
Another embodiment is described below, which describes the internal information interaction process between the active node and the standby node.
Example 4
Referring to fig. 6, a process in which the active node initiates the active/standby switching may include the following. When a container application in the active node runs abnormally, it can notify the first HA software through the first container mailbox (Mbox). For example, assuming that the service of container 1 in the active node is abnormal, container 1 may "send mail" to the first Mbox through the Mbox-client1, where the mail includes an active/standby switching request for requesting to switch to the standby node. The Mbox-server in the first HA software "receives mail" from the first Mbox and determines that the service in container 1 is abnormal. The Mbox-client1 "sending mail" to the first Mbox may be understood as the Mbox-client1 writing the active/standby switching request information into a file in the first Mbox. The Mbox-server "receiving mail" from the first Mbox may be understood as the Mbox-server listening to the first Mbox for file changes, for example, by periodic listening; alternatively, when a file in the first Mbox is modified or information is written to it, the Mbox-server receives a notification message and then reads the information in the first Mbox.
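The file-based "send mail" and "receive mail" operations can be illustrated with a small polling sketch. The `Mbox` class, the JSON-lines file layout, and the message fields are assumptions for the example; the text does not specify the file format, and a notification-driven reader could replace the polling shown here.

```python
import json
import os
import tempfile


class Mbox:
    """A mailbox backed by one file; messages are appended as JSON lines."""

    def __init__(self, path: str):
        self.path = path
        self.offset = 0          # read position of this reader

    def send(self, message: dict) -> None:
        """'Send mail': append one message to the mailbox file."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")

    def receive(self) -> list:
        """'Receive mail': read messages written since the last poll."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
            self.offset = f.tell()
        return [json.loads(line) for line in chunk.decode("utf-8").splitlines() if line]


path = os.path.join(tempfile.mkdtemp(), "mbox1")
client = Mbox(path)   # plays the role of the Mbox-client1 in container 1
server = Mbox(path)   # plays the role of the Mbox-server in the HA software
client.send({"from": "container1", "type": "switch_request"})
received = server.receive()
```

Each reader keeps its own offset, so several clients can poll the same mailbox file without consuming each other's messages.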
After the Mbox-server determines that the service of container 1 is abnormal, it notifies the HAARB of the abnormal service through the HARM, and the HAARB arbitrates whether to switch roles. The HAARB may arbitrate whether to switch the active and standby roles in various manners. In manner 1, the HAARB determines that the service of container 1 is abnormal, and determines to switch roles. In manner 2, the HAARB determines that the service of container 1 is abnormal and that container 1 is a critical container, and determines to switch roles. In manner 3, the HAARB determines that the service of container 1 is abnormal and that the container in the standby node is normal, and determines to switch roles.
After determining to switch the active and standby roles, the HAARB sends an active/standby switching request to the standby node. When the standby node agrees to switch to the active node, the standby node may send a switching agreement indication to the original active node; the original active node then switches to the standby node, and the original standby node switches to the active node.
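The three arbitration variants above differ only in the extra check applied once a service is found abnormal, which the following sketch makes explicit. The function name and parameters are hypothetical.

```python
def arbitrate(mode: int, service_abnormal: bool, is_critical: bool,
              standby_normal: bool) -> bool:
    """Return True if the arbiter should switch roles under the given variant."""
    if not service_abnormal:
        return False
    if mode == 1:
        return True              # any abnormal service triggers a switch
    if mode == 2:
        return is_critical       # only an abnormal critical container does
    if mode == 3:
        return standby_normal    # only if the standby container is normal
    raise ValueError(f"unknown arbitration mode: {mode}")
```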
The process of switching the original standby node to the active node may include the following. The second HA software in the original standby node switches its own state to the active state; for example, the second HA software in the active state stores the synchronized data sent by the original active node in the corresponding position in the second node. The HARM in the second HA software "sends mail" to the second Mbox through the internally deployed Mbox-server, where the mail includes a second state change message indicating that the second HA software has switched to the active node. The Mbox-client1 to Mbox-client3 each "receive mail" from the second Mbox, read the mail content, that is, the content of the second state change message, and determine that the second HA software has switched to the active node. The Mbox-client1 to Mbox-client3 then control the corresponding container applications to switch from the standby state to the active state. Specifically, the process in which the HARM in the second HA software "sends mail" to the second Mbox through the Mbox-server may include: the Mbox-server in the second HA software writes the second state change message to the second Mbox. The process in which the Mbox-client1 to Mbox-client3 each "receive mail" from the second Mbox may include: the Mbox-client1 to Mbox-client3 each detect that the second state change message has been written to the second Mbox; alternatively, after the second HA software writes the second state change message to the second Mbox, the second Mbox actively notifies the Mbox-client1 to Mbox-client3, so that the Mbox-client1 to Mbox-client3 read the information from the second Mbox.
The step of switching the corresponding container application from the standby state to the active state by the mbox-client 1-3 may include: assuming that the container 1 in the original standby node is a cold standby container, since the cold standby container is in a closed state before the original standby node is switched to the primary node, after the original standby node is switched to the primary node, the mbox-client1 controls the container 1 to activate and enable a service. Assuming that the container 2 in the original standby node is a hot standby container, since the hot standby container is in an operating state but service therein is not enabled before the original standby node is switched to the active node, after the original standby node is switched to the active node, the container 2 continues to maintain the operating state and service therein is enabled.
The process of switching the original active node to the standby node may include the following. The first HA software in the original active node switches its own state to the standby state; for example, the first HA software in the standby state sends synchronization data to the original standby node. The HARM in the first HA software "sends mail" to the first Mbox through the Mbox-server, where the mail includes a first state change message indicating that the first HA software has switched to the standby node. The Mbox-client1 to Mbox-client3 each "receive mail" from the first Mbox, read the mail content, that is, the content of the first state change message, and determine that the first HA software has switched to the standby node. The Mbox-client1 to Mbox-client3 then control the corresponding container applications to switch from the active state to the standby state. Specifically, the process in which the HARM in the first HA software "sends mail" to the first Mbox through the Mbox-server may include: the Mbox-server in the first HA software writes the first state change message to the first Mbox. The process in which the Mbox-client1 to Mbox-client3 each "receive mail" from the first Mbox may include: the Mbox-client1 to Mbox-client3 each detect that the first state change message has been written to the first Mbox; alternatively, after the first HA software writes the first state change message to the first Mbox, the first Mbox actively notifies the Mbox-client1 to Mbox-client3, so that the Mbox-client1 to Mbox-client3 read the information from the first Mbox.
The step of switching the corresponding container application from the active state to the standby state by the mbox-client 1-3 may include: assuming that the container 1 in the original master node is a cold-standby container, the mbox-client1 controls the container 1 to stop service and deactivate to make it in a closed state. Assuming that the container 2 in the original primary node is a hot standby container, the hot standby container is in a running state and service in the hot standby container is enabled before the original primary node is switched to the standby node, and after the original primary node is switched to the standby node, the container 2 continues to keep the running state, but service in the hot standby container is not enabled any more.
For ease of understanding, table 1 below illustratively describes the functional characteristics of different container types in the active node or the standby node.
TABLE 1
Critical container      | Service abnormal           | Initiates the active/standby role switch
Non-critical container  | Service abnormal           | Does not initiate the active/standby role switch
Cold standby container  | Running in the active node | Not running in the standby node
Hot standby container   | Running in the active node | Running in the standby node
Referring to table 1, when the service in the critical container in the active node is abnormal, the active/standby role switch may be initiated, and when the service in the non-critical container is abnormal, the active/standby role switch does not need to be initiated, so as to avoid frequent active/standby switching. The cold standby container in the standby node is not running (which may be understood as being in an off state), which saves power. The hot standby containers in both the active node and the standby node are running; when the service in the hot standby container in the active node is abnormal, the service can be quickly switched to the hot standby container in the standby node, keeping the service running smoothly.
The following describes, with reference to the drawings, an apparatus for implementing the above method in the embodiments of the present application. The above content therefore also applies to the subsequent embodiments, and repeated content is not described again.
Fig. 7 is a block diagram of a backup device 700 according to an embodiment of the present disclosure. Exemplarily, the backup apparatus 700 is, for example, a first node, a second node, or a master control device in the dual-computer cluster system. The backup apparatus 700 includes a processing unit 710 and a communication unit 720.
As an example, the backup apparatus 700 may be the first node described above, or an apparatus capable of supporting the first node to implement the functions required by the method, such as a chip system. The processing unit 710 may be used to perform all operations performed by the first node in the embodiments shown in fig. 4-6, except transceiving operations. The communication unit 720 may be used to perform all transceiving operations performed by the first node in the embodiments shown in fig. 4-6.
Wherein the communication unit 720 may be a functional module that can perform both the sending operation and the receiving operation, for example, the communication unit 720 is a module included in the backup apparatus 700, the communication unit 720 may be configured to perform all the sending operation and the receiving operation performed by the first node in the embodiments shown in fig. 4-6, for example, when the sending operation is performed, the communication unit 720 may be considered as a sending module, and when the receiving operation is performed, the communication unit 720 may be considered as a receiving module; alternatively, the communication unit 720 may also be a general term for two functional modules, which are respectively a sending module and a receiving module, where the sending module is configured to complete sending operations, for example, the communication unit 720 is a module included in the first node, then the sending module may be configured to perform all sending operations performed by the first node in the embodiment shown in fig. 4 to 6, and the receiving module is configured to complete receiving operations, for example, the communication unit 720 is a module included in the first node, then the receiving module may be configured to perform all receiving operations performed by the first node in the embodiment shown in fig. 4 to 6.
The backup apparatus 700 may also be the second node, or an apparatus capable of supporting the second node to implement the functions required by the method, such as a chip system. The processing unit 710 may be used to perform all operations performed by the second node in the embodiments shown in fig. 4-6, except transceiving operations. The communication unit 720 may be used to perform all transceiving operations performed by the second node in the embodiments shown in fig. 4-6.
Wherein the communication unit 720 may be a functional module that can perform both the sending operation and the receiving operation, for example, the communication unit 720 is a module included in the second node, the communication unit 720 may be configured to perform all the sending operations and receiving operations performed by the second node in the embodiments shown in fig. 4-6, for example, when the sending operation is performed, the communication unit 720 may be considered as a sending module, and when the receiving operation is performed, the communication unit 720 may be considered as a receiving module; alternatively, the communication unit 720 may also be a general term for two functional modules, which are respectively a sending module and a receiving module, where the sending module is configured to complete sending operations, for example, the communication unit 720 is a module included in the second node, then the sending module may be configured to perform all sending operations performed by the second node in the embodiments shown in fig. 4 to 6, and the receiving module is configured to complete receiving operations, for example, the communication unit 720 is a module included in the second node, then the receiving module may be configured to perform all receiving operations performed by the second node in the embodiments shown in fig. 4 to 6.
For example, the backup apparatus 700 may also be other nodes or devices except the first node and the second node in the dual-computer cluster system, for example, may be a main control device in the dual-computer cluster system, or an apparatus capable of supporting the main control device to implement the functions required by the method, for example, a chip system.
Whether the backup apparatus 700 is the first node, the second node, or the master device described above. A communication unit 720, configured to synchronize data of a service to a second node in a process that a first container application in a first node in a server cluster system runs the service; and the processing unit 710 is configured to switch the first node to a standby node and switch the second node to an active node when a set switching condition is met, so that a second container application in the second node continues to run the service according to the synchronized data.
The division into units in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in an actual implementation. In addition, the functional units in the embodiments of the present application may be integrated into one processor, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
One or more of the units in fig. 7 may be implemented in software, hardware, firmware, or a combination thereof. The software or firmware includes, but is not limited to, computer program instructions or code, and may be executed by a hardware processor. The hardware includes, but is not limited to, various integrated circuits such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC).
Fig. 8 is a hardware diagram of a backup apparatus 800 according to an embodiment of the present disclosure. The backup device 800 comprises at least one processor 801 and further comprises at least one memory 802 for storing program instructions and/or data. A memory 802 is coupled to the processor 801. The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, and may be an electrical, mechanical or other form for information interaction between the devices, units or modules. The processor 801 may operate in conjunction with the memory 802, the processor 801 may execute program instructions stored in the memory 802, and at least one of the at least one memory 802 may be included in the processor 801.
The backup apparatus 800 may further comprise a communication interface 803 for communicating with other devices via a transmission medium. In the embodiments of the present application, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface. When the communication interface is a transceiver, the transceiver may include an independent receiver and an independent transmitter, or may be a transceiver or an interface circuit that integrates the transmitting and receiving functions.
The connection medium between the processor 801, the memory 802 and the communication interface 803 is not limited in the embodiments of the present application. In fig. 8, the memory 802, the processor 801, and the communication interface 803 are connected by the communication bus 804, which is represented by a thick line; the connection manner between other components is only illustrative and not limiting. The bus may include an address bus, a data bus, a control bus, and the like. For ease of illustration, fig. 8 shows only one thick line, but this does not mean that there is only one bus or only one type of bus.
In one example, the backup apparatus 800 is used to implement the steps performed by the first node in the processes shown in fig. 4-6, and the backup apparatus 800 may be the first node, or a chip or a circuit in the first node. The communication interface 803 is configured to perform the transceiving operations of the first node in the above embodiments. The processor 801 is configured to perform the processing-related operations of the first node in the above method embodiments.
In one example, the backup apparatus 800 is used to implement the steps performed by the second node in the processes shown in fig. 4-6, and the backup apparatus 800 may be the second node, or a chip or a circuit in the second node. The communication interface 803 is configured to perform the transceiving operations of the second node in the above embodiments. The processor 801 is configured to perform the processing-related operations of the second node in the above method embodiments.
As an example, the backup apparatus 800 may be a device other than the first node and the second node in the dual-computer cluster system, such as a main control device, or an apparatus, such as a chip system, capable of supporting the main control device in implementing the functions required by the method.
Regardless of whether the backup apparatus 800 is the first node, the second node, or the main control device described above, when the program instructions in the memory 802 are executed by the processor 801, the following steps may be implemented: synchronizing data of a service to a second node while a first container application in a first node runs the service; and, when a set switching condition is met, switching the first node to a standby node and switching the second node to an active node, so that a second container application in the second node continues to run the service according to the synchronized data.
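As a non-limiting illustration, the two steps above (data synchronization followed by a role switch) can be sketched as follows. The `Node` class, its fields, and the in-memory dictionary standing in for the service data are assumptions made for this sketch only; the embodiments do not prescribe any particular data structure or synchronization transport.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node; not a structure defined by the embodiments."""
    name: str
    role: str  # "active" or "standby"
    service_data: dict = field(default_factory=dict)


def synchronize(first: Node, second: Node) -> None:
    """Copy the service data held by the first node to the second node."""
    second.service_data = dict(first.service_data)


def switch_over(first: Node, second: Node) -> None:
    """Make the first node standby and the second node active."""
    first.role = "standby"
    second.role = "active"


# The first node runs the service and synchronizes its data while doing so.
first = Node("node-1", "active", {"session": 42})
second = Node("node-2", "standby")
synchronize(first, second)

# When a set switching condition is met, the roles are swapped and the second
# node can continue the service from the synchronized data.
switch_over(first, second)
```

In this sketch the second node ends up active and holds a copy of the data that the first node had produced before the switch.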
In one possible design, the switching condition includes, but is not limited to, at least one of the following conditions:
the network where the first node is located is disconnected;
the first node is down;
a service in at least one container application in the first node runs abnormally;
a service in a critical container application in the first node runs abnormally;
the state of the second container application in the second node is better than the state of the first container application in the first node;
a container application that operates normally exists in the second node.
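By way of example only, the listed conditions can be expressed as independent checks, with a deployment choosing which of them trigger a switch. The `NodeStatus` fields, the scalar `state_score` used to compare container states, and the condition names are all hypothetical; the embodiments do not define how a state is measured or which subset of conditions is enabled.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical health snapshot of a node."""
    network_connected: bool
    is_up: bool
    abnormal_containers: int
    critical_container_abnormal: bool
    state_score: int  # assumed scalar; a higher value means a better state
    healthy_containers: int


def met_conditions(first: NodeStatus, second: NodeStatus) -> list:
    """Return the names of the listed conditions that currently hold."""
    checks = {
        "network_disconnected": not first.network_connected,
        "node_down": not first.is_up,
        "container_abnormal": first.abnormal_containers >= 1,
        "critical_container_abnormal": first.critical_container_abnormal,
        "standby_state_better": second.state_score > first.state_score,
        "standby_has_healthy_container": second.healthy_containers >= 1,
    }
    return [name for name, holds in checks.items() if holds]


def should_switch(first: NodeStatus, second: NodeStatus, enabled: set) -> bool:
    """Switch when at least one enabled condition is met."""
    return any(name in enabled for name in met_conditions(first, second))


first_status = NodeStatus(True, True, 1, False, 5, 2)
second_status = NodeStatus(True, True, 0, False, 5, 3)
hits = met_conditions(first_status, second_status)
strict = should_switch(first_status, second_status, {"node_down", "critical_container_abnormal"})
lenient = should_switch(first_status, second_status, {"container_abnormal"})
```

Here one non-critical container on the first node is abnormal, so a policy that only reacts to node failure or critical containers does not switch, while a policy that reacts to any abnormal container does.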
In one possible design, the type of container application in the first node or the second node includes at least one of: cold-standby containers, hot-standby containers, critical containers or non-critical containers.
In one possible design, the first container application and the second container application are the same type of container application.
In one possible design, after switching the first node to a standby node, the processor 801 further performs: stopping the cold-standby container running in the first node.
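A minimal sketch of the step of stopping cold-standby containers after a node becomes standby is given below. The `Container` class and its `standby_mode`, `critical` and `running` fields are illustrative assumptions; a real deployment would call its container runtime instead of flipping a flag.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical container record with the types named above."""
    name: str
    standby_mode: str  # "cold" or "hot"
    critical: bool
    running: bool = True


def stop_cold_standby(containers: list) -> list:
    """Stop every running cold-standby container and return the names stopped."""
    stopped = []
    for container in containers:
        if container.standby_mode == "cold" and container.running:
            container.running = False  # stand-in for a runtime stop call
            stopped.append(container.name)
    return stopped


containers = [
    Container("billing", "cold", critical=True),
    Container("metrics", "hot", critical=False),
]
stopped_names = stop_cold_standby(containers)
```

Hot-standby containers are left running in this sketch, which matches the usual meaning of the term; that interpretation is an assumption here.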
In one possible design, the step of switching the first node to the standby node performed by the processor 801 specifically includes: switching first high-availability (HA) software in the first node to a standby state, and controlling the first HA software to notify, through a first container mailbox in the first node, the first container application to switch to the standby state.
In a possible design, when controlling the first HA software to notify, through the first container mailbox in the first node, the first container application to switch to the standby state, the processor 801 specifically performs:
controlling the first HA software to write a first state change message into the first container mailbox, wherein the first state change message is used for indicating that the first HA software is switched from an active state to a standby state;
and controlling the first container application to switch its own state to the standby state after detecting that the first state change message has been written into the first container mailbox.
In a possible design, the step of switching the second node to the active node performed by the processor 801 specifically includes: switching second high-availability (HA) software in the second node to the active state, and controlling the second HA software to notify, through a second container mailbox in the second node, the second container application to switch to the active state.
In a possible design, when controlling the second HA software to notify, through the second container mailbox in the second node, the second container application to switch to the active state, the processor 801 specifically performs:
controlling the second HA software to write a second state change message into the second container mailbox, wherein the second state change message is used for indicating that the second HA software is switched from a standby state to an active state;
and controlling the second container application to switch its own state to the active state after detecting that the second state change message has been written into the second container mailbox.
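The mailbox notification described above can be illustrated, under stated assumptions, with a file that the HA software writes and the container application polls. Modelling the container mailbox as a JSON file, the message fields `from` and `to`, and the names `write_state_change` and `ContainerApp` are all choices made for this sketch; the embodiments do not specify the mailbox medium or message format.

```python
import json
import tempfile
from pathlib import Path


def write_state_change(mailbox: Path, old_state: str, new_state: str) -> None:
    """HA software side: write a state change message into the container mailbox."""
    mailbox.write_text(json.dumps({"from": old_state, "to": new_state}))


class ContainerApp:
    """Container application side: monitors the mailbox and follows the HA state."""

    def __init__(self, mailbox: Path, state: str) -> None:
        self.mailbox = mailbox
        self.state = state

    def poll_mailbox(self) -> bool:
        """Consume a pending message, if any, and switch state accordingly."""
        if not self.mailbox.exists():
            return False
        message = json.loads(self.mailbox.read_text())
        self.mailbox.unlink()  # mark the message as handled
        self.state = message["to"]
        return True


with tempfile.TemporaryDirectory() as tmp:
    mailbox = Path(tmp) / "container_mailbox.json"
    app = ContainerApp(mailbox, "standby")
    polled_before = app.poll_mailbox()       # nothing written yet
    write_state_change(mailbox, "standby", "active")
    polled_after = app.poll_mailbox()        # message found, state follows it
```

The same pattern covers both directions: on the first node the message would indicate a change from active to standby, and on the second node a change from standby to active.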
In one possible design, before synchronizing the data of the service to the second node, the processor 801 further performs:
determining that a state of a first container application in the first node is better than a state of a second container application in the second node; or,
determining that the length of time for which the first node has served as the main node is longer than the length of time for which the second node has served as the main node; or,
determining that a string length of first identification information for uniquely identifying the first node is smaller than a string length of second identification information for uniquely identifying the second node.
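For illustration, the three determinations above can be combined into a single selection function. The text presents them as alternatives; applying them in sequence as tie-breakers, the scalar `state_score`, and the fallback to the first candidate when identifier lengths are equal are assumptions of this sketch rather than requirements of the embodiments.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical view of a node at the time the main node is chosen."""
    node_id: str
    state_score: int       # assumed scalar; higher means a better container state
    active_seconds: float  # how long the node has served as the main node


def pick_main(a: Candidate, b: Candidate) -> Candidate:
    """Choose the main node using the three criteria as ordered tie-breakers."""
    if a.state_score != b.state_score:
        return a if a.state_score > b.state_score else b
    if a.active_seconds != b.active_seconds:
        return a if a.active_seconds > b.active_seconds else b
    # Shorter identifier string wins; equal lengths fall back to `a` (assumption).
    return a if len(a.node_id) <= len(b.node_id) else b


by_id = pick_main(Candidate("n1", 3, 100.0), Candidate("node-2", 3, 100.0))
by_time = pick_main(Candidate("n1", 3, 50.0), Candidate("node-2", 3, 100.0))
by_state = pick_main(Candidate("n1", 2, 500.0), Candidate("node-2", 3, 100.0))
```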
In the embodiments of the present application, the processor may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
In the embodiments of the present application, the memory may be a nonvolatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
The method provided by the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the method may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a network appliance, a user device, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., digital video disc (DVD)), or a semiconductor medium (e.g., SSD).
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b and c may each be single or multiple.
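The seven combinations covered by "at least one of a, b, or c" can be enumerated mechanically. The short sketch below is illustrative only and uses the Python standard library function `itertools.combinations` to list every non-empty selection of the three items.

```python
from itertools import combinations

items = ["a", "b", "c"]

# Every non-empty selection of the items, from single items up to all three.
selections = [
    combo
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]
```

The resulting list has the same seven entries, in the same order, as the example in the preceding paragraph.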
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways without departing from the scope of the application. The above-described embodiments are merely illustrative. For example, the division into modules or units is only a division by logical function, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. A person of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Additionally, the apparatus and methods described, as well as the illustrations of various embodiments, may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present application. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electronic, mechanical or other form.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

1. A backup method, applied to a server cluster system, wherein the server cluster system comprises a first node and a second node, the first node is a main node, the second node is a standby node, and one or more container applications are deployed in each of the first node and the second node, the method comprising:
synchronizing data of a service to the second node in the process of running the service by a first container application in the first node;
and when a set switching condition is met, switching the first node to a standby node and switching the second node to an active node, so that a second container application in the second node continues to run the service according to the synchronized data.
2. The method of claim 1, wherein the switching condition comprises at least one of:
the network where the first node is located is disconnected;
the first node is down;
a service in at least one container application in the first node runs abnormally;
a service in a critical container application in the first node runs abnormally;
the state of the second container application in the second node is better than the state of the first container application in the first node;
a container application that operates normally exists in the second node.
3. The method of claim 1 or 2, wherein the type of container application in the first node or the second node comprises at least one of:
cold-standby containers, hot-standby containers, critical containers or non-critical containers.
4. The method of claim 3, wherein the first container application and the second container application are a same type of container application.
5. The method of claim 3 or 4, wherein after switching the first node to a standby node, the method further comprises: stopping the cold-standby container running in the first node.
6. The method of any of claims 1-5, wherein said switching said first node to a standby node comprises:
switching first high-availability (HA) software in the first node to a standby state, and controlling the first HA software to notify, through a first container mailbox in the first node, the first container application to switch to the standby state.
7. The method of claim 6, wherein controlling the first HA software to notify, through the first container mailbox in the first node, the first container application to switch to the standby state comprises:
controlling the first HA software to write a first state change message into the first container mailbox, wherein the first state change message is used for indicating that the first HA software is switched from an active state to a standby state;
and controlling the first container application to switch its own state to the standby state after detecting that the first state change message has been written into the first container mailbox.
8. The method of any of claims 1-7, wherein switching the second node to the active node comprises:
switching second high-availability (HA) software in the second node to the active state, and controlling the second HA software to notify, through a second container mailbox in the second node, the second container application to switch to the active state.
9. The method of claim 8, wherein controlling the second HA software to notify, through the second container mailbox in the second node, the second container application to switch to the active state comprises:
controlling the second HA software to write a second state change message into the second container mailbox, wherein the second state change message is used for indicating that the second HA software is switched from a standby state to an active state;
and controlling the second container application to switch its own state to the active state after detecting that the second state change message has been written into the second container mailbox.
10. The method of any of claims 1-9, wherein before synchronizing the data of the service to the second node in the process of running the service by the first container application in the first node, the method further comprises:
determining that a state of a first container application in the first node is better than a state of a second container application in the second node; or,
determining that the length of time for which the first node has served as the main node is longer than the length of time for which the second node has served as the main node; or,
determining that a string length of first identification information for uniquely identifying the first node is smaller than a string length of second identification information for uniquely identifying the second node.
11. A backup device, applied to a server cluster system, wherein the server cluster system comprises a first node and a second node, the first node is a main node, the second node is a standby node, and one or more container applications are deployed in each of the first node and the second node, the device comprising:
a communication unit, configured to synchronize data of a service to the second node in a process in which a first container application in the first node runs the service;
and a processing unit, configured to switch the first node to a standby node and switch the second node to an active node when a set switching condition is met, so that a second container application in the second node continues to run the service according to the synchronized data.
12. The apparatus of claim 11, wherein the switching condition comprises at least one of:
the network where the first node is located is disconnected;
the first node is down;
a service in at least one container application in the first node runs abnormally;
a service in a critical container application in the first node runs abnormally;
the state of the second container application in the second node is better than the state of the first container application in the first node;
a container application that operates normally exists in the second node.
13. The apparatus of claim 11 or 12, wherein the type of container application in the first node or the second node comprises at least one of:
cold-standby containers, hot-standby containers, critical containers or non-critical containers.
14. The apparatus of claim 13, wherein the first container application and the second container application are a same type of container application.
15. The apparatus of claim 13 or 14, wherein the processing unit, after switching the first node to a standby node, is further configured to: stop the cold-standby container running in the first node.
16. The apparatus according to any of claims 11 to 15, wherein the processing unit, when switching the first node to a standby node, is specifically configured to:
switch first high-availability (HA) software in the first node to a standby state, and control the first HA software to notify, through a first container mailbox in the first node, the first container application to switch to the standby state.
17. The apparatus of claim 16, wherein the processing unit, when controlling the first HA software to notify, through the first container mailbox in the first node, the first container application to switch to the standby state, is specifically configured to:
control the first HA software to write a first state change message into the first container mailbox, wherein the first state change message is used for indicating that the first HA software is switched from an active state to a standby state;
and control the first container application to switch its own state to the standby state after detecting that the first state change message has been written into the first container mailbox.
18. The apparatus as claimed in any one of claims 11 to 17, wherein the processing unit, when switching the second node to the active node, is specifically configured to:
switch second high-availability (HA) software in the second node to an active state, and control the second HA software to notify, through a second container mailbox in the second node, the second container application to switch to the active state.
19. The apparatus of claim 18, wherein the processing unit, when controlling the second HA software to notify, through the second container mailbox in the second node, the second container application to switch to the active state, is specifically configured to:
control the second HA software to write a second state change message into the second container mailbox, wherein the second state change message is used for indicating that the second HA software is switched from a standby state to an active state;
and control the second container application to switch its own state to the active state after detecting that the second state change message has been written into the second container mailbox.
20. The apparatus according to any of claims 11-19, wherein the processing unit, before the data of the service is synchronized to the second node in the process of running the service by the first container application in the first node in the server cluster system, is further configured to:
determine that a state of a first container application in the first node is better than a state of a second container application in the second node; or,
determine that the length of time for which the first node has served as the main node is longer than the length of time for which the second node has served as the main node; or,
determine that a string length of first identification information for uniquely identifying the first node is smaller than a string length of second identification information for uniquely identifying the second node.
21. A backup device, comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the one or more processors, cause the backup device to perform the method of any of claims 1-10.
22. A server cluster system, comprising:
at least two server nodes, wherein one or more container applications are respectively deployed in each server node; and
a backup device according to any of claims 11-20, or a backup device according to claim 21.
23. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1-10.
CN202010242187.1A 2020-03-31 2020-03-31 Backup method, backup device and server cluster system Pending CN113472556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010242187.1A CN113472556A (en) 2020-03-31 2020-03-31 Backup method, backup device and server cluster system

Publications (1)

Publication Number Publication Date
CN113472556A true CN113472556A (en) 2021-10-01

Family

ID=77865189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010242187.1A Pending CN113472556A (en) 2020-03-31 2020-03-31 Backup method, backup device and server cluster system

Country Status (1)

Country Link
CN (1) CN113472556A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984195A (en) * 2018-06-27 2018-12-11 新华三技术有限公司 A kind of method for upgrading software and device
CN109117322A (en) * 2018-08-28 2019-01-01 郑州云海信息技术有限公司 A kind of control method, system, equipment and the storage medium of server master-slave redundancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211001