WO2021004256A1 - Node switching method in node failure and related device - Google Patents


Info

Publication number
WO2021004256A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
storage device
master node
container
standby
Prior art date
Application number
PCT/CN2020/097262
Other languages
French (fr)
Chinese (zh)
Inventor
郑营飞
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021004256A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements

Definitions

  • the present invention relates to the technical field of cloud computing storage systems, in particular to a method and related equipment for node switching when a node fails.
  • each service application corresponds to a main container and at least one backup container, and the main container and the backup container are provided with a common storage device.
  • the main container can read and write data in the storage device to provide services externally, whereas the standby container cannot read or write data in the storage device. The standby container can only monitor the status of the main container and, when the main container fails, take over its work: it is upgraded to the main container and reads and writes the storage device to provide services.
  • the main container and backup container deployed in different physical machines communicate through the small computer system interface (SCSI) protocol.
  • SCSI small computer system interface
  • the main container can use the lock command provided by the SCSI protocol to lock the storage device.
  • the backup container can monitor the status of the main container through a network connection. After the main container fails, the backup container can detect the failure in time, immediately upgrade itself to the main container, and continue to provide services externally.
  • the standby container may be created temporarily, and may be created on the same physical machine as the main container, so the standby container cannot establish a network connection with the main container, and thus cannot monitor the status of the main container.
  • the embodiments of the present invention disclose a node switching method for use when a node fails, and related equipment. In a scenario where multiple nodes sharing a storage device cannot perceive each other, the method ensures that the standby node can accurately perceive the state of the master node and take over the master node when the master node fails, thereby improving application reliability.
  • the present application provides an inter-node switching method, including: a standby node detects a status flag of a master node stored in a storage device, and determines whether the master node is faulty according to the status flag, wherein the master node is a node that accesses data in the storage device and provides services for users; when the standby node determines that the master node is faulty according to the status flag, the standby node takes over the master node.
  • the standby node does not need to establish a heartbeat with the master node to directly perceive the status of the master node, and indirectly determines whether the master node is faulty by detecting the status flag of the master node stored in the storage device. In the event of a failure, it takes over the master node and provides external services, thereby improving application reliability.
  • the status flag is the heartbeat value of the master node; the standby node periodically detects whether the heartbeat value of the master node stored in the storage device is updated; if the heartbeat value of the master node is not updated, it is determined that the master node is faulty.
  • the master node periodically updates the heartbeat value stored in the storage device, for example, by incrementing it by one each period, so that the standby node can determine whether the master node is faulty by periodically detecting the heartbeat value. In this way, the standby node can still accurately sense the status of the master node without establishing a heartbeat connection with the master node.
  • the storage device further stores the label of the master node, and the standby node updates the label of the master node in the storage device to the label of the standby node.
  • the storage device only stores the label of one node (that is, the label of the primary node).
  • the standby node needs to update the label of the primary node in the storage device to its own label, so that other standby nodes can determine that a new primary node now exists and refrain from accessing the storage device, ensuring data consistency and application reliability.
  • the standby node writes its own mark to the storage device every first preset duration and, after the first preset duration, reads the node mark stored in the storage device every first preset duration; when, within a second preset duration, the node mark read by the standby node N consecutive times is the same as the standby node's own mark, the standby node stops writing its own mark to the storage device; N is a positive integer greater than or equal to 1.
  • when the standby node competes to become the primary node, it does so by writing its own mark to the storage device. Because a mark written later overwrites a mark written earlier, the node that writes later has a greater probability of winning the competition. If the mark a standby node reads N consecutive times, for example, 3 times, is the same as its own mark, the standby node can be considered to have won the competition and become the new master node. This kind of competition improves the accuracy of master node selection, prevents multiple nodes from accessing the storage device at the same time, and ensures application reliability.
  • the heartbeat value stored in the storage device is cleared, and the heartbeat value is updated periodically.
  • the storage device stores the mark of the standby node, but the stored heartbeat value is still the heartbeat value of the original primary node. Therefore, the standby node needs to clear the heartbeat value to zero, which indirectly informs other standby nodes that a new master node now exists, and then periodically update the heartbeat value so that other standby nodes can perceive its status.
  • the standby node periodically reads the mark and the heartbeat value stored in the storage device, and determines whether the mark is the same as the mark of the standby node and whether the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle; when it is determined that the mark stored in the storage device is the same as the mark of the standby node and the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle, the standby node updates the heartbeat value.
  • after the backup node takes over the master node, it needs to periodically update the heartbeat value so that other backup nodes can perceive its state.
  • before updating the heartbeat value each time, the standby node needs to determine whether the mark stored in the storage device is the same as its own mark and whether the heartbeat value is the same as the heartbeat value it wrote in the previous cycle. The heartbeat value is updated only when both conditions hold. In this way, the standby node can detect abnormal conditions in time, which ensures the reliability of the application.
  • the backup node detects whether the storage device stores a master node mark; if the storage device does not store a master node mark, the backup node takes over the master node.
  • when it is started, the standby node determines whether a primary node currently exists by detecting the primary node mark stored in the storage device. If no primary node currently exists, the standby node directly takes over as the primary node without needing to periodically detect the status flag of the master node stored in the storage device, which can improve the efficiency of competition.
  • the present application provides a node including: a detection module for detecting the status flag of the master node stored in the storage device and determining whether the master node is faulty according to the status flag, wherein the master node is a node that accesses data in the storage device and provides services for users; and a takeover module for taking over the master node when the detection module determines that the master node is faulty according to the status flag.
  • the status flag is the heartbeat value of the master node; when detecting the status flag of the master node stored in the storage device and determining whether the master node is faulty according to the status flag, the detection module is specifically configured to: periodically detect whether the heartbeat value of the master node stored in the storage device is updated; and if the heartbeat value of the master node is not updated, determine that the master node is faulty.
  • the storage device further stores the label of the master node, and when taking over the master node, the takeover module is specifically configured to: update the label of the master node in the storage device to the label of the node.
  • when updating the label of the primary node in the storage device to the label of the standby node, the takeover module is specifically configured to: write the mark of the node into the storage device every first preset duration and, after the first preset duration, read the mark stored in the storage device every first preset duration; and when, within a second preset duration, the mark read N consecutive times is the same as the mark written, stop writing the mark of the node to the storage device, where N is a positive integer greater than or equal to 1.
  • when the takeover module takes over the master node, it is further configured to clear the heartbeat value stored in the storage device to zero and periodically update the heartbeat value.
  • when periodically updating the heartbeat value, the takeover module is specifically configured to: periodically read the mark and the heartbeat value stored in the storage device, and determine whether the mark is the same as the mark of the standby node and whether the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle; and when the mark stored in the storage device is the same as the mark of the standby node and the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle, update the heartbeat value.
  • before detecting the status flag of the master node stored in the storage device, the detection module is further configured to detect whether a master node mark is stored in the storage device; the takeover module is further configured to compete for the master node when the detection module detects that the storage device does not store a master node mark.
  • the present application provides a computing device including a processor and a memory that are connected by an internal bus; the memory stores instructions, and the processor calls the instructions in the memory to execute the inter-node switching method provided in the first aspect or in any implementation of the first aspect.
  • the present application provides a computer storage medium that stores a computer program.
  • when the computer program is executed by a processor, it can implement the flow of the inter-node switching method provided in the first aspect or in any implementation of the first aspect.
  • the present application provides a computer program product.
  • the computer program includes instructions; when executed, the computer program can carry out the flow of the inter-node switching method provided in the first aspect or in any implementation of the first aspect.
  • FIG. 1 is a system architecture diagram of a communication system using the SCSI protocol provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of an inter-node switching method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the state changes of a container during a switching process according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a sequence relationship in a competition process provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of the structure of a node provided by an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the small computer system interface (SCSI) is a processor-independent standard for system-level interfaces between computers and hardware devices (such as hard disks, optical drives, printers, and scanners).
  • SCSI is a universal interface. Host adapters and SCSI peripheral controllers can be connected to the SCSI bus. Multiple peripherals on a SCSI bus can work at the same time.
  • the SCSI interface can transmit data synchronously or asynchronously.
  • a container is a virtualization technology in a computer operating system. This technology enables processes to run in a relatively independent and isolated environment (including independent file systems, namespaces, resource views, etc.), thereby simplifying the software deployment process, enhancing the portability and security of the software, and improving system resource utilization.
  • applications that provide services to users are deployed in virtual machines in the form of containers, and virtual machines are generally deployed in physical machines.
  • the main container and the backup container can be set.
  • the main container and the standby container access the same storage device.
  • only the main container can read and write data in the storage device and provide services externally.
  • the standby container monitors the status of the main container. When the primary container fails, the standby container is upgraded to the primary container to provide external services.
  • as shown in FIG. 1, which is a system architecture diagram, multiple physical machines are connected to a storage device and communicate through the SCSI protocol.
  • the physical machine 121, the physical machine 122, ..., the physical machine 12n are connected to the storage device 110 at the same time.
  • One or more containers are deployed in each physical machine, and each container is deployed with applications that provide services for users.
  • the main container and the standby container can be set, and the main container and the standby container are located on different physical machines.
  • the main container needs to lock the storage device exclusively through the physical machine where the main container is located.
  • the SCSI protocol provides a lock command for locking the storage device, and the physical machine can use this command to lock, on behalf of a container deployed on it, the storage device that the container accesses.
  • each container is actually allocated a storage range on the storage device, so what the physical machine actually locks is the storage space allocated to that container.
  • the main container can access the storage device.
  • the lock added by the main container to the storage device cannot be released, that is, the lock remains.
  • the physical machine where the standby container is located is connected to the physical machine where the main container is located through the network, so the standby container and the main container can also establish a connection.
  • the standby container can periodically send heartbeat information to the main container.
  • if the standby container does not receive a response from the main container within a period of time, it determines that the main container is faulty, and the standby container can be upgraded to the main container.
  • the standby container can use the forced lock-override command provided in the SCSI protocol to override the exclusive lock added by the main container, and then add its own exclusive lock to the storage device so that it can access the storage device and provide services externally.
  • the standby container may be created temporarily when it is needed, and the physical machine where the temporarily created standby container is located may be the same as the physical machine where the main container is located, or may be a new physical machine that has not established a network connection with the physical machine where the main container is located, so the newly created standby container cannot determine the state of the main container by heartbeat.
  • the standby container cannot be upgraded to the primary container in time, which affects the reliability of the application.
  • this application provides a method by which, even when no network connection is established between the main and standby containers, the standby container can detect the failure of the main container in time, upgrade itself to the main container, and continue to provide services.
  • Figure 2 shows a possible application scenario of an embodiment of the present application.
  • the physical machine 2100 and the physical machine 2200 are connected to the storage device 2300.
  • a virtual machine 2110 and a virtual machine 2120 are deployed in the physical machine 2100, a container 2111 is running in the virtual machine 2110, and a container 2121 is running in the virtual machine 2120; a virtual machine 2210 is deployed in the physical machine 2200, and a container 2211 is running in the virtual machine 2210.
  • Container 2111, container 2121, and container 2211 form a container cluster, where container 2111 is the main container, and container 2121 and container 2211 are standby containers.
  • the same application is deployed in the main container and the backup container.
  • the main container shown accesses the storage device 2300 to provide external services.
  • the container 2111, the container 2121, and the container 2211 may also be directly deployed on the physical machine.
  • the storage device 2300 may be a physical storage device, such as a storage array or a hard disk, or a section of storage space on a physical storage device, allocated to the container 2111, the container 2121, and the container 2211 to store the data generated by the applications deployed in the containers.
  • the storage device 2300 includes a mark storage area 2310, a heartbeat information storage area 2320, and a data storage area 2330.
  • the mark storage area 2310 is used to store the mark of the main container
  • the heartbeat information storage area 2320 is used to store the heartbeat value of the main container
  • the data storage area 2330 is used to store data generated during the operation of the main container.
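  • the main container (i.e., container 2111) can access the mark storage area 2310, the heartbeat information storage area 2320, and the data storage area 2330.
  • the backup containers (i.e., container 2121 and container 2211) cannot access the data storage area 2330, but can access the mark storage area 2310 and the heartbeat information storage area 2320 to monitor the status of the main container.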
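The three areas described above can be pictured as fixed offsets on a shared block device. The following is a minimal sketch, not the layout used by the application: the offsets, the 8-byte little-endian heartbeat counter, the UTF-8 mark encoding, and all function names are assumptions made for illustration; only the 512-byte mark area size comes from an example in the description. A `bytearray` stands in for the storage device.

```python
import struct

MARK_AREA_SIZE = 512       # example size given in the description
HEARTBEAT_AREA_SIZE = 8    # assumption: one unsigned 64-bit counter
MARK_OFFSET = 0            # assumption: the mark area comes first
HEARTBEAT_OFFSET = MARK_OFFSET + MARK_AREA_SIZE
DATA_OFFSET = HEARTBEAT_OFFSET + HEARTBEAT_AREA_SIZE


def write_mark(dev: bytearray, mark: str) -> None:
    """Overwrite the whole mark area with a zero-padded mark."""
    raw = mark.encode("utf-8")
    if len(raw) > MARK_AREA_SIZE:
        raise ValueError("mark does not fit in the mark area")
    dev[MARK_OFFSET:MARK_OFFSET + MARK_AREA_SIZE] = raw.ljust(MARK_AREA_SIZE, b"\x00")


def read_mark(dev: bytearray) -> str:
    """Return the stored mark, or an empty string if none is written."""
    raw = bytes(dev[MARK_OFFSET:MARK_OFFSET + MARK_AREA_SIZE])
    return raw.rstrip(b"\x00").decode("utf-8")


def write_heartbeat(dev: bytearray, value: int) -> None:
    """Store the heartbeat counter in the heartbeat area."""
    struct.pack_into("<Q", dev, HEARTBEAT_OFFSET, value)


def read_heartbeat(dev: bytearray) -> int:
    """Load the heartbeat counter from the heartbeat area."""
    return struct.unpack_from("<Q", dev, HEARTBEAT_OFFSET)[0]
```

In this sketch a standby container would only ever call `read_mark`, `write_mark`, and `read_heartbeat`, and would never touch bytes at or beyond `DATA_OFFSET`.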
  • the heartbeat value in the heartbeat information storage area 2320 is periodically updated, for example, it is periodically incremented by one.
  • the container 2121 and the container 2211 each periodically check the mark storage area 2310 and the heartbeat information storage area 2320 to monitor the status of the container 2111. If the heartbeat value in the heartbeat information storage area 2320 has not been updated for longer than a preset time (for example, two monitoring cycles), it can be determined that the container 2111 has failed and cannot provide services normally. At this time, the container 2121 and the container 2211 each write their own label to the label storage area and compete to become the new main container. The process by which the container 2121 and the container 2211 compete for the main container is described in detail below.
  • if the container 2121 wins the competition, the label of the container 2121 is stored in the label storage area 2310, and the label of the container 2111 is no longer stored.
  • the container 2121 clears the heartbeat value in the heartbeat information storage area, updates it periodically, and accesses the data in the data storage area 2330 to provide services externally. If the container 2211 loses the competition, it continues to check the mark storage area 2310 and the heartbeat information storage area 2320 to monitor the status of the container 2121.
  • the standby container can also determine the status of the main container by detecting the heartbeat information that the main container writes in the heartbeat information storage area 2320, so that a new main container can be selected quickly to provide services externally when the main container fails, ensuring the reliability of the service provided by the containers.
  • FIG. 3 is a flowchart of the container switching method, and FIG. 4 is a diagram of the state changes of a container during the container switching process. This application takes any one container as an example for detailed description. As shown in FIG. 3, the method includes but is not limited to the following steps:
  • S301: When the container is started, it detects whether the mark of the main container has been written in the mark storage area 2310 of the storage device 2300. If the mark of the main container has been written, step S302 is executed. If the mark of the main container has not been written, step S303 is executed.
  • after the container is started, it first needs to detect whether the mark of the main container has been written in the mark storage area 2310 of the storage device 2300.
  • the mark storage area 2310 in the storage device 2300 is used to store the mark of the main container.
  • the size of the mark storage area 2310 can be set according to actual needs, for example, it can be set to 512 bytes, which is not limited in this application.
  • the label of the container can uniquely identify a container.
  • if the mark storage area 2310 stores the mark of the main container, the main container already exists in the container cluster where the container is located, and the container, as a backup container, is in the standby state shown in FIG. 4.
  • if the mark storage area does not store the mark of the main container, the main container does not exist in the container cluster where the container is located, and the container can compete to become the main container. At this time, the container is in the election state shown in FIG. 4.
  • S302: The container periodically detects whether the heartbeat value in the heartbeat information storage area 2320 has changed within a preset period of time. If it has changed, the container continues to perform step S302; if it has not changed, the container enters the election state shown in FIG. 4, and step S303 is executed.
  • the heartbeat information storage area 2320 is used to store the heartbeat value of the main container.
  • the heartbeat value in the heartbeat information storage area 2320 is updated periodically. If the backup containers detect that the heartbeat value keeps being updated, the main container is working normally. If the backup containers detect that the heartbeat value has not been updated within the preset time period, for example, if the heartbeat values detected in two consecutive detection cycles are the same, the main container is faulty.
  • when a backup container checks whether the heartbeat value of the main container has been updated, it records the heartbeat value read in the previous cycle and compares the heartbeat value read in the current cycle with the recorded one. If the two are the same, the heartbeat value of the main container has not been updated; if they are different, the heartbeat value of the main container has been updated.
  • for example, the heartbeat value of the primary container recorded by the backup container is 8, that is, the primary container updated the heartbeat value to 8 in the previous cycle, and the heartbeat value currently read by the backup container is 9. The backup container compares the currently read heartbeat value with the recorded one and, because they differ, determines that the heartbeat value has been updated by the main container; the backup container then continues to periodically detect the heartbeat value of the main container and updates the recorded heartbeat value to 9.
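The comparison performed by the backup container in step S302 can be sketched as a small stateful checker. This is an illustration under stated assumptions rather than the application's implementation: the class name, the `observe` method, and the `max_stale_cycles` threshold (defaulting to two unchanged cycles, following the "two monitoring cycles" example) are hypothetical.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks the heartbeat value a backup container reads each cycle."""

    def __init__(self, max_stale_cycles: int = 2) -> None:
        # Assumption: the main container is declared faulty after this many
        # consecutive cycles in which the value did not change.
        self.max_stale_cycles = max_stale_cycles
        self.recorded: Optional[int] = None  # value read in the previous cycle
        self.stale_cycles = 0

    def observe(self, current: int) -> bool:
        """Record one read; return True if the main container looks faulty."""
        if self.recorded is not None and current == self.recorded:
            self.stale_cycles += 1       # same value as last cycle: not updated
        else:
            self.stale_cycles = 0        # value changed: main container is alive
            self.recorded = current      # update the recorded heartbeat value
        return self.stale_cycles >= self.max_stale_cycles
```

With the values from the example, a monitor that has recorded 8 and then reads 9 resets its stale counter and records 9; only repeated reads of the same value push it toward a fault decision.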
  • when the main container updates the heartbeat value, it can first read the main container mark stored in the mark storage area 2310 and determine whether that mark is the same as its own mark, and then read the heartbeat value in the heartbeat information storage area 2320 and determine whether it is the same as the heartbeat value written in the previous cycle. If the mark read is the same as its own mark and the heartbeat value read is the same as the heartbeat value written in the previous cycle, the main container updates the heartbeat value. The heartbeat value can be a number, and the main container updates it by incrementing the number. For example, if the heartbeat value stored in the heartbeat information storage area at the end of the previous cycle is 15, the heartbeat value is updated to 16 in this cycle.
  • in this case, the main container remains in the main container state shown in FIG. 4. If the mark read is different from its own mark, or the heartbeat value read is different from the heartbeat value written in the previous cycle, the system has suffered an unpredictable failure and the main container needs to perform the state switch shown in FIG. 4.
  • that is, the main container needs to exit and restart, and the entire container cluster needs to re-elect a new main container.
  • for example, the network of the main container may be unstable, so that the main container fails to write its own mark to the mark storage area 2310 or to update the heartbeat value without being aware of the failure, and the main container mark it reads is therefore different from its own; or the main container may have been disconnected for a long time and then recovered (without being aware of it), during which time another standby container wrote a new mark in the mark storage area 2310, so that the mark read by the main container is different from its own.
  • the main container periodically reads the mark stored in the mark storage area 2310 and compares it with its own mark to determine whether to update the heartbeat value or exit and restart.
  • in this way, in extreme cases (for example, an unstable network that makes the main container's connection intermittent, or an unpredictable failure of the storage device 2300 that changes the content of the mark storage area), the main container restarts in time and avoids continuing to read and write the data in the storage device 2300, ensuring data consistency and application reliability.
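The check that the main container performs before each heartbeat update can be sketched as follows. This is a simplified illustration: the dictionary with `"mark"` and `"heartbeat"` keys is a hypothetical stand-in for the mark storage area 2310 and the heartbeat information storage area 2320, and the function name and return convention are assumptions.

```python
from typing import Optional


def guarded_heartbeat_update(store: dict, own_mark: str, last_written: int) -> Optional[int]:
    """Increment the heartbeat only if the stored state is what this main
    container last wrote. Returns the new heartbeat value, or None when the
    main container should exit and restart."""
    if store["mark"] != own_mark:
        return None                      # another container's mark is stored
    if store["heartbeat"] != last_written:
        return None                      # heartbeat was changed by someone else
    store["heartbeat"] = last_written + 1
    return store["heartbeat"]
```

For the example in the text, a store holding the container's own mark and heartbeat 15 yields 16. Note that this read-check-write sequence is not atomic in the sketch; it only mirrors the ordering of checks described above.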
  • S303: The container periodically writes its own mark into the mark storage area 2310, and reads the mark stored in the mark storage area 2310 with the same period.
  • when the container determines at startup in step S301 that the mark storage area 2310 does not store the mark of the main container, that is, the main container does not exist, or determines in step S302 that the heartbeat information in the heartbeat information storage area 2320 has not changed, that is, the main container has failed, the container is in the election state shown in FIG. 4 and can compete to become the main container.
  • when there are multiple standby containers, they compete to become the main container at the same time.
  • for example, the container 2121 first writes its own label, label 1, to the label storage area 2310 at time t1, and the container 2211 writes its own label, label 2, to the label storage area 2310 at time t2, where t1 is earlier than t2. Therefore, label 2 written by the container 2211 overwrites label 1 written by the container 2121, that is, label 2 is stored in the label storage area 2310.
  • after a sleep period, the container 2121 reads the label stored in the label storage area 2310 at time t3. The label read by the container 2121 is label 2.
  • after another sleep period, the container 2121 writes its own label, label 1, to the label storage area 2310 again at time t5.
  • the label read by the container 2211 is label 2, and after another sleep period, at time t6, the container 2211 writes its own label, label 2, to the label storage area 2310 again.
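The last-writer-wins behaviour in this example can be replayed with a short script. This is an illustrative sketch only: the integers 1 to 6 stand in for the times t1 to t6, and the read by container 2211 is placed at a hypothetical time 4 between t3 and t5, which the text does not state. The event tuples and the `replay` function are assumptions made for the illustration.

```python
from typing import List, Optional, Tuple

LABELS = {"2121": "label 1", "2211": "label 2"}

# (time, container, action); time 4 is an assumed read time for container 2211.
EVENTS = [
    (1, "2121", "write"),
    (2, "2211", "write"),
    (3, "2121", "read"),
    (4, "2211", "read"),
    (5, "2121", "write"),
    (6, "2211", "write"),
]


def replay(events: List[Tuple[int, str, str]]) -> Tuple[Optional[str], List[Tuple[int, str, Optional[str]]]]:
    """Apply writes in time order; a later write overwrites an earlier one."""
    area: Optional[str] = None           # content of the label storage area
    reads = []
    for time, container, action in sorted(events):
        if action == "write":
            area = LABELS[container]
        else:
            reads.append((time, container, area))
    return area, reads


final_label, observed = replay(EVENTS)
```

Under these assumptions both containers read label 2, and label 2 is also what remains stored after the second round of writes, because container 2211 is again the last writer.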
  • S304: The container detects whether the label it has read from the label storage area 2310 N consecutive times within a preset period of time is the same as the label it wrote. If so, go to step S305; otherwise, go to step S302.
  • each backup container needs to write its own mark to the mark storage area 2310 first, then read it, and then write it again, and the cycle repeats periodically.
  • for a container that is not the last to write, the label it reads each time is the label of the container that wrote last, which is not the same as the label it wrote. For the container that writes its own label last, such as the container 2211, the label read each time is the same as the label it wrote. If, within a preset time period, the label a container reads N consecutive times is the same as the label it wrote, it can be determined that this container has been upgraded to the new main container; the other containers abandon this round of competition, stop writing their own marks to the mark storage area 2310, resume checking the heartbeat value in the heartbeat information storage area 2320, and wait for the next competition.
  • N is a positive integer greater than or equal to 1, for example, 3 or 4, which is not limited in this application.
  • each container periodically writes its mark, reads the stored mark, and judges whether the two are consistent, so as to finally determine the only container whose read mark is the same as its written mark every time, and that container is upgraded to the primary container. This can improve the accuracy of primary container selection and ensure that the elected primary container is unique, so as to avoid the situation in which multiple containers access the storage device at the same time and to ensure the reliability of the application.
  • S305: The container is upgraded to the primary container and accesses the data in the storage device to provide external services.
  • after the container is determined to be upgraded to the primary container, it accesses the data in the storage device to provide services to the outside world, clears the heartbeat value in the heartbeat information storage area 2320 to zero, and then periodically updates the heartbeat value. At this time, the container is in the primary container state shown in FIG. 4.
  • steps S301 to S305 involved in the foregoing method embodiments are only schematic descriptions and summaries, and should not constitute specific limitations. The involved steps can be added, reduced, or combined as needed.
  • FIGS. 3 and 4 describe scenarios in which there are multiple standby containers in a container cluster and, when the primary container fails or there is no primary container, the multiple standby containers compete to become the primary container, but for the container cluster only
  • in step S302, if it is detected that the heartbeat of the primary container is not updated, that is, when the primary container fails, the standby container can be directly upgraded to the primary container; that is, the mark of the standby container is directly written to the mark storage area without performing step S304, the step of determining the primary container through multiple writes and reads.
  • the method provided by the present invention is also suitable for switching between a physical machine and a virtual machine.
  • for the switching method of physical machines and virtual machines, apart from the different objects being switched, the rest of the switching method is the same as that for containers and is not repeated here.
  • by setting the status flag of the master node, such as the heartbeat information of the master node, in the storage device, the master node periodically updates the heartbeat information and the standby node periodically detects it. If the heartbeat information is not updated, the standby node can compete to become the master node. In this way, even if the master node and the standby node have not established a network connection through their respective physical machines, the standby node can detect the failure of the master node in time, compete to become the master node, and continue to provide services.
  • the standby physical machine can determine whether the master physical machine is faulty without establishing a network connection with it.
  • the standby node can be deployed on any physical machine, for example, on the same physical machine as the master node, or on a physical machine that has no network connection with the master node, thereby reducing the constraints on virtual machine and container deployment.
  • the node 600 includes a detection module 610 and a takeover module 620, wherein:
  • the detection module 610 is used to detect the status flag of the master node stored in the storage device, and determine whether the master node is faulty according to the status flag, wherein the master node is a node that accesses data in the storage device and provides services for users.
  • the detection module 610 is configured to execute the aforementioned steps S301, S302, and S304, and optionally execute optional methods in the aforementioned steps.
  • the takeover module 620 is configured to take over the master node when the detection module 610 determines that the master node is faulty according to the status flag.
  • the takeover module 620 is configured to perform the foregoing steps S303 and S305, and optionally perform optional methods in the foregoing steps.
  • the status flag is the heartbeat value of the master node; when detecting the status flag of the master node stored in the storage device and determining, according to the status flag, whether the master node is faulty, the detection module 610 is specifically configured to: periodically detect whether the heartbeat value of the master node stored in the storage device is updated; and if the heartbeat value of the master node is not updated, determine that the master node is faulty.
  • the storage device also stores the mark of the master node, and when the takeover module 620 takes over the master node, it is specifically configured to update the mark of the master node in the storage device to the mark of the node.
  • when the takeover module 620 updates the mark of the master node in the storage device to the mark of the standby node, it is specifically configured to: write the mark of the node into the storage device every first preset duration and, after the first preset duration, read the mark stored in the storage device every first preset duration; and, within a second preset duration, when the marks read N consecutive times are the same as the written mark, stop writing the mark of the node to the storage device, where N is a positive integer greater than or equal to 1.
  • after the takeover module 620 takes over the master node, it is further configured to clear the heartbeat value stored in the storage device to zero and periodically update the heartbeat value.
  • when periodically updating the heartbeat value, the takeover module 620 is specifically configured to: periodically read the mark and the heartbeat value stored in the storage device, and determine whether the mark is the same as the mark of the standby node and whether the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle; and, when it is determined that the mark stored in the storage device is the same as the mark of the standby node and the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle, update the heartbeat value.
  • before detecting the status flag of the master node stored in the storage device, the detection module 610 is further configured to detect whether the storage device stores a master node mark; the takeover module 620 is further configured to take over the master node when the detection module 610 detects that no master node mark is stored in the storage device.
  • FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the application.
  • the computing device 700 includes a processor 710, a communication interface 720, and a memory 730.
  • the processor 710, the communication interface 720, and the memory 730 are connected to each other through an internal bus 740.
  • the computing device may be a database server.
  • the computing device 700 may be the physical machine 2110 or 2120 in FIG. 2, in which a container or a virtual machine is built. The functions performed by the container in FIG. 2 are actually performed by the processor 710 of the physical machine.
  • the processor 710 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the bus 740 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus 740 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 7, but it does not mean that there is only one bus or one type of bus.
  • the memory 730 may include a volatile memory, such as a random access memory (RAM); the memory 730 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 730 may also include a combination of the above types.
  • the program code may be used to implement the functional modules shown in the node device 600, or to implement the method steps in the method embodiment shown in FIG. 6 with the standby node as the execution subject.
  • the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, it can implement part or all of the steps of any one of the above method embodiments and realize the function of any one of the functional modules described in FIG. 6.
  • the embodiments of the present application also provide a computer program product which, when run on a computer or a processor, enables the computer or the processor to execute one or more steps in any of the foregoing methods. If each component module of the aforementioned equipment is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer-readable storage medium.
  • the magnitude of the sequence numbers of the above-mentioned processes does not imply the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • the modules in the devices in the embodiments of the present application may be combined, divided, and deleted according to actual needs.

Abstract

A node switching method in a node failure and a related device. A master node and a standby node are simultaneously connected to a storage device, but only the master node may access user data in the storage device and provides a service for a user. The standby node may access a status tag of the master node in the storage device. The standby node monitors the status tag of the master node stored in the storage device during operation and determines, according to the status tag, whether the master node fails. If the standby node determines, according to the status tag, that the master node fails, the standby node takes the place of the master node. By means of the method, when multiple nodes sharing one storage device do not sense each other, it is ensured that the standby node can accurately sense the status of the master node and takes the place of the master node when the master node fails, thereby improving the reliability of an application.

Description

Method and related device for node switching when a node fails
Technical field
The present invention relates to the technical field of cloud computing storage systems, and in particular to a method and related device for node switching when a node fails.
Background
In cloud computing scenarios, applications that provide services to users are generally deployed in containers on virtual machines or physical machines. To ensure the reliability of a service application, each service application corresponds to one primary container and at least one standby container, and the primary container and the standby container share a common storage device. In the normal working state, only the primary container can read and write data in the storage device to provide external services. The standby container cannot read or write data in the storage device; it can only monitor the status of the primary container and, when the primary container fails, take over its work, be upgraded to the primary container, and read and write the storage device to provide services.
At present, a physical machine and a storage device communicate through the small computer system interface (SCSI) protocol. For a primary container and a standby container deployed on different physical machines, the primary container can lock the storage device by using the SCSI lock command provided by the SCSI protocol. Because a network connection is established between the different physical machines, the standby container can monitor the status of the primary container through the network connection; after the primary container fails, the standby container can detect the failure in time, immediately be upgraded to primary, and continue to provide external services. In practical applications, however, the standby container may be created temporarily and may be created on the same physical machine as the primary container, so the standby container cannot establish a network connection with the primary container and therefore cannot monitor its status.
Therefore, how to ensure that the standby container accurately perceives the status of the primary container and switches over when the primary container fails is an urgent problem to be solved.
Summary of the invention
The embodiments of the present invention disclose a method and related device for node switching when a node fails, which, in a scenario where multiple nodes sharing one storage device are unaware of each other, ensure that the standby node can accurately perceive the status of the master node and take over the master node when the master node fails, thereby improving application reliability.
In a first aspect, this application provides an inter-node switching method, including: a standby node detects a status flag of a master node stored in a storage device and determines, according to the status flag, whether the master node is faulty, where the master node is a node that accesses data in the storage device and provides services for users; and when the standby node determines, according to the status flag, that the master node is faulty, the standby node takes over the master node.
In the embodiments of this application, the standby node does not need to establish a heartbeat with the master node to perceive the master node's status directly. Instead, it indirectly determines whether the master node is faulty by detecting the status flag of the master node stored in the storage device, and when the master node fails it takes over the master node and provides external services, thereby improving application reliability.
With reference to the first aspect, in a possible implementation of the first aspect, the status flag is the heartbeat value of the master node; the standby node periodically detects whether the heartbeat value of the master node stored in the storage device is updated; and if the heartbeat value of the master node is not updated, the standby node determines that the master node is faulty.
In the embodiments of this application, the master node periodically updates the heartbeat value stored in the storage device, for example by periodically incrementing it by one, so that the standby node can judge whether the master node has failed by periodically detecting the heartbeat value. In this way, the standby node can still accurately perceive the status of the master node without establishing a heartbeat connection with it.
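The heartbeat check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `master_failed`, the callable parameters, and the tolerance `missed_periods` are hypothetical, and the storage device is simulated with a dictionary rather than a real shared disk.

```python
from typing import Callable


def master_failed(read_heartbeat: Callable[[], int],
                  wait_one_period: Callable[[], None],
                  missed_periods: int = 3) -> bool:
    """Poll the stored heartbeat; report failure if it never changes.

    missed_periods is an assumed tolerance; the text only says that a
    heartbeat value that is not updated indicates a faulty master.
    """
    last = read_heartbeat()
    for _ in range(missed_periods):
        wait_one_period()            # e.g. time.sleep(period) in a real node
        current = read_heartbeat()
        if current != last:
            return False             # value changed: the master is alive
    return True                      # value never changed: treat as faulty


# Simulated storage device: a live master adds one to the value each period.
storage = {"heartbeat": 10}


def live_master_tick() -> None:
    storage["heartbeat"] += 1


alive = not master_failed(lambda: storage["heartbeat"], live_master_tick)
failed = master_failed(lambda: storage["heartbeat"], lambda: None)
```

Passing the wait step in as a callable keeps the sketch free of real sleeps; a deployed node would presumably sleep for one detection period there.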
With reference to the first aspect, in a possible implementation of the first aspect, the storage device further stores the mark of the master node, and the standby node updates the mark of the master node in the storage device to the mark of the standby node.
In the embodiments of this application, the storage device stores the mark of only one node (that is, the mark of the master node). When a standby node takes over the master node, it needs to update the mark of the master node in the storage device to its own mark, so that other standby nodes can determine that a new master node already exists and refrain from accessing the storage device, which ensures data consistency and application reliability.
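One way to picture a storage device that holds exactly one node mark next to the heartbeat value is a small fixed-size record. The sketch below is purely an assumption for illustration: the 16-byte mark field, the 64-bit heartbeat field, and the names `RECORD`, `encode`, `decode`, and `take_over` are hypothetical and are not taken from the application.

```python
import struct
from typing import Tuple

# Hypothetical layout: a 16-byte mark area followed by a 64-bit heartbeat area.
RECORD = struct.Struct("<16sQ")


def encode(mark: str, heartbeat: int) -> bytes:
    """Pack the single master mark and the heartbeat value into one record.

    Marks longer than 16 bytes would be truncated by struct, so keep them short.
    """
    return RECORD.pack(mark.encode("ascii"), heartbeat)


def decode(raw: bytes) -> Tuple[str, int]:
    """Unpack a record; an empty mark is taken to mean that no master is recorded."""
    mark_bytes, heartbeat = RECORD.unpack(raw)
    return mark_bytes.rstrip(b"\x00").decode("ascii"), heartbeat


def take_over(raw: bytes, my_mark: str) -> bytes:
    """Overwrite the only mark slot with the new master's mark; keep the heartbeat."""
    _, heartbeat = decode(raw)
    return encode(my_mark, heartbeat)


record = encode("node-1", 41)          # node-1 is the current master
record = take_over(record, "node-2")   # node-2 replaces the stored mark
```

Because there is only one mark slot, any node that reads the record sees at most one master, which is the property the paragraph above relies on.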
With reference to the first aspect, in a possible implementation of the first aspect, the standby node writes its own mark to the storage device every first preset duration and, after the first preset duration, reads the node mark stored in the storage device every first preset duration; within a second preset duration, when the node marks read by the standby node N consecutive times are the same as the standby node's own mark, the standby node stops writing its own mark to the storage device, where N is a positive integer greater than or equal to 1.
In the solution provided by this application, a standby node competes to become the master node by writing its own mark to the storage device, under the rule that a mark written later overwrites a mark written earlier, so a mark written later has a greater probability of winning the competition. If a standby node reads a mark identical to its own N consecutive times, for example 3 times, the standby node can be considered to have won the competition and become the new master node. This competition method improves the accuracy of master node selection, prevents multiple nodes from accessing the storage device at the same time, and ensures application reliability.
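The write, wait, read-back cycle can be sketched as follows. This is a simplified model, not the application's implementation: the function `compete`, the round limit `max_rounds` (which stands in for the second preset duration), and the dictionary used as the storage device are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


def compete(storage: Dict[str, Optional[str]], my_mark: str,
            wait: Callable[[], None], n: int = 3, max_rounds: int = 10) -> bool:
    """Write own mark, wait, read back; win after n consecutive matching reads.

    max_rounds stands in for the second preset duration (an assumption:
    the text bounds the competition by time, not by a round count).
    """
    streak = 0
    for _ in range(max_rounds):
        storage["mark"] = my_mark        # a later write overwrites an earlier one
        wait()                           # first preset duration; rivals may write here
        if storage["mark"] == my_mark:
            streak += 1
            if streak == n:
                return True              # stop writing: this node is the new master
        else:
            streak = 0                   # overwritten by a rival: start counting again
    return False                         # give up and return to heartbeat detection


shared: Dict[str, Optional[str]] = {"mark": None}


def rival_writes_later() -> None:
    shared["mark"] = "mark 2"            # a rival that always writes after this node


lost = compete(shared, "mark 1", rival_writes_later)
won = compete(shared, "mark 2", lambda: None)
```

In the usage lines, the node holding "mark 1" is always overwritten during its wait and therefore never reads its own mark back, while the node holding "mark 2" is the last writer and reads its own mark three times in a row.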
With reference to the first aspect, in a possible implementation of the first aspect, after the standby node takes over the master node, it clears the heartbeat value stored in the storage device to zero and periodically updates the heartbeat value.
In the embodiments of this application, after the standby node takes over the master node, the storage device stores the mark of that standby node, but the stored heartbeat value is still that of the original master node. The standby node therefore needs to clear it to zero to indirectly notify other standby nodes that a new master node already exists, and it periodically updates the heartbeat value so that other standby nodes can perceive its status.
With reference to the first aspect, in a possible implementation of the first aspect, the standby node periodically reads the mark and the heartbeat value stored in the storage device, and judges whether the mark is the same as the mark of the standby node and whether the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle; when it is determined that the mark stored in the storage device is the same as the mark of the standby node and the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle, the standby node updates the heartbeat value.
In the embodiments of this application, after the standby node takes over the master node, it needs to periodically update the heartbeat value so that other standby nodes can perceive its status. Before each update, the standby node needs to judge whether the mark stored in the storage device is the same as its own mark and whether the heartbeat value is the same as the heartbeat value written in the previous cycle. It updates the heartbeat value only when both conditions hold. In this way, the standby node can detect abnormal conditions in time, which ensures application reliability.
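The clear-then-guarded-update behaviour of the new master can be sketched as below. The names `clear_heartbeat` and `update_heartbeat` and the dictionary-based storage are hypothetical, and the increment by one follows the example given earlier in the text rather than a stated requirement.

```python
from typing import Dict, Optional, Union

Storage = Dict[str, Union[str, int]]


def clear_heartbeat(storage: Storage) -> int:
    """After taking over, reset the previous master's heartbeat value to zero."""
    storage["heartbeat"] = 0
    return 0


def update_heartbeat(storage: Storage, my_mark: str, last_written: int) -> Optional[int]:
    """Advance the heartbeat only if the mark and the value are unchanged.

    Returns the new value, or None when the check fails. What the node does
    after a failed check is not specified here, so the sketch only reports it.
    """
    if storage["mark"] != my_mark or storage["heartbeat"] != last_written:
        return None
    storage["heartbeat"] = last_written + 1   # increment by one, as an example
    return last_written + 1


# node-2 has just won the competition; the heartbeat is still the old master's.
storage: Storage = {"mark": "node-2", "heartbeat": 57}
beat0 = clear_heartbeat(storage)
beat1 = update_heartbeat(storage, "node-2", beat0)      # normal period
storage["mark"] = "node-3"                               # another node overwrites the mark
rejected = update_heartbeat(storage, "node-2", 1)
```

The last call shows the abnormal case: once the stored mark no longer matches, the node refuses to advance the heartbeat and the stored value stays as it was.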
With reference to the first aspect, in a possible implementation of the first aspect, the standby node detects whether a master node mark is stored in the storage device; if no master node mark is stored in the storage device, the standby node takes over the master node.
In the embodiments of this application, when a standby node starts, it judges whether a master node currently exists by detecting the master node mark stored in the storage device. If no master node currently exists, the standby node directly takes over the master node without periodically detecting the status flag of the master node stored in the storage device, which improves competition efficiency.
In a second aspect, this application provides a node, including: a detection module, configured to detect a status flag of a master node stored in a storage device and determine, according to the status flag, whether the master node is faulty, where the master node is a node that accesses data in the storage device and provides services for users; and a takeover module, configured to take over the master node when the detection module determines, according to the status flag, that the master node is faulty.
With reference to the second aspect, in a possible implementation of the second aspect, the status flag is the heartbeat value of the master node; when detecting the status flag of the master node stored in the storage device and determining, according to the status flag, whether the master node is faulty, the detection module is specifically configured to: periodically detect whether the heartbeat value of the master node stored in the storage device is updated; and if the heartbeat value of the master node is not updated, determine that the master node is faulty.
With reference to the second aspect, in a possible implementation of the second aspect, the storage device further stores the mark of the master node, and when taking over the master node, the takeover module is specifically configured to update the mark of the master node in the storage device to the mark of the node.
With reference to the second aspect, in a possible implementation of the second aspect, when updating the mark of the master node in the storage device to the mark of the standby node, the takeover module is specifically configured to: write the mark of the node to the storage device every first preset duration and, after the first preset duration, read the mark stored in the storage device every first preset duration; and, within a second preset duration, when the marks read N consecutive times are the same as the written mark, stop writing the mark of the node to the storage device, where N is a positive integer greater than or equal to 1.
With reference to the second aspect, in a possible implementation of the second aspect, after taking over the master node, the takeover module is further configured to clear the heartbeat value stored in the storage device to zero and periodically update the heartbeat value.
With reference to the second aspect, in a possible implementation of the second aspect, when periodically updating the heartbeat value, the takeover module is specifically configured to: periodically read the mark and the heartbeat value stored in the storage device, and judge whether the mark is the same as the mark of the standby node and whether the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle; and, when it is determined that the mark stored in the storage device is the same as the mark of the standby node and the heartbeat value is the same as the heartbeat value written by the standby node in the previous cycle, update the heartbeat value.
With reference to the second aspect, in a possible implementation of the second aspect, before detecting the status flag of the master node stored in the storage device, the detection module is further configured to detect whether a master node mark is stored in the storage device; and the takeover module is further configured to take over the master node when the detection module detects that no master node mark is stored in the storage device.
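The division of a node into a detection module and a takeover module might be modelled as below. This is a minimal sketch assuming a single standby node, so the takeover writes the mark once instead of running the multi-node competition; the class names, the `poll` method, and the rule that one unchanged heartbeat reading counts as a failure are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionModule:
    storage: dict
    last_seen: Optional[int] = None     # heartbeat value seen on the previous poll

    def has_master_mark(self) -> bool:
        return self.storage.get("mark") is not None

    def master_failed(self) -> bool:
        """True when the heartbeat equals the value seen one poll earlier."""
        current = self.storage.get("heartbeat")
        stale = self.last_seen is not None and current == self.last_seen
        self.last_seen = current
        return stale


@dataclass
class TakeoverModule:
    storage: dict
    mark: str

    def take_over(self) -> None:
        self.storage["mark"] = self.mark   # replace the master mark with our own
        self.storage["heartbeat"] = 0      # clear the old master's heartbeat value


@dataclass
class Node:
    detection: DetectionModule
    takeover: TakeoverModule

    def poll(self) -> bool:
        """One detection period in the standby state; True if this node took over."""
        if not self.detection.has_master_mark() or self.detection.master_failed():
            self.takeover.take_over()
            return True
        return False


storage = {"mark": "master", "heartbeat": 7}
node = Node(DetectionModule(storage), TakeoverModule(storage, "standby"))
first = node.poll()            # nothing to compare against yet
storage["heartbeat"] = 8       # the master is still updating
second = node.poll()
third = node.poll()            # heartbeat unchanged since the last poll
```

Both modules share the same storage object, mirroring the way the described node reads and writes one shared storage device; `poll` is only meaningful while the node is still a standby.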
In a third aspect, this application provides a computing device. The computing device includes a processor and a memory that are connected through an internal bus; the memory stores instructions, and the processor invokes the instructions in the memory to execute the inter-node switching method provided by the first aspect or by any implementation of the first aspect.
In a fourth aspect, this application provides a computer storage medium that stores a computer program. When the computer program is executed by a processor, it can implement the procedure of the inter-node switching method provided by the first aspect or by any implementation of the first aspect.
In a fifth aspect, this application provides a computer program product. The computer program includes instructions, and when the computer program is executed by a computer, the computer can execute the procedure of the inter-node switching method provided by the first aspect or by any implementation of the first aspect.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the drawings needed for describing the embodiments. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a system that communicates by using the SCSI protocol according to an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 3 is a schematic flowchart of an inter-node switching method according to an embodiment of this application;
FIG. 4 is a schematic diagram of the state changes of a container during a switching process according to an embodiment of this application;
FIG. 5 is a schematic diagram of the timing relationship in a competition process according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a node according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application.
First, some terms and related technologies used in this application are explained with reference to the drawings, to help a person skilled in the art understand them.
The Small Computer System Interface (SCSI) is a processor-independent standard for system-level interfacing between a computer and hardware devices such as hard disks, optical drives, printers, and scanners. SCSI is a general-purpose interface: host adapters and SCSI peripheral controllers can be attached to a SCSI bus, multiple peripherals on one SCSI bus can operate at the same time, and the interface can transfer data synchronously or asynchronously.
A container is an operating-system-level virtualization technology that runs processes in a relatively independent and isolated environment (with its own file system, namespaces, resource view, and so on). This simplifies software deployment, improves software portability and security, and raises system resource utilization.
In a cloud computing scenario, an application that serves users is typically deployed as a container in a virtual machine, and the virtual machine is typically deployed on a physical machine. To keep the application reliable and prevent a single container failure from making it unavailable, a primary container and a standby container can be configured. Both access the same storage device, but at any given time only the primary container reads and writes the data in the storage device and provides the service, while the standby container monitors the state of the primary container. When the primary container fails, the standby container is promoted to primary and takes over the service.
FIG. 1 shows a system architecture in which multiple physical machines are connected to one storage device and communicate with it over the SCSI protocol. Physical machine 121, physical machine 122, ..., and physical machine 12n are all connected to the storage device 110. One or more containers are deployed on each physical machine, and each container hosts an application that serves users. To keep the application reliable and prevent a single container failure from making it unavailable, a primary container and a standby container can be configured on different physical machines. To ensure that only the primary container can access the storage device at any given time, the primary container places an exclusive lock on the storage device through the physical machine on which it runs. The SCSI protocol provides a lock command for storage devices, and a physical machine can use this command to lock the storage device accessed by a container deployed on it. In practice, each container is allocated a storage range on the storage device, so the physical machine actually locks the storage space allocated to the container. Once the storage device accessed by the primary container is locked, the primary container can access it.
After the primary container fails, however, the lock it placed on the storage device cannot be released, which leaves a residual lock. The physical machine hosting the standby container is connected over a network to the physical machine hosting the primary container, so the two containers can also establish a connection. While the primary container runs normally, a standby container that has established a network connection with it can periodically send heartbeat messages to it. If the standby container receives no response within a period of time, it determines that the primary container has failed and can be promoted to primary. To deal with the residual lock, the standby container can use the forced lock-override command provided by the SCSI protocol to override the exclusive lock placed by the primary container, and then place its own exclusive lock on the storage device in order to access it and provide the service.
In some scenarios, however, a standby container is created only when it is needed. The physical machine hosting such a newly created standby container may be the same as the one hosting the primary container, or may have no network connection to it. The new standby container therefore cannot determine the state of the primary container through heartbeats and cannot be promoted promptly when the primary container fails, which reduces the reliability of the application.
To solve this problem, this application provides a method by which a standby container can promptly detect a failure of the primary container, be promoted to primary, and continue providing the service, even when no network connection has been established between the primary and standby containers.
FIG. 2 shows a possible application scenario of an embodiment of this application. In this scenario, physical machine 2100 and physical machine 2200 are connected to storage device 2300. Virtual machines 2110 and 2120 are deployed on physical machine 2100; container 2111 runs in virtual machine 2110 and container 2121 runs in virtual machine 2120. Virtual machine 2210 is deployed on physical machine 2200, and container 2211 runs in it. Containers 2111, 2121, and 2211 form a container cluster in which container 2111 is the primary container and containers 2121 and 2211 are standby containers. The same application is deployed in the primary and standby containers; during normal operation, the primary container accesses storage device 2300 to provide the service. In other embodiments, containers 2111, 2121, and 2211 can also be deployed directly on physical machines.
In this embodiment of the present invention, storage device 2300 can be a physical storage device, such as a storage array or a hard disk, or a range of storage space on a physical storage device that is allocated to containers 2111, 2121, and 2211 to store the data produced by the application deployed in them. Storage device 2300 includes a tag storage area 2310, a heartbeat information storage area 2320, and a data storage area 2330. The tag storage area 2310 stores the tag of the primary container, the heartbeat information storage area 2320 stores the heartbeat value of the primary container, and the data storage area 2330 stores the data produced while the primary container runs. The primary container (container 2111) can access all areas of storage device 2300. The standby containers (containers 2121 and 2211) cannot access the data storage area 2330, but they can access the tag storage area 2310 and the heartbeat information storage area 2320 in order to monitor the state of the primary container.
While container 2111 is the primary container, it periodically updates the heartbeat value in the heartbeat information storage area 2320, for example by periodically incrementing it by one. Containers 2121 and 2211 each periodically check the tag storage area 2310 and the heartbeat information storage area 2320 to monitor the state of container 2111. If the heartbeat value in the heartbeat information storage area 2320 has not been updated for longer than a preset time (for example, two monitoring periods), they determine that container 2111 has failed and can no longer provide the service. Containers 2121 and 2211 then each write their own tag to the tag storage area and compete to become the new primary container; this contention is described in detail below. Suppose container 2121 wins and becomes the new primary container. The tag storage area 2310 then stores the tag of container 2121 instead of the tag of container 2111, and container 2121 resets the heartbeat value in the heartbeat information storage area to zero, updates it periodically, and accesses the data in the data storage area 2330 to provide the service. Container 2211, having lost, continues to check the tag storage area 2310 and the heartbeat information storage area 2320, now to monitor the state of container 2121.
Because the storage device contains a tag storage area and a heartbeat information storage area, a standby container can determine the state of the primary container from the heartbeat information that the primary container writes to the heartbeat information storage area 2320, even when no network connection exists between the two. When the primary container fails, a new primary container can be selected quickly to take over the service, which keeps the service provided by the containers reliable.
Based on the application scenario shown in FIG. 2, the container switching method provided by the embodiments of this application is described below with reference to FIG. 3 and FIG. 4. FIG. 3 is a flowchart of the container switching method, and FIG. 4 shows the state changes of a container during switching. The description takes an arbitrary container as an example. As shown in FIG. 3, the method includes but is not limited to the following steps:
S301: When the container starts, it checks whether the tag of a primary container has been written to the tag storage area 2310 of storage device 2300. If a primary tag has been written, step S302 is performed; if not, step S303 is performed.
After starting, the container first checks whether the tag of a primary container has been written to the tag storage area 2310 of storage device 2300. The tag storage area 2310 stores the tag of the primary container. Its size can be set as needed, for example to 512 bytes; this application does not limit it. A container's tag uniquely identifies that container.
If the tag storage area 2310 stores the tag of a primary container, a primary container already exists in the container cluster to which the container belongs, and the container acts as a standby container in the standby state shown in FIG. 4.
If the tag storage area stores no primary tag, no primary container exists in the container cluster, and the container can compete to become primary. The container is then in the election state shown in FIG. 4.
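The startup check in step S301 can be sketched as follows. This is a minimal illustration, not the patent's implementation: an in-memory buffer stands in for storage device 2300, the 512-byte tag area uses the example size from the text, and the 8-byte heartbeat area, the area offsets, the zero-filled encoding of an empty tag area, and the names `read_tag`, `write_tag`, and `initial_state` are all assumptions made for the example.

```python
import io

TAG_AREA_SIZE = 512       # example size given in the text; configurable
HEARTBEAT_AREA_SIZE = 8   # assumption: heartbeat kept in an 8-byte field

# Assumed layout: tag area first, then heartbeat area, then data area.
TAG_OFFSET = 0
HEARTBEAT_OFFSET = TAG_OFFSET + TAG_AREA_SIZE
DATA_OFFSET = HEARTBEAT_OFFSET + HEARTBEAT_AREA_SIZE


def read_tag(device):
    """Return the primary tag in the tag area, or None if the area is empty.

    Assumption: an empty tag area is filled with zero bytes.
    """
    device.seek(TAG_OFFSET)
    raw = device.read(TAG_AREA_SIZE).rstrip(b"\x00")
    return raw.decode("utf-8") if raw else None


def write_tag(device, tag):
    """Overwrite the whole tag area with the given tag, zero-padded."""
    encoded = tag.encode("utf-8")
    if len(encoded) > TAG_AREA_SIZE:
        raise ValueError("tag does not fit in the tag area")
    device.seek(TAG_OFFSET)
    device.write(encoded.ljust(TAG_AREA_SIZE, b"\x00"))


def initial_state(device):
    """Step S301: standby if a primary tag is present, otherwise election."""
    return "standby" if read_tag(device) is not None else "election"


# An in-memory buffer stands in for the shared storage device.
device = io.BytesIO(bytes(DATA_OFFSET))
state_when_empty = initial_state(device)      # no primary tag yet
write_tag(device, "container-2111")
state_with_primary = initial_state(device)    # a primary tag is now stored
```

On a real deployment the reads and writes would go to the shared block device rather than a buffer, and the encoding of "no tag" would have to be agreed on by all containers in the cluster.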
S302: The container periodically checks whether the heartbeat value in the heartbeat information storage area 2320 has changed within a preset duration. If it has changed, the container continues to perform step S302. If it has not changed, the container enters the election state shown in FIG. 4 and performs step S303.
As shown in FIG. 2, the heartbeat information storage area 2320 stores the heartbeat value of the primary container. While the primary container runs normally, it periodically updates the heartbeat value in this area. If the standby containers see that the heartbeat value keeps being updated, the primary container is still working. If the standby containers see that the heartbeat value has not been updated within the preset duration, for example because the same heartbeat value is read in two consecutive check periods, the primary container has failed.
To check whether the heartbeat value has been updated, a standby container records the heartbeat value read in the previous period and compares it with the value read in the current period. If the two are the same, the primary container has not updated its heartbeat value; if they differ, the heartbeat value has been updated.
For example, suppose the heartbeat value recorded by the standby container is 8, meaning the primary container updated the heartbeat value to 8 in the previous period, and the value the standby container reads now is 9. By comparing the two, the standby container determines that the primary container has updated the heartbeat value. It updates its recorded value to 9 and continues to check the heartbeat value periodically.
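The comparison performed by the standby container can be sketched as a small stateful checker. This is an illustrative sketch under stated assumptions: the class name `HeartbeatMonitor` and the parameter `stale_limit` (the number of consecutive unchanged comparisons that counts as a failure) are hypothetical, and the default of 1 corresponds to the example in the text where the same value is read in two consecutive check periods.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat value read by a standby container."""

    def __init__(self, stale_limit=1):
        # stale_limit is an assumed, configurable threshold.
        self.stale_limit = stale_limit
        self.last_seen = None      # value recorded in the previous period
        self.stale_count = 0       # consecutive reads with no change

    def observe(self, heartbeat):
        """Record one periodic read.

        Returns True when the primary is considered failed, that is, when
        the value has stayed unchanged for `stale_limit` comparisons.
        """
        if heartbeat == self.last_seen:
            self.stale_count += 1
        else:
            self.last_seen = heartbeat   # primary updated the value
            self.stale_count = 0
        return self.stale_count >= self.stale_limit


monitor = HeartbeatMonitor()
# Reads in successive check periods: 8, then 9 (updated), then 9 again (stale).
verdicts = [monitor.observe(value) for value in (8, 9, 9)]
```

This check only makes sense if the standby's check period is at least as long as the primary's update period; otherwise two reads could fall inside a single update interval and a healthy primary would look stale.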
Before updating the heartbeat value, the primary container can first read the primary tag stored in the tag storage area 2310 and check whether it matches its own tag, and then read the heartbeat value stored in the heartbeat information storage area 2320 and check whether it matches the value it wrote in the previous period. If both match, the primary container updates the heartbeat value. The heartbeat value can be a number that the primary container increments on each update: for example, if the heartbeat information storage area holds 15 at the end of the previous period, the value is updated to 16 in the current period. The primary container then remains in the primary state shown in FIG. 4.
If the tag read differs from its own tag, or the heartbeat value read differs from the value written in the previous period, an unpredictable fault has occurred in the system and the primary container must change state: as shown in FIG. 4, it exits and restarts, and the container cluster elects a new primary container. For example, an unstable network may cause the primary container to fail, without noticing, when writing its tag to the tag storage area 2310 or updating the heartbeat value, so that the primary tag it reads back differs from its own. Alternatively, the primary container may have been disconnected for a long time and later recovered (again without noticing), by which time a standby container has written a new tag to the tag storage area 2310, so the tag the primary container reads differs from its own.
By periodically reading the tag stored in the tag storage area 2310 and comparing it with its own tag, the primary container decides whether to update the heartbeat value or to exit and restart. In extreme cases (for example, an unstable network that makes the primary container's connection intermittent, or an unpredictable fault in storage device 2300 that changes the content of the tag storage area), the primary container restarts promptly and stops reading and writing the data in storage device 2300, which preserves data consistency and application reliability.
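The primary container's guarded update can be sketched as below. The sketch assumes a simple in-memory object in place of the tag and heartbeat areas; the names `SharedAreas` and `update_heartbeat`, and the convention of returning `None` to signal "exit and restart", are hypothetical choices made for the example.

```python
class SharedAreas:
    """In-memory stand-in for the tag area 2310 and heartbeat area 2320."""

    def __init__(self, tag=None, heartbeat=0):
        self.tag = tag
        self.heartbeat = heartbeat


def update_heartbeat(areas, own_tag, last_written):
    """One update period of the primary container.

    Returns the new heartbeat value on success, or None when the primary
    must exit and restart because the stored tag or the stored heartbeat
    value no longer matches what this primary wrote.
    """
    if areas.tag != own_tag:
        return None                      # another container holds the tag
    if areas.heartbeat != last_written:
        return None                      # heartbeat changed behind our back
    areas.heartbeat = last_written + 1   # normal case: increment the value
    return areas.heartbeat


areas = SharedAreas(tag="container-2111", heartbeat=15)
normal_update = update_heartbeat(areas, "container-2111", 15)

# A standby container has since taken over and written its own tag.
areas.tag = "container-2121"
after_takeover = update_heartbeat(areas, "container-2111", 16)
```

Note that the read-check-write sequence here is not atomic; on shared storage another container could write between the check and the increment, which the next period's check would then catch.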
S303: The container periodically writes its own tag to the tag storage area 2310 and, with the same period, reads the tag stored in the tag storage area 2310.
The container is in the election state shown in FIG. 4 and can compete to become primary in two cases: when it determines at startup in step S301 that the tag storage area 2310 stores no primary tag, meaning no primary container exists, or when it determines in step S302 that the heartbeat information in the heartbeat information storage area 2320 has not changed, meaning the primary container has failed.
A container cluster may contain several standby containers, such as containers 2121 and 2211 in FIG. 2. When no primary container exists in the cluster or the primary container fails, these standby containers compete to become primary at the same time. As shown in FIG. 5, container 2121 first writes its tag, tag 1, to the tag storage area 2310 at time t1, and container 2211 writes its tag, tag 2, to the tag storage area 2310 at time t2. Because t1 is earlier than t2, tag 2 written by container 2211 overwrites tag 1 written by container 2121, and the tag storage area 2310 holds tag 2. After writing tag 1, container 2121 waits for an interval (for example, one sleep period) and reads the tag storage area 2310 at time t3, at which point it reads tag 2. After another sleep period, at time t5, it writes its own tag, tag 1, to the tag storage area 2310 again. After writing tag 2, container 2211 waits one sleep period and reads the tag storage area 2310 at time t4, at which point it reads tag 2. After another sleep period, at time t6, it writes its own tag, tag 2, to the tag storage area 2310 again.
S304: The container checks whether, within a preset duration, the tag it reads from the tag storage area 2310 matches the tag it wrote N consecutive times. If so, step S305 is performed; if not, step S302 is performed.
Specifically, as described with reference to FIG. 4 above, each standby container first writes its own tag to the tag storage area 2310, then reads the area, then writes again, repeating this cycle periodically. In the contention shown in FIG. 5, the container that writes its tag first (container 2121, for example) reads the tag of the container that wrote last, so the tag it reads each time differs from the tag it wrote. The container that writes its tag last (container 2211) reads the same tag that it wrote each time. If, within a preset duration, the tag read by a container matches the tag it wrote N consecutive times, that container has been successfully promoted to the new primary container. The other containers abandon this round of contention, stop writing their tags to the tag storage area 2310, resume checking the heartbeat value in the heartbeat information storage area 2320, and wait for the next contention. N is a positive integer greater than or equal to 1, for example 3 or 4; this application does not limit it.
In the contention for a new primary container, each container periodically writes its tag, reads the stored tag, and checks whether the two match. This eventually identifies a single container whose reads always match its writes, and that container is promoted to primary. The procedure makes the selection of the primary container more accurate and ensures that the elected primary container is unique, so that multiple containers do not access the storage device at the same time and the application remains reliable.
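The write-then-read contention of steps S303 and S304 can be illustrated with a small deterministic simulation. It is a sketch under strong assumptions: all candidates share the same sleep period and keep the interleaving described for FIG. 5 (every candidate writes, then every candidate reads), so real timing effects such as drift or delayed I/O are not modeled. The function name `run_election` and the parameter `n_required` (the N of step S304) are hypothetical.

```python
def run_election(candidates, n_required=3):
    """Simulate n_required rounds of contention.

    `candidates` lists the candidate tags in the order in which they write
    during each round. Returns (winners, final_tag), where winners are the
    candidates whose every read matched their own tag.
    """
    tag_area = None
    matches = {tag: 0 for tag in candidates}
    for _ in range(n_required):
        # Write phase (t1, t2 in FIG. 5): each later write overwrites
        # the earlier one, so the last writer's tag is what remains.
        for tag in candidates:
            tag_area = tag
        # Read phase (t3, t4): one sleep period after its own write,
        # each candidate compares the stored tag with its own.
        for tag in candidates:
            if tag_area == tag:
                matches[tag] += 1
    winners = [tag for tag in candidates if matches[tag] == n_required]
    return winners, tag_area


# Container 2121 (tag-1) writes first, container 2211 (tag-2) writes last.
winners, final_tag = run_election(["tag-1", "tag-2"], n_required=3)
```

Under this idealized interleaving the last writer reads its own tag in every round and is the only winner, which matches the outcome described for container 2211.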
S305: The container is promoted to primary and accesses the data in the storage device to provide the service.
Specifically, once the container has determined that it is promoted to primary, it accesses the data in the storage device to provide the service, resets the heartbeat value in the heartbeat information storage area 2320 to zero, and then updates the heartbeat value periodically. The container is now in the primary state shown in FIG. 4.
Steps S301 to S305 in the method embodiment above are only an illustrative summary and do not constitute a specific limitation; steps can be added, removed, or combined as needed.
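The state changes described across steps S301 to S305 can be summarized as a transition table. This is a reading of the text rather than a reproduction of FIG. 4: the state names, the event names, and the choice to model "exit and restart" as a return to the startup check are all assumptions made for the sketch.

```python
# State and event names are hypothetical labels inferred from the text of
# steps S301 to S305; they are not taken from FIG. 4 itself.
TRANSITIONS = {
    ("startup", "tag_present"): "standby",         # S301: a primary exists
    ("startup", "tag_absent"): "election",         # S301: no primary
    ("standby", "heartbeat_updated"): "standby",   # S302: primary is alive
    ("standby", "heartbeat_stale"): "election",    # S302: primary failed
    ("election", "reads_match"): "primary",        # S304 then S305
    ("election", "reads_differ"): "standby",       # S304 back to S302
    ("primary", "self_check_ok"): "primary",       # tag and heartbeat match
    ("primary", "self_check_failed"): "startup",   # exit and restart
}


def step(state, event):
    """Return the next state, or raise ValueError if the text describes
    no transition for this (state, event) pair."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"no transition for {key}")
    return TRANSITIONS[key]


def run(events, state="startup"):
    """Apply a sequence of events and return every state visited."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


# A standby detects a stale heartbeat, wins the election, and later
# fails its own self-check and restarts.
trace = run(["tag_present", "heartbeat_stale", "reads_match", "self_check_failed"])
```

Keeping the transitions in one table makes it easy to see that every path to the primary state goes through the election state.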
The embodiments described with FIG. 3 and FIG. 4 cover the scenario in which the container cluster contains several standby containers that compete to become primary when the primary container fails or does not exist. When the container cluster contains only one standby container and that container detects in step S302 that the heartbeat of the primary container has not been updated, that is, the primary container has failed, the standby container can be promoted to primary directly: it writes its tag to the tag storage area without performing step S304, in which the primary container is determined through repeated writes and reads.
In addition, although the embodiments above are described using containers as an example, the method provided by the present invention also applies to switching between physical machines and between virtual machines. Apart from the objects being switched, the switching method for physical machines and virtual machines is the same as for containers and is not described again here.
In the embodiments of the present invention, a status flag of the master node, for example the master node's heartbeat information, is placed in the storage device. The master node periodically updates the heartbeat information and the standby node periodically checks it; if the heartbeat information has not been updated, the standby node can compete to become the master node. In this way, even if the master node and the standby node have not established a network connection through their respective physical machines, the standby node can promptly detect a failure of the master node and compete to become the master node and continue providing the service.
In addition, because the embodiments of the present invention do not require a network connection between the master node and the standby node, when both nodes are physical machines the standby physical machine can determine whether the primary physical machine has failed without establishing a network connection with it. When the master node and the standby node are virtual machines or containers, the standby node can be deployed on any physical machine, for example on the same physical machine as the master node or on a physical machine that has no network connection with the master node, which reduces the constraints on deploying virtual machines and containers.
上述详细阐述了本申请实施例的方法,为了便于更好的实施本申请实施例的上述方案,相应地,下面还提供用于配合实施上述方案的相关设备。The foregoing describes the methods of the embodiments of the present application in detail. In order to facilitate better implementation of the above-mentioned solutions of the embodiments of the present application, correspondingly, related equipment for cooperating with the implementation of the foregoing solutions is provided below.
Refer to FIG. 6, which is a schematic structural diagram of a node provided by an embodiment of this application. As shown in FIG. 6, the node 600 includes a detection module 610 and a takeover module 620, where:
The detection module 610 is configured to detect the status flag of the master node stored in the storage device and to determine, according to the status flag, whether the master node is faulty, where the master node is the node that accesses data in the storage device and provides services to users.
Specifically, the detection module 610 is configured to perform the foregoing steps S301, S302, and S304, and optionally the optional methods in those steps.
The takeover module 620 is configured to take over the master node when the detection module 610 determines, according to the status flag, that the master node is faulty.
Specifically, the takeover module 620 is configured to perform the foregoing steps S303 and S305, and optionally the optional methods in those steps.
In a possible implementation, the status flag is the heartbeat value of the master node. When detecting the status flag of the master node stored in the storage device and determining, according to the status flag, whether the master node is faulty, the detection module 610 is specifically configured to: periodically detect whether the heartbeat value of the master node stored in the storage device has been updated; and, if the heartbeat value of the master node has not been updated, determine that the master node is faulty.
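The heartbeat check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `SharedStore` is a hypothetical in-memory stand-in for the storage device, and the `max_missed` tolerance (how many unchanged polls count as a failure) is an assumption, since the text only says the master node is judged faulty when the heartbeat value is not updated.

```python
from dataclasses import dataclass


@dataclass
class SharedStore:
    """Hypothetical stand-in for the fields kept on the shared storage device."""
    master_tag: str = ""
    heartbeat: int = 0


class HeartbeatDetector:
    """Standby-side check: did the stored heartbeat change since the last poll?"""

    def __init__(self, store: SharedStore, max_missed: int = 3):
        self.store = store
        self.max_missed = max_missed        # assumed tolerance, not from the source
        self.last_seen = store.heartbeat    # heartbeat value observed at the last poll
        self.missed = 0                     # consecutive polls with no change

    def poll(self) -> bool:
        """Run one detection period; return True if the master is judged faulty."""
        current = self.store.heartbeat
        if current != self.last_seen:
            # The master updated the heartbeat, so it is considered alive.
            self.last_seen = current
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= self.max_missed
```

In a real deployment `poll` would be called once per detection period, and the period would need to be longer than the master node's update period so that a healthy master is never misjudged.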
In a possible implementation, the storage device also stores a tag of the master node; when taking over the master node, the takeover module 620 is specifically configured to update the tag of the master node in the storage device to the tag of the node device.
In a possible implementation, when updating the tag of the master node in the storage device to the tag of the standby node, the takeover module 620 is specifically configured to: write the tag of the node device to the storage device once every first preset duration and, after the first preset duration, read the tag stored in the storage device once every first preset duration; and, within a second preset duration, when the tag read N consecutive times is the same as the written tag, stop writing the tag of the node device to the storage device, where N is a positive integer greater than or equal to 1.
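The write-then-read-back competition described above can be sketched as follows. This is an illustrative sketch only: `TagStore`, `compete_for_master`, and the parameter names are hypothetical, the storage device is modelled as a plain in-memory object, and the number of rounds is assumed to be the second preset duration divided by the first.

```python
import time
from typing import Callable


class TagStore:
    """Hypothetical in-memory stand-in for the tag field on the storage device."""

    def __init__(self) -> None:
        self.master_tag = ""

    def write_tag(self, tag: str) -> None:
        self.master_tag = tag

    def read_tag(self) -> str:
        return self.master_tag


def compete_for_master(
    store: TagStore,
    my_tag: str,
    interval: float,   # the "first preset duration"
    window: float,     # the "second preset duration"
    n: int,            # required number of consecutive matching reads (N >= 1)
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if this node's tag survives N consecutive read-backs."""
    rounds = max(1, int(window / interval))  # assumed: rounds that fit in the window
    matches = 0
    for _ in range(rounds):
        store.write_tag(my_tag)              # write our tag ...
        sleep(interval)                      # ... wait one first preset duration ...
        if store.read_tag() == my_tag:       # ... then read it back
            matches += 1
            if matches >= n:
                return True                  # stop writing: this node is now master
        else:
            matches = 0                      # a rival overwrote the tag; start over
    return False
```

Resetting the counter whenever a rival's tag is read is one way to interpret "N consecutive times"; the source does not say what a node does when it loses the competition, so the sketch simply returns `False`.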
In a possible implementation, after taking over the master node, the takeover module 620 is further configured to reset the heartbeat value stored in the storage device to zero and to update the heartbeat value periodically.
In a possible implementation, when periodically updating the heartbeat value, the takeover module 620 is specifically configured to: periodically read the tag and the heartbeat value stored in the storage device, and determine whether the tag is the same as the tag of the standby node and whether the heartbeat value is the same as the heartbeat value that the standby node wrote in the previous period; and, upon determining that the tag stored in the storage device is the same as the tag of the standby node and that the heartbeat value is the same as the heartbeat value that the standby node wrote in the previous period, update the heartbeat value.
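The guarded heartbeat update described above can be sketched as follows. It is a minimal sketch under assumptions: `SharedStore` and `MasterHeartbeat` are hypothetical names, the heartbeat is modelled as an integer counter incremented by one, and returning `False` when a check fails is an assumption, because the text does not state what the node does in that case.

```python
from dataclasses import dataclass


@dataclass
class SharedStore:
    """Hypothetical stand-in for the fields kept on the shared storage device."""
    master_tag: str = ""
    heartbeat: int = 0


class MasterHeartbeat:
    """Heartbeat writer used by a standby node after it has taken over."""

    def __init__(self, store: SharedStore, my_tag: str):
        self.store = store
        self.my_tag = my_tag
        self.store.heartbeat = 0   # reset the stored heartbeat to zero on takeover
        self.last_written = 0      # value this node wrote in the previous period

    def tick(self) -> bool:
        """Run one update period; return True if the heartbeat was updated."""
        tag = self.store.master_tag
        beat = self.store.heartbeat
        # Update only if the stored tag is still ours and nobody else has
        # touched the heartbeat since our last write.
        if tag != self.my_tag or beat != self.last_written:
            return False
        self.last_written = beat + 1
        self.store.heartbeat = self.last_written
        return True
```

Checking both the tag and the previously written heartbeat value lets the node notice that another node has claimed the master role before it writes again.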
In a possible implementation, before the detection module 610 detects the status flag of the master node stored in the storage device, the detection module 610 is further configured to detect whether a master node tag is stored in the storage device; and the takeover module 620 is further configured to take over the master node when the detection module detects that no master node tag is stored in the storage device.
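The preliminary check described above amounts to a small decision made before heartbeat monitoring starts. The sketch below is illustrative only; `Action` and `initial_action` are hypothetical names, and an empty or missing tag is assumed to be how "no master node tag is stored" appears to the node.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    TAKE_OVER = "take_over"   # no master recorded: claim the master role
    MONITOR = "monitor"       # a master is recorded: watch its status flag


def initial_action(stored_tag: Optional[str]) -> Action:
    """Decide what a starting node does based on the stored master node tag."""
    if not stored_tag:        # None or empty string: no master tag is stored
        return Action.TAKE_OVER
    return Action.MONITOR
```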
It should be understood that the node structure described above is only an example and does not constitute a specific limitation; modules of the node may be added, removed, or combined as needed. In addition, the operations and/or functions of the modules in the node respectively implement the corresponding procedures of the method described in FIG. 3 above; for brevity, details are not repeated here.
Refer to FIG. 7, which is a schematic structural diagram of a computing device provided by an embodiment of this application. As shown in FIG. 7, the computing device 700 includes a processor 710, a communication interface 720, and a memory 730, which are connected to one another through an internal bus 740. It should be understood that the computing device may be a database server.
The computing device 700 may be the physical machine 2110 or 2120 in FIG. 2, on which a container or a virtual machine is built. The functions performed by a container in FIG. 2 are in fact performed by the processor 710 of the physical machine.
The processor 710 may consist of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The bus 740 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 740 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in FIG. 7, but this does not mean that there is only one bus or only one type of bus.
The memory 730 may include volatile memory, such as random access memory (RAM); the memory 730 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 730 may also include a combination of the above types. The program code may be used to implement the functional modules shown for the node device 600, or to implement the method steps, performed by the standby node, of the method embodiment shown in FIG. 6.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program can implement some or all of the steps of any one of the foregoing method embodiments, and can implement the function of any one of the functional modules described in FIG. 6 above.
An embodiment of this application further provides a computer program product that, when run on a computer or a processor, causes the computer or processor to perform one or more steps of any one of the foregoing methods. If the component modules of the devices described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should also be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of this application in any way.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The steps of the methods in the embodiments of this application may be reordered, combined, or deleted according to actual needs.
The modules of the devices in the embodiments of this application may be combined, divided, or deleted according to actual needs.
In summary, the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (13)

  1. A method for switching between nodes, characterized in that the method comprises:
    detecting, by a standby node, a status flag of a master node stored in a storage device, and determining, according to the status flag, whether the master node is faulty, wherein the master node is a node that accesses data in the storage device to provide services to users; and
    when the standby node determines, according to the status flag, that the master node is faulty, taking over, by the standby node, the master node.
  2. The method according to claim 1, characterized in that the status flag is a heartbeat value of the master node;
    the detecting, by the standby node, of the status flag of the master node stored in the storage device, and the determining, according to the status flag, of whether the master node is faulty comprise:
    periodically detecting, by the standby node, whether the heartbeat value of the master node stored in the storage device has been updated; and
    if the heartbeat value of the master node has not been updated, determining that the master node is faulty.
  3. The method according to claim 1 or 2, characterized in that the storage device further stores a tag of the master node, and the taking over of the master node by the standby node comprises:
    updating, by the standby node, the tag of the master node in the storage device to the tag of the standby node.
  4. The method according to claim 3, characterized in that the updating, by the standby node, of the tag of the master node in the storage device to the tag of the standby node comprises:
    writing, by the standby node, its own tag to the storage device once every first preset duration and, after the first preset duration, reading the node tag stored in the storage device once every first preset duration; and
    within a second preset duration, when the node tag read by the standby node N consecutive times is the same as the standby node's own tag, stopping writing its own tag to the storage device, wherein N is a positive integer greater than or equal to 1.
  5. The method according to claim 2, characterized in that the method further comprises:
    after the standby node takes over the master node, resetting the heartbeat value stored in the storage device to zero and periodically updating the heartbeat value.
  6. The method according to any one of claims 1-5, characterized in that, before the standby node detects the status flag of the master node stored in the storage device, the method further comprises:
    detecting, by the standby node, whether a master node tag is stored in the storage device; and
    if no master node tag is stored in the storage device, competing, by the standby node, to become the master node.
  7. A node, characterized by comprising:
    a detection module, configured to detect a status flag of a master node stored in a storage device and to determine, according to the status flag, whether the master node is faulty, wherein the master node is a node that accesses data in the storage device to provide services to users; and
    a takeover module, configured to take over the master node when the detection module determines, according to the status flag, that the master node is faulty.
  8. The node according to claim 7, characterized in that the status flag is a heartbeat value of the master node; when detecting the status flag of the master node stored in the storage device and determining, according to the status flag, whether the master node is faulty, the detection module is specifically configured to:
    periodically detect whether the heartbeat value of the master node stored in the storage device has been updated; and
    if the heartbeat value of the master node has not been updated, determine that the master node is faulty.
  9. The node according to claim 7 or 8, characterized in that the storage device further stores a tag of the master node, and when taking over the master node, the takeover module is specifically configured to update the tag of the master node in the storage device to the tag of the node.
  10. The node according to claim 8, characterized in that, when updating the tag of the master node in the storage device to the tag of the standby node, the takeover module is specifically configured to:
    write the tag of the node device to the storage device once every first preset duration and, after the first preset duration, read the tag stored in the storage device once every first preset duration; and
    within a second preset duration, when the tag read N consecutive times is the same as the written tag, stop writing the tag of the node to the storage device, wherein N is a positive integer greater than or equal to 1.
  11. The node device according to claim 8, characterized in that:
    after taking over the master node, the takeover module is further configured to reset the heartbeat value stored in the storage device to zero and to update the heartbeat value periodically.
  12. The method according to any one of claims 7-11, characterized in that, before the detection module detects the status flag of the master node stored in the storage device, the detection module is further configured to detect whether a master node tag is stored in the storage device; and
    the takeover module is further configured to compete to become the master node when the detection module detects that no master node tag is stored in the storage device.
  13. A computing device, characterized by comprising a processor and a memory, wherein the processor and the memory are connected through an internal bus, the memory stores instructions, and the processor invokes the instructions in the memory to perform the method according to any one of claims 1-6.
PCT/CN2020/097262 2019-07-08 2020-06-19 Node switching method in node failure and related device WO2021004256A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910612025.X 2019-07-08
CN201910612025 2019-07-08
CN201911057449.0 2019-10-29
CN201911057449.0A CN112199240B (en) 2019-07-08 2019-10-29 Method for switching nodes during node failure and related equipment

Publications (1)

Publication Number Publication Date
WO2021004256A1 true WO2021004256A1 (en) 2021-01-14

Family

ID=74004723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097262 WO2021004256A1 (en) 2019-07-08 2020-06-19 Node switching method in node failure and related device

Country Status (2)

Country Link
CN (1) CN112199240B (en)
WO (1) WO2021004256A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116107814A (en) * 2023-04-04 2023-05-12 阿里云计算有限公司 Database disaster recovery method, equipment, system and storage medium
CN116743550A (en) * 2023-08-11 2023-09-12 之江实验室 Processing method of fault storage nodes of distributed storage cluster

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116266152A (en) * 2021-12-16 2023-06-20 中移(苏州)软件技术有限公司 Information processing method and device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5167028A (en) * 1989-11-13 1992-11-24 Lucid Corporation System for controlling task operation of slave processor by switching access to shared memory banks by master processor
CN106789246A (en) * 2016-12-22 2017-05-31 广西防城港核电有限公司 The changing method and device of a kind of active/standby server
CN109302445A (en) * 2018-08-14 2019-02-01 新华三云计算技术有限公司 Host node state determines method, apparatus, host node and storage medium
CN109446169A (en) * 2018-10-22 2019-03-08 北京计算机技术及应用研究所 A kind of double control disk array shared-file system
CN109783280A (en) * 2019-01-15 2019-05-21 上海海得控制系统股份有限公司 Shared memory systems and shared storage method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231681B (en) * 2011-06-27 2014-07-30 中国建设银行股份有限公司 High availability cluster computer system and fault treatment method thereof
US9411772B2 (en) * 2014-06-30 2016-08-09 Echelon Corporation Multi-protocol serial nonvolatile memory interface
CN104679907A (en) * 2015-03-24 2015-06-03 新余兴邦信息产业有限公司 Realization method and system for high-availability and high-performance database cluster
US9836368B2 (en) * 2015-10-22 2017-12-05 Netapp, Inc. Implementing automatic switchover
US10855515B2 (en) * 2015-10-30 2020-12-01 Netapp Inc. Implementing switchover operations between computing nodes
US10243780B2 (en) * 2016-06-22 2019-03-26 Vmware, Inc. Dynamic heartbeating mechanism
CN108011737B (en) * 2016-10-28 2021-06-01 华为技术有限公司 Fault switching method, device and system
CN107122271B (en) * 2017-04-13 2020-07-07 华为技术有限公司 Method, device and system for recovering node event
CN109005045B (en) * 2017-06-06 2022-01-25 北京金山云网络技术有限公司 Main/standby service system and main node fault recovery method
CN109815049B (en) * 2017-11-21 2021-03-26 北京金山云网络技术有限公司 Node downtime recovery method and device, electronic equipment and storage medium
CN108880898B (en) * 2018-06-29 2020-09-08 新华三技术有限公司 Main and standby container system switching method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116107814A (en) * 2023-04-04 2023-05-12 阿里云计算有限公司 Database disaster recovery method, equipment, system and storage medium
CN116107814B (en) * 2023-04-04 2023-09-22 阿里云计算有限公司 Database disaster recovery method, equipment, system and storage medium
CN116743550A (en) * 2023-08-11 2023-09-12 之江实验室 Processing method of fault storage nodes of distributed storage cluster
CN116743550B (en) * 2023-08-11 2023-12-29 之江实验室 Processing method of fault storage nodes of distributed storage cluster

Also Published As

Publication number Publication date
CN112199240B (en) 2024-01-30
CN112199240A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
WO2021004256A1 (en) Node switching method in node failure and related device
RU2596585C2 (en) Method for sending data, data receiving method and data storage device
CN108616382B (en) Method and device for upgrading network card firmware, network card and equipment
US7840662B1 (en) Dynamically managing a network cluster
US8601314B2 (en) Failover method through disk take over and computer system having failover function
US8041791B2 (en) Computer system, management server, and mismatched connection configuration detection method
US7007192B2 (en) Information processing system, and method and program for controlling the same
US9367412B2 (en) Non-disruptive controller replacement in network storage systems
US6968382B2 (en) Activating a volume group without a quorum of disks in the volume group being active
CN111147274B (en) System and method for creating a highly available arbitration set for a cluster solution
WO2021082465A1 (en) Method for ensuring data consistency and related device
US11182252B2 (en) High availability state machine and recovery
US11734133B2 (en) Cluster system and fail-over control method of cluster system
CN115454329A (en) Management method, device, equipment and storage medium for storage cluster equipment
US20210019221A1 (en) Recovering local storage in computing systems
CN114115703A (en) Bare metal server online migration method and system
JP6788188B2 (en) Control device and control program
WO2024000535A1 (en) Partition table update method and apparatus, and electronic device and storage medium
US7305497B2 (en) Performing resource analysis on one or more cards of a computer system wherein a plurality of severity levels are assigned based on a predetermined criteria
CN115664943A (en) Method and device for determining master-slave relationship, storage medium and electronic equipment
CN115454339A (en) Data storage method, device and medium
CN115757266A (en) SOC chip, electronic equipment and method for preventing configuration data from being lost
CN110825487A (en) Management method for preventing split brain of virtual machine and main server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20837405

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20837405

Country of ref document: EP

Kind code of ref document: A1