WO2023041073A1 - Method for data synchronisation between multiple nodes, and system, device, and storage medium - Google Patents

Method for data synchronisation between multiple nodes, and system, device, and storage medium

Info

Publication number
WO2023041073A1
WO2023041073A1 PCT/CN2022/119470 CN2022119470W WO2023041073A1 WO 2023041073 A1 WO2023041073 A1 WO 2023041073A1 CN 2022119470 W CN2022119470 W CN 2022119470W WO 2023041073 A1 WO2023041073 A1 WO 2023041073A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
synchronization
response
hardware resource
data
Prior art date
Application number
PCT/CN2022/119470
Other languages
French (fr)
Chinese (zh)
Inventor
刘涛
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023041073A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the present application relates to the technical field of data management, and in particular to a data synchronization method, system, device and non-volatile readable storage medium among multiple nodes.
  • in a server management platform, in order to ensure the stability of server products, multiple servers are usually combined into one operating system.
  • the hardware configuration and hardware attributes of each server node in the operating system are consistent, and the operating system can access many hardware resources; the portion of hardware resources accessed by all server nodes is called shared resources.
  • This method cannot synchronize shared hardware resource data directly between arbitrary nodes. In particular, when a large amount of shared hardware resource data is synchronized, the central node is under heavy load and synchronization takes too long. Moreover, a failure to synchronize shared resources while the server is in use is unacceptable for server products: the server industry requires that server products neither go down nor lose data, which means that server products have extremely high stability requirements.
  • this application proposes a data synchronization method, system, device and non-volatile readable storage medium among multiple nodes.
  • an aspect of the embodiment of the present application provides a method for synchronizing data between multiple nodes, which specifically includes the following steps:
  • in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • after the designated node receives the data request command, it obtains the shared hardware resource data and sends it to the node;
  • the node receives the shared hardware resource data sent by the designated node, and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • in response to the synchronization being complete, a completion flag is sent to the designated node.
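  • checking at the node whether synchronization of the designated node's shared hardware resource data is complete includes:
  • the node checks whether the shared hardware resource data of the designated node has been saved, determines that the synchronization is complete in response to the data having been saved, and determines that the synchronization has failed in response to the data not having been saved.
  • the method further comprises: in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and, in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
  • the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm status and synchronizes the alarm status to the other nodes in the management platform.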
  • the method further includes: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the loss of the heartbeat status between nodes.
  • the method further includes: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.
  • the trigger preset condition includes: any one of node reset start, power-on start, and synchronization failure.
  • the step in which, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform includes:
  • the application layer module of the node acquires a data request command;
  • the data synchronization module of the node sends the data request command to the designated node by broadcast.
  • the step in which the designated node, after receiving the data request command, obtains the shared hardware resource data and sends it to the node includes:
  • after receiving the data request command, the data synchronization module of the designated node passes the data request command to the application layer module of the designated node;
  • after the application layer module of the designated node obtains the data request command, it obtains the shared hardware resource data and sends it to the node.
  • the shared hardware resource data includes management and control data of the server.
  • the management and control data of the server includes temperature, voltage, manufacturer, system version and power supply of the server.
  • after the step in which the node generates an alarm status in response to the number of failures being greater than the preset number, the method further includes:
  • the node continues to initiate synchronization of the shared hardware resource data with the designated node in order to repair the failure;
  • in response to a successful repair, the alarm status is cleared and recorded in the log.
  • the step of monitoring the presence status of the node and, in response to the node being absent, clearing the synchronization flag and the shared hardware resource data of the corresponding node includes:
  • monitoring whether the node is in its slot; in response to the node being in the slot, the presence status of the node is determined to be present; in response to the node not being in the slot, the presence status of the node is determined to be absent.
  • after the step of determining that the presence status of the node is present in response to the node being in the slot, the method further includes:
  • in response to the node being present, the shared hardware resource data is synchronized.
  • after the step of sending a completion flag to the designated node in response to the synchronization being complete, the method further includes:
  • the node exits the check of whether synchronization of the designated node's shared hardware resource data is complete, and waits for the next triggered check.
  • after the step of determining whether the number of failures is less than the preset number, the method further includes:
  • in response to the number of failures being greater than the preset number, the node whose synchronization failed is disconnected.
  • Another aspect of the embodiment of the present application also provides a data synchronization system between multiple nodes, including:
  • a sending module configured so that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • a synchronization module configured so that the designated node, after receiving the data request command, obtains shared hardware resource data and sends it to the node;
  • a check module configured so that the node receives the shared hardware resource data sent by the designated node and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • a completion module configured to send a completion flag to the designated node in response to the synchronization being complete.
  • the check module includes:
  • a check submodule configured to check at the node whether the shared hardware resource data of the designated node has been saved;
  • a synchronization completion submodule configured to determine that the synchronization is complete in response to the shared hardware resource data of the designated node having been saved;
  • a synchronization failure submodule configured to determine that the synchronization has failed in response to the shared hardware resource data of the designated node not having been saved.
  • a computer device including: at least one processor; and at least one memory storing computer-readable instructions, wherein the at least one processor executes the computer-readable instructions to implement the steps of the method for data synchronization among multiple nodes in any embodiment.
  • a non-volatile readable storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed by at least one processor, the at least one processor performs the steps of the method for data synchronization among multiple nodes in any embodiment.
  • FIG. 1 is a block diagram of a data synchronization method between nodes provided by one or more embodiments of the present application;
  • FIG. 2 is a schematic diagram of a data synchronization system between multiple nodes provided by one or more embodiments of the present application;
  • FIG. 3 is a schematic diagram of a multi-node interconnection structure provided by one or more embodiments of the present application.
  • FIG. 4 is a schematic structural diagram of a computer device provided by one or more embodiments of the present application.
  • Fig. 5 is a schematic structural diagram of a computer-readable storage medium provided by one or more embodiments of the present application.
  • the first aspect of the embodiments of the present application proposes an embodiment of a data synchronization method among multiple nodes. As shown in Figure 1, it includes the following steps:
  • Step S101: in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • Step S103: after receiving the data request command, the designated node obtains the shared hardware resource data and sends it to the node;
  • Step S105: the node receives the shared hardware resource data sent by the designated node, and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • Step S107: in response to the synchronization being complete, the node sends a completion flag to the designated node.
  • each node is a server.
  • Each server includes a data synchronization module and an application layer module.
  • the specified nodes can be all nodes except the node triggering the preset condition, or one or more nodes except the node triggering the preset condition.
  • Shared hardware resource data refers to server management and control data, such as server temperature, voltage, manufacturer, system version, power supply, and other information.
  • This embodiment can quickly synchronize the shared hardware resource data of any node to the current node, realizing the consistency of multi-node data, and the data synchronization speed is fast and the stability is good.
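  • checking at the node whether synchronization of the designated node's shared hardware resource data is complete includes:
  • the node checks whether the shared hardware resource data of the designated node has been saved; if the data has been saved, the synchronization is complete, and if the data has not been saved, the synchronization has failed.
  • the method further comprises: in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and, in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
  • the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm status and synchronizes the alarm status to the other nodes in the management platform.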
  • the method further includes: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the loss of the heartbeat status between nodes.
  • the method further includes: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.
  • Monitoring the presence status means monitoring whether the node is in its slot. By monitoring the presence status of each node, nodes that are not in place can be detected, and data synchronization can be performed promptly after a node is reinserted into its slot, which ensures data consistency between nodes and improves server stability.
  • the trigger preset condition includes: any one of node reset start, power-on start, and synchronization failure.
  • Node 1 has just been reset and started. After the reset, the application layer module of node 1 initiates a data request command, which the data synchronization module of node 1 broadcasts to all other nodes in the frame (that is, nodes 2, 3 and 4);
  • After the data synchronization modules of nodes 2, 3 and 4 receive the data request command, they pass it to their application layer modules. On receiving the command, the application layer modules of nodes 2, 3 and 4 obtain their own shared hardware resource data and then synchronize it to node 1 through their respective data synchronization modules;
  • After node 1 receives the shared hardware resource data synchronized by nodes 2, 3 and 4, it checks whether it has saved the data from each of them. In response to node 1 having saved the data synchronized by nodes 2, 3 and 4, it sends a completion flag to nodes 2, 3 and 4, exits the check and waits for the next triggered check. In response to the data from one or more nodes being missing, for example the shared hardware resource data of node 2, the synchronization is not complete and the synchronization of node 2's shared hardware resource data has failed.
  • Node 1 then initiates a data request command to node 2 alone and repeats the above process. In response to data synchronization still failing after multiple attempts, node 1 generates an alarm status and synchronizes the alarm status to nodes 3 and 4.
  • After node 1 generates the alarm, it continues to re-initiate synchronization with node 2 in an attempt to repair the failure. In response to a successful repair, the alarm status is cleared and recorded in the log, which increases the stability of data synchronization. In response to repeated attempts still failing, node 1 reports to the system and disconnects node 2.
  • Nodes 1, 2, 3 and 4 are monitored during operation through heartbeat or presence status. If any node restarts, or is unplugged or upgraded, the loss of its heartbeat is detected, and the synchronization flag and the shared hardware resource data of that node are cleared; alternatively, they are cleared when the node's presence status indicates that it is absent. After the node is working normally again, synchronization of shared hardware resource data is re-initiated.
  • data synchronization can be performed between any of the multiple nodes. Especially when the server needs to manage the data status of the system in order to support business scenarios, it is particularly important for any node to hold the data of the other nodes. For example, the management system provides key shared hardware resource data, which determines the establishment and business use of the cluster. The server needs to obtain this data during boot to ensure that it boots correctly and that the system can provide external services.
  • data synchronization between any nodes is realized, and it can be performed by broadcast, multicast or unicast according to the state of the nodes, so that every node can save the shared hardware resource data of the other nodes. This ensures data consistency between nodes, with fast and stable data synchronization.
  • the embodiment of the present application also provides a data synchronization system between multiple nodes, including:
  • a sending module 110 configured so that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • a synchronization module 120 configured so that the designated node, after receiving the data request command, obtains the shared hardware resource data and sends it to the node;
  • a check module 130 configured so that the node receives the shared hardware resource data sent by the designated node and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • a completion module 140 configured to send a completion flag to the designated node in response to the synchronization being complete.
  • an embodiment of the present application also provides a computer device 20, which includes at least one processor 210 and at least one memory 220; the memory 220 stores computer-readable instructions 221 executable by the processor, and the processor 210 implements the following method steps when executing the computer-readable instructions 221:
  • in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • after the designated node receives the data request command, it obtains the shared hardware resource data and sends it to the node;
  • the node receives the shared hardware resource data sent by the designated node, and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • in response to the synchronization being complete, a completion flag is sent to the designated node.
  • checking at the node whether synchronization of the designated node's shared hardware resource data is complete includes:
  • the node checks whether the shared hardware resource data of the designated node has been saved, determines that the synchronization is complete in response to the data having been saved, and determines that the synchronization has failed in response to the data not having been saved.
  • the method steps further include: in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and, in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
  • the method steps further include: in response to the number of failures being greater than the preset number, the node generates an alarm status and synchronizes the alarm status to the other nodes in the management platform.
  • the method steps further include: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to loss of the heartbeat status between nodes.
  • the method steps further include: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.
  • the trigger preset condition includes: any one of node reset start, power-on start, and synchronization failure.
  • the embodiment of the present application also provides a non-volatile readable storage medium 30, which stores computer-readable instructions 310 that, when executed by a processor, perform the following method:
  • in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
  • after the designated node receives the data request command, it obtains the shared hardware resource data and sends it to the node;
  • the node receives the shared hardware resource data sent by the designated node, and checks at the node whether synchronization of the designated node's shared hardware resource data is complete;
  • in response to the synchronization being complete, a completion flag is sent to the designated node.
  • checking at the node whether synchronization of the designated node's shared hardware resource data is complete includes:
  • the node checks whether the shared hardware resource data of the designated node has been saved, determines that the synchronization is complete in response to the data having been saved, and determines that the synchronization has failed in response to the data not having been saved.
  • the method further comprises: in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and, in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
  • the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm status and synchronizes the alarm status to the other nodes in the management platform.
  • the method further includes: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the loss of the heartbeat status between nodes.
  • the method further includes: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.
  • the trigger preset condition includes: any one of node reset start, power-on start, and synchronization failure.
  • the computer-readable instructions when executed, may include the processes of the embodiments of the above-mentioned methods.
  • the storage medium of the computer-readable instructions may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
  • the computer-readable instructions can be stored in a non-volatile memory
  • the storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like.

Abstract

Disclosed in the present application are a method for data synchronisation, and a system, a device, and a storage medium, the method comprising: in response to a node of a management platform triggering a preset condition, sending a data request command to a specified node, the specified node being a node in the management platform other than the node triggering the preset condition; after receiving the data request command, the specified node acquiring shared hardware resource data and sending same to the node; checking whether synchronisation of the shared hardware resource data is complete; and, in response to the synchronisation being complete, sending a completion flag to the specified node.

Description

Method, system, device and storage medium for data synchronization among multiple nodes
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111102440.4, filed with the China Patent Office on September 19, 2021 and entitled "A method, system, device and storage medium for data synchronization between multiple nodes", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data management, and in particular to a method, system, device and non-volatile readable storage medium for data synchronization among multiple nodes.
Background
In a server management platform, to ensure the stability of server products, multiple servers are usually combined into one operating system. Every server node in the operating system has the same hardware configuration and hardware attributes, and the operating system can access many hardware resources. The portion of hardware resources accessed by all server nodes is called shared resources.
Currently, multiple server nodes (especially three or more) in a server management platform need to synchronize shared hardware resource data during power-on or after a restart or upgrade. Existing multi-node synchronization of shared hardware resource data relies on one node acting as the central node, which synchronizes data to the other nodes. This approach cannot synchronize shared hardware resource data directly between arbitrary nodes. In particular, when a large amount of shared hardware resource data is synchronized, the central node is under heavy load and synchronization takes too long. Moreover, a failure to synchronize shared resources while the server is in use is unacceptable for server products: the server industry requires that server products neither go down nor lose data, which means that server products have extremely high stability requirements.
Summary
In view of this, the present application proposes a method, system, device and non-volatile readable storage medium for data synchronization among multiple nodes.
To this end, one aspect of the embodiments of the present application provides a method for data synchronization among multiple nodes, which includes the following steps:
in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
after receiving the data request command, the designated node obtains shared hardware resource data and sends it to the node;
the node receives the shared hardware resource data sent by the designated node, and checks at the node whether synchronization of the designated node's shared hardware resource data is complete; and
in response to the synchronization being complete, the node sends a completion flag to the designated node.
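The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its field names and the in-process method calls that stand in for network messages are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server node. All names here are illustrative, not from the source."""
    name: str
    local_data: dict                                   # this node's own shared hardware resource data
    peer_data: dict = field(default_factory=dict)      # data saved from other nodes
    flags_received: set = field(default_factory=set)   # requesters that reported completion

    def handle_request(self, requester: "Node") -> dict:
        # Designated node: obtain the shared hardware resource data and return it.
        return dict(self.local_data)

    def synchronize(self, designated: list) -> dict:
        """Run the four steps once and report success per designated node."""
        result = {}
        for peer in designated:
            if peer is self:
                continue                               # designated nodes exclude the triggering node
            data = peer.handle_request(self)           # data request command and response
            if data:
                self.peer_data[peer.name] = data       # save what was received
            done = peer.name in self.peer_data         # check: has the peer's data been saved?
            if done:
                peer.flags_received.add(self.name)     # send the completion flag
            result[peer.name] = done
        return result
```

Calling `Node("node1", {...}).synchronize(peers)` returns a mapping from each designated node's name to whether its data was saved, which is the check that the later retry and alarm steps build on.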
In some embodiments, checking at the node whether synchronization of the designated node's shared hardware resource data is complete includes:
checking at the node whether the shared hardware resource data of the designated node has been saved; in response to the data having been saved, determining that the synchronization is complete; and in response to the data not having been saved, determining that the synchronization has failed.
In some embodiments, the method further includes:
in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and
in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
In some embodiments, the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm status and synchronizes the alarm status to the other nodes in the management platform.
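The failure counting and alarm logic might look like the following sketch. The function name `sync_with_retries` and its parameters are hypothetical. The text only distinguishes "less than" and "greater than" the preset number, so treating a count equal to the preset number as an alarm is an assumption made here.

```python
from typing import Callable


def sync_with_retries(try_sync: Callable[[], bool], preset_number: int) -> tuple[bool, int, bool]:
    """Retry one peer's synchronization; return (synchronized, failures, alarm).

    try_sync is a hypothetical callable that performs one request/check round
    and returns True when the peer's data has been saved.
    """
    failures = 0
    while True:
        if try_sync():
            return True, failures, False      # synchronization complete, no alarm
        failures += 1                         # record the failure for this peer
        if failures < preset_number:
            continue                          # re-initiate synchronization
        # Assumption: a count that reaches the preset number raises the alarm.
        return False, failures, True
```

A caller would then propagate the alarm status to the remaining nodes when the third element of the result is `True`.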
In some embodiments, the method further includes: monitoring the connection status between the nodes through heartbeat, and in response to the heartbeat between nodes being lost, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
In some embodiments, the method further includes: monitoring the presence status of the node, and in response to the node being absent, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
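A minimal sketch of the two clearing rules follows, assuming a timeout-based definition of a lost heartbeat. The `PeerTable` class, the `heartbeat_timeout` parameter and the explicit `now` timestamps are hypothetical; the application does not say how heartbeat loss or slot presence is detected.

```python
class PeerTable:
    """Per-node view of its peers (illustrative names, not from the source)."""

    def __init__(self, heartbeat_timeout: float):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # peer name -> time of last heartbeat
        self.sync_flag = {}        # peer name -> True once synchronization completed
        self.peer_data = {}        # peer name -> saved shared hardware resource data

    def record_sync(self, peer: str, data: dict, now: float) -> None:
        self.peer_data[peer] = data
        self.sync_flag[peer] = True
        self.last_heartbeat[peer] = now

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_heartbeat[peer] = now

    def sweep(self, now: float, present: set) -> list:
        """Clear flag and data for peers whose heartbeat is lost or that are absent."""
        cleared = []
        for peer in list(self.sync_flag):
            last = self.last_heartbeat.get(peer, float("-inf"))
            lost = now - last > self.heartbeat_timeout   # assumed definition of "lost"
            absent = peer not in present                 # slot presence reported by caller
            if lost or absent:
                self.sync_flag.pop(peer, None)
                self.peer_data.pop(peer, None)
                cleared.append(peer)
        return cleared
```

Peers returned by `sweep` are the ones that need a fresh synchronization once they are working normally again.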
In some embodiments, triggering the preset condition includes any one of: node reset start, power-on start, and synchronization failure.
In some embodiments, the step in which, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform includes:
in response to a node in the management platform triggering a preset condition, the application layer module of the node acquires a data request command; and
the data synchronization module of the node sends the data request command to the designated node by broadcast.
In some embodiments, the step in which the designated node, after receiving the data request command, obtains the shared hardware resource data and sends it to the node includes:
after receiving the data request command, the data synchronization module of the designated node passes the data request command to the application layer module of the designated node; and
after obtaining the data request command, the application layer module of the designated node obtains the shared hardware resource data and sends it to the node.
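The division of labour between the application layer module and the data synchronization module can be illustrated as below. This is a sketch under assumptions: the class names, the in-memory `bus` dictionary standing in for the physical link, and the `read_hardware` callback are hypothetical.

```python
class DataSyncModule:
    """Transport side of a node; a shared dict stands in for the real link."""

    def __init__(self, name: str, bus: dict):
        self.name = name
        self.bus = bus
        self.app = None            # set by ApplicationLayer
        bus[name] = self

    def broadcast_request(self) -> dict:
        # Send the data request command to every other node on the bus.
        replies = {}
        for peer_name, peer in self.bus.items():
            if peer_name != self.name:
                replies[peer_name] = peer.on_request(self.name)
        return replies

    def on_request(self, requester: str) -> dict:
        # Pass the command up to this node's application layer module.
        return self.app.collect_shared_data()


class ApplicationLayer:
    """Application side of a node; reads local data and stores peers' data."""

    def __init__(self, sync: DataSyncModule, read_hardware):
        self.sync = sync
        self.read_hardware = read_hardware   # hypothetical callback returning a dict
        self.peer_data = {}
        sync.app = self

    def collect_shared_data(self) -> dict:
        return self.read_hardware()

    def start_sync(self) -> dict:
        # The application layer issues the request; the sync module broadcasts it.
        self.peer_data.update(self.sync.broadcast_request())
        return self.peer_data
```

Keeping the transport in its own module means the same application logic could be reused with a different delivery mode, although the text here only names broadcast for this step.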
In some embodiments, the shared hardware resource data includes management and control data of the server.
In some embodiments, the management and control data of the server includes the temperature, voltage, manufacturer, system version and power supply of the server.
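One possible in-memory shape for this management and control data is a small record type. The field names, types and units below are assumptions chosen for illustration; the application lists only the kinds of information, not a format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ManagementData:
    """Shared hardware resource data of one server (hypothetical layout)."""
    temperature_c: float     # temperature, assumed to be in degrees Celsius
    voltage_v: float         # voltage, assumed to be in volts
    manufacturer: str
    system_version: str
    power_supply: str        # free-form power supply description


def to_wire(record: ManagementData) -> dict:
    # Plain dict that a data synchronization module could serialize.
    return asdict(record)


def from_wire(payload: dict) -> ManagementData:
    return ManagementData(**payload)
```

A frozen dataclass is used so that a record saved for a peer cannot be modified by accident after synchronization.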
In some embodiments, after the step in which the node generates an alarm state in response to the number of failures being greater than the preset number, the method further includes:
the node continues to initiate synchronization of the shared hardware resource data with the designated node in order to attempt a repair; and
in response to the repair succeeding, the alarm state is cleared and the event is recorded in the log.
In some embodiments, the step of monitoring the presence status of the node and, in response to the node being absent, clearing the synchronization flag and the shared hardware resource data of the corresponding node includes:
monitoring whether the node is in its slot;
in response to the node being in the slot, determining that the presence status of the node is present; and
in response to the node not being in the slot, determining that the presence status of the node is absent.
In some embodiments, after the step of determining, in response to the node being in the slot, that the presence status of the node is present, the method further includes:
in response to the node being present, synchronizing the shared hardware resource data.
In some embodiments, after the step of sending a completion flag to the designated node in response to the synchronization being complete, the method further includes:
the node exits the check of whether the shared hardware resource data of the designated node has been synchronized, and waits for the next triggered check.
In some embodiments, after the step of determining whether the number of failures is less than the preset number, the method further includes:
in response to the number of failures being greater than the preset number, disconnecting the node whose synchronization failed.
In another aspect, the embodiments of the present application further provide a data synchronization system between multiple nodes, including:
a sending module configured such that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, where the designated node is a node in the management platform other than the node that triggered the preset condition;
a synchronization module configured such that the designated node, after receiving the data request command, obtains shared hardware resource data and sends it to the node;
a checking module configured such that the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node has been synchronized; and
a completion module configured to send a completion flag to the designated node in response to the synchronization being complete.
In some embodiments, the checking module includes:
a checking submodule for checking, at the node, whether the shared hardware resource data of the designated node has been saved;
a synchronization completion submodule for determining that the synchronization is complete in response to the shared hardware resource data of the designated node having been saved; and
a synchronization failure submodule for determining that the synchronization has failed in response to the shared hardware resource data of the designated node not having been saved.
In yet another aspect, the embodiments of the present application further provide a computer device, including: at least one processor; and at least one memory for storing computer-readable instructions, where the at least one processor executes the computer-readable instructions to implement the steps of the data synchronization method between multiple nodes of any of the embodiments.
In still another aspect, the embodiments of the present application further provide a non-volatile readable storage medium storing computer-readable instructions which, when executed by at least one processor, cause the at least one processor to perform the steps of the data synchronization method between multiple nodes of any of the embodiments.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a block diagram of a data synchronization method between nodes provided by one or more embodiments of the present application;
FIG. 2 is a schematic diagram of a data synchronization system between multiple nodes provided by one or more embodiments of the present application;
FIG. 3 is a schematic diagram of a multi-node interconnection structure provided by one or more embodiments of the present application;
FIG. 4 is a schematic structural diagram of a computer device provided by one or more embodiments of the present application;
FIG. 5 is a schematic structural diagram of a computer-readable storage medium provided by one or more embodiments of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present application serve to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application; subsequent embodiments do not repeat this note.
In view of the above objective, a first aspect of the embodiments of the present application provides an embodiment of a data synchronization method between multiple nodes. As shown in FIG. 1, the method includes the following steps:
Step S101: in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, where the designated node is a node in the management platform other than the node that triggered the preset condition;
Step S103: after receiving the data request command, the designated node obtains shared hardware resource data and sends it to the node;
Step S105: the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node has been synchronized;
Step S107: in response to the synchronization being complete, the node sends a completion flag to the designated node.
Specifically, each node is a server, and each server includes a data synchronization module and an application layer module. The application layer module obtains the data to be synchronized or initiates a request to synchronize data, and the data synchronization module sends the data obtained by the application layer module. The designated nodes may be all nodes other than the node that triggered the preset condition, or one or more of those nodes. Shared hardware resource data refers to the management and control data of the server, such as its temperature, voltage, manufacturer, system version, and power supply.
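The division of work between the two modules can be illustrated with a minimal Python sketch. The class name `Node`, its fields, and the method names `app_collect` and `sync_send` are hypothetical and are not taken from the application; the transport between nodes is replaced by a direct in-memory write.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """One server node (hypothetical model, not the application's implementation)."""
    node_id: int
    hardware: Dict[str, str]  # this node's own management and control data
    peers_data: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def app_collect(self) -> Dict[str, str]:
        # Application layer role: obtain the data to be synchronized.
        return dict(self.hardware)

    def sync_send(self, target: "Node") -> None:
        # Data synchronization role: deliver the collected data to the target.
        # A real system would use a network transport; here it is a dict write.
        target.peers_data[self.node_id] = self.app_collect()


requester = Node(1, {"temperature": "41C", "system_version": "1.0"})
designated = Node(2, {"temperature": "39C", "system_version": "1.0"})
designated.sync_send(requester)
print(requester.peers_data)
```

Keeping collection and delivery in separate methods mirrors the description: the application layer decides what to send, and the synchronization layer only moves it.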
This embodiment can quickly synchronize the shared hardware resource data of any node to the local node, which keeps data consistent across multiple nodes while providing fast and stable synchronization.
In some embodiments, checking at the node whether the shared hardware resource data of the designated node has been synchronized includes:
checking at the node whether the shared hardware resource data of the designated node has been saved; in response to the data having been saved, the synchronization is complete, and in response to the data not having been saved, the synchronization has failed.
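The completion check described above amounts to a membership test over the designated nodes. The following is a minimal sketch; the function name `missing_peers` and the use of a dictionary keyed by node id are assumptions made for illustration.

```python
def missing_peers(saved: dict, designated: set) -> set:
    """Return the ids of designated nodes whose data has not been saved.

    An empty result means synchronization is complete; any id in the
    result marks a node whose synchronization failed.
    """
    return {node_id for node_id in designated if node_id not in saved}


# Node 1 has saved data from nodes 2 and 3 but not from node 4.
saved = {2: {"temperature": "39C"}, 3: {"temperature": "40C"}}
print(missing_peers(saved, {2, 3, 4}))
```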
In some embodiments, the method further includes:
in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number;
in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
Initiating data synchronization multiple times improves the stability of data synchronization between multiple nodes.
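The failure counting and retry logic can be sketched as a bounded loop. The names `sync_with_retry` and `request_once` are hypothetical. The text specifies behavior for a count less than and greater than the preset number but not for equality, so this sketch assumes the alarm is raised once the count reaches the preset number.

```python
def sync_with_retry(request_once, preset: int):
    """Call request_once() until it succeeds or the failure count reaches preset.

    request_once is a hypothetical callable returning True on success.
    Returns ("done", failures) or ("alarm", failures).
    """
    failures = 0
    while True:
        if request_once():
            return ("done", failures)
        failures += 1                 # record the failure for this node
        if failures >= preset:        # assumption: equality also raises the alarm
            return ("alarm", failures)


attempts = iter([False, False, True])   # fails twice, then succeeds
print(sync_with_retry(lambda: next(attempts), preset=3))
```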
In some embodiments, the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm state and synchronizes the alarm state to the remaining nodes in the management platform.
In some embodiments, the method further includes: monitoring the connection status between the nodes through a heartbeat and, in response to the heartbeat between nodes being lost, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
Monitoring the connection status between nodes through a heartbeat allows an abnormality to be detected promptly when the heartbeat is lost, and allows data to be synchronized promptly once the connection between nodes returns to normal. This keeps data consistent between nodes and improves server stability.
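A minimal sketch of the heartbeat cleanup follows. The application does not specify how heartbeat loss is detected; this example assumes a last-seen timestamp per peer and a fixed timeout, and the function name `expire_peers` is hypothetical.

```python
def expire_peers(last_seen: dict, sync_flags: dict, peer_data: dict,
                 now: float, timeout: float) -> list:
    """Clear the sync flag and saved data of every peer whose heartbeat is lost.

    A peer counts as lost when no heartbeat arrived within `timeout`
    seconds before `now` (an assumed detection rule). Returns the lost ids.
    """
    lost = [peer for peer, seen in last_seen.items() if now - seen > timeout]
    for peer in lost:
        sync_flags.pop(peer, None)   # forget that this peer was synchronized
        peer_data.pop(peer, None)    # drop its shared hardware resource data
    return sorted(lost)


last_seen = {2: 10.0, 3: 1.0}
flags = {2: True, 3: True}
data = {2: {"temperature": "39C"}, 3: {"temperature": "40C"}}
print(expire_peers(last_seen, flags, data, now=12.0, timeout=5.0))
```

Clearing both the flag and the data means the completion check will report the peer as missing, which lets synchronization be initiated again once the peer recovers.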
In some embodiments, the method further includes: monitoring the presence status of each node and, in response to a node being absent, clearing the synchronization flag and the shared hardware resource data of that node.
Monitoring the presence status means monitoring whether a node is in its slot. By monitoring the presence status of every node, absent nodes can be identified, and data can be synchronized promptly after a node is reinserted into its slot. This keeps data consistent between nodes and improves server stability.
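How slot presence is read is not stated in the application. As one hedged illustration, the sketch below assumes the chassis exposes presence as a bitmask in which bit `s` is set when slot `s` is occupied; the function name `absent_slots` and the bitmask encoding are assumptions.

```python
def absent_slots(presence_mask: int, slot_count: int) -> list:
    """List the slot numbers whose presence bit is clear (node not in slot)."""
    return [slot for slot in range(slot_count)
            if not (presence_mask >> slot) & 1]


# Slots 0, 1 and 3 are occupied; slot 2 is empty.
sync_flags = {0: True, 1: True, 2: True, 3: True}
for slot in absent_slots(0b1011, slot_count=4):
    sync_flags.pop(slot, None)   # clear the flag of the absent node
print(sync_flags)
```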
In some embodiments, the preset condition is triggered by any one of the following: a node reset start, a power-on start, or a synchronization failure.
Several implementations of the present application are described below through a specific example.
Assume that the current server management platform contains four server nodes interconnected to form one operating system. The interconnection of the four nodes is shown in FIG. 3.
Node 1 has just been reset and started. After the reset, the application layer module of node 1 initiates a data request command, which the data synchronization module of node 1 broadcasts to all other nodes in the frame (that is, nodes 2, 3 and 4);
after the data synchronization modules of nodes 2, 3 and 4 receive the data request command, they pass it to their application layer modules. On receiving the command, the application layer modules of nodes 2, 3 and 4 obtain their respective shared hardware resource data and, once it has been obtained, synchronize it to node 1 through their respective data synchronization modules;
after node 1 receives the shared hardware resource data synchronized by nodes 2, 3 and 4, it checks whether it has saved the data from the other nodes. If node 1 has saved the data synchronized by nodes 2, 3 and 4, it sends a completion flag to nodes 2, 3 and 4, exits the check, and waits for the next triggered check. If the data from one or more nodes is missing, for example the shared hardware resource data of node 2, the synchronization is incomplete and the synchronization of node 2's shared hardware resource data has failed.
Node 1 then sends a data request command to node 2 alone and repeats the above process. If synchronization still fails after several attempts, node 1 generates an alarm state and synchronizes the alarm state to nodes 3 and 4.
Furthermore, after node 1 generates the alarm, it continues to re-initiate synchronization with node 2 in an attempt to repair the fault. If the repair succeeds, the alarm state is cleared and the event is recorded in the log, which increases the stability of data synchronization. If repeated attempts still fail, the failure is reported to the system and node 2 is disconnected.
Furthermore, the stability of nodes 1, 2, 3 and 4 during operation is monitored through the heartbeat or the presence status. If any node restarts, is unplugged and reinserted, or is upgraded, then either the loss of its heartbeat is detected, or its absent status is detected, and in both cases the node's synchronization flag and the corresponding shared hardware resource data are cleared. Once the node is working normally again, synchronization of the shared hardware resource data is re-initiated.
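The four-node walkthrough, up to the point where the alarm is raised, can be simulated with a short sketch. The function `run_example`, the `responders` mapping, and the default of three failures before the alarm are hypothetical choices for illustration; each responder stands in for a peer's full request and reply path and returns `None` when that peer's data does not arrive.

```python
def run_example(responders: dict, preset: int = 3):
    """Simulate node 1 requesting data from nodes 2, 3 and 4.

    responders maps a node id to a callable returning that node's data,
    or None when synchronization with it fails (a stand-in for the network).
    Returns (saved data by node id, alarms by failed node id).
    """
    peers = (2, 3, 4)
    saved = {}
    for peer in peers:                      # broadcast request after reset
        data = responders[peer]()
        if data is not None:
            saved[peer] = data
    alarms = {}
    for peer in [p for p in peers if p not in saved]:
        failures = 1                        # the broadcast attempt already failed
        while failures < preset:            # request the failed node alone
            data = responders[peer]()
            if data is not None:
                saved[peer] = data
                break
            failures += 1
        else:
            # Still failing: raise an alarm and sync it to the other nodes.
            alarms[peer] = [p for p in peers if p != peer]
    return saved, alarms


responders = {
    2: lambda: None,                        # node 2 never answers
    3: lambda: {"temperature": "40C"},
    4: lambda: {"temperature": "38C"},
}
saved, alarms = run_example(responders)
print(sorted(saved), alarms)
```

The `while ... else` form runs the alarm branch only when the retries are exhausted without a successful reply.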
With the above solution, data can be synchronized between any of the nodes. This matters most when a server relies on the data state of the management system to support its business scenarios, because every node then needs to hold the data of the other nodes. For example, the management system provides key shared hardware resource data that determines how the cluster is established and how services use it. The server needs to obtain this data during startup to ensure that it boots correctly and that the system can provide services externally.
The solution of the present application achieves data synchronization between arbitrary nodes, and broadcast, multicast or unicast can be chosen for the synchronization according to the state of the nodes. Every node can therefore save the shared hardware resource data of the other nodes, which keeps data consistent between nodes while providing fast and stable synchronization.
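The application does not say how the delivery mode is chosen from the node state. One plausible rule, offered only as an assumption, is to pick the mode from how many peers still need to be reached; the function name `pick_delivery` is hypothetical.

```python
def pick_delivery(targets: set, all_peers: set) -> str:
    """Choose a delivery mode from the set of peers that must be reached.

    Assumed rule: every peer -> broadcast, exactly one -> unicast,
    any other non-empty subset -> multicast.
    """
    if not targets:
        raise ValueError("no target nodes to synchronize with")
    if targets == all_peers:
        return "broadcast"
    if len(targets) == 1:
        return "unicast"
    return "multicast"


print(pick_delivery({2, 3, 4}, {2, 3, 4}))  # after a reset: ask everyone
print(pick_delivery({2}, {2, 3, 4}))        # retry a single failed node
```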
Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 2, the embodiments of the present application further provide a data synchronization system between multiple nodes, including:
a sending module 110 configured such that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, where the designated node is a node in the management platform other than the node that triggered the preset condition;
a synchronization module 120 configured such that the designated node, after receiving the data request command, obtains shared hardware resource data and sends it to the node;
a checking module 130 configured such that the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node has been synchronized;
a completion module 140 configured to send a completion flag to the designated node in response to the synchronization being complete.
Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 4, the embodiments of the present application further provide a computer device 20. The computer device 20 includes at least one processor 210 and at least one memory 220. The memory 220 stores computer-readable instructions 221 executable on the processor, and the processor 210 implements the following method steps when executing the computer-readable instructions 221:
in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, where the designated node is a node in the management platform other than the node that triggered the preset condition;
after receiving the data request command, the designated node obtains shared hardware resource data and sends it to the node;
the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node has been synchronized;
in response to the synchronization being complete, a completion flag is sent to the designated node.
In some embodiments, checking at the node whether the shared hardware resource data of the designated node has been synchronized includes:
checking at the node whether the shared hardware resource data of the designated node has been saved; in response to the data having been saved, determining that the synchronization is complete, and in response to the data not having been saved, determining that the synchronization has failed.
In some embodiments, the method steps further include:
in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number;
in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
In some embodiments, the method steps further include: in response to the number of failures being greater than the preset number, the node generates an alarm state and synchronizes the alarm state to the remaining nodes in the management platform.
In some embodiments, the method steps further include: monitoring the connection status between the nodes through a heartbeat and, in response to the heartbeat between nodes being lost, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
In some embodiments, the method steps further include: monitoring the presence status of each node and, in response to a node being absent, clearing the synchronization flag and the shared hardware resource data of that node.
In some embodiments, the preset condition is triggered by any one of the following: a node reset start, a power-on start, or a synchronization failure.
Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 5, the embodiments of the present application further provide a non-volatile readable storage medium 30. The non-volatile readable storage medium 30 stores computer-readable instructions 310 which, when executed by a processor, perform the following method:
in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, where the designated node is a node in the management platform other than the node that triggered the preset condition;
after receiving the data request command, the designated node obtains shared hardware resource data and sends it to the node;
the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node has been synchronized;
in response to the synchronization being complete, a completion flag is sent to the designated node.
In some embodiments, checking at the node whether the shared hardware resource data of the designated node has been synchronized includes:
checking at the node whether the shared hardware resource data of the designated node has been saved; in response to the data having been saved, determining that the synchronization is complete, and in response to the data not having been saved, determining that the synchronization has failed.
In some embodiments, the method further includes:
in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number;
in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node whose synchronization failed.
In some embodiments, the method further includes: in response to the number of failures being greater than the preset number, the node generates an alarm state and synchronizes the alarm state to the remaining nodes in the management platform.
In some embodiments, the method further includes: monitoring the connection status between the nodes through a heartbeat and, in response to the heartbeat between nodes being lost, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
In some embodiments, the method further includes: monitoring the presence status of each node and, in response to a node being absent, clearing the synchronization flag and the shared hardware resource data of that node.
In some embodiments, the preset condition is triggered by any one of the following: a node reset start, a power-on start, or a synchronization failure.
Finally, it should be noted that those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions directing the relevant hardware. The instructions may be stored in a non-volatile readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium for the computer-readable instructions may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The above embodiments of computer-readable instructions can achieve the same or similar effects as any of the corresponding method embodiments described above.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To illustrate this interchangeability of hardware and software clearly, the various illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and on the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope disclosed in the embodiments of the present application.
The above are exemplary embodiments disclosed in the present application. It should be noted, however, that various changes and modifications can be made without departing from the scope of the disclosure of the embodiments of the present application as defined by the claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular form "a" is intended to include the plural form as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be carried out by hardware, or by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present application (including the claims) is limited to these examples. Within the idea of the embodiments of the present application, the technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist that are not described in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.

Claims (20)

  1. A method for synchronizing data among multiple nodes, comprising:
    in response to a node in a management platform triggering a preset condition, the node sending a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
    the designated node, after receiving the data request command, obtaining shared hardware resource data and sending it to the node;
    the node receiving the shared hardware resource data sent by the designated node, and checking at the node whether synchronization of the shared hardware resource data of the designated node is complete; and
    in response to the synchronization being complete, sending a completion flag to the designated node.
  2. The method according to claim 1, wherein the step of checking at the node whether synchronization of the shared hardware resource data of the designated node is complete comprises:
    checking at the node whether the shared hardware resource data of the designated node has been saved;
    in response to the shared hardware resource data of the designated node having been saved, determining that the synchronization is complete; and
    in response to the shared hardware resource data of the designated node not having been saved, determining that the synchronization has failed.
  3. The method according to claim 1, further comprising:
    in response to a synchronization failure, recording the number of failures of the node whose synchronization failed, and determining whether the number of failures is less than a preset number; and
    in response to the number of failures being less than the preset number, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization for the node whose synchronization failed.
  4. The method according to claim 3, further comprising:
    in response to the number of failures being greater than the preset number, the node generating an alarm state and synchronizing the alarm state to the remaining nodes in the management platform.
  5. The method according to claim 1, further comprising:
    monitoring the connection state between the nodes by means of a heartbeat, and in response to loss of the heartbeat between nodes, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
  6. The method according to claim 1, further comprising:
    monitoring the presence state of the node, and in response to the node not being present, clearing the synchronization flag and the shared hardware resource data of the corresponding node.
  7. The method according to claim 1, wherein triggering the preset condition comprises any one of: a reset start of the node, a power-on start of the node, and a synchronization failure.
  8. The method according to any one of claims 1-7, wherein the step of, in response to a node in the management platform triggering a preset condition, the node sending a data request command to a designated node in the management platform comprises:
    in response to the node in the management platform triggering the preset condition, an application layer module of the node obtaining the data request command; and
    a data synchronization module of the node sending the data request command to the designated node by broadcast.
  9. The method according to any one of claims 1-7, wherein the step of the designated node, after receiving the data request command, obtaining shared hardware resource data and sending it to the node comprises:
    a data synchronization module of the designated node, after receiving the data request command, passing the data request command to an application layer module of the designated node; and
    the application layer module of the designated node, after obtaining the data request command, obtaining the shared hardware resource data and sending it to the node.
  10. The method according to any one of claims 1-7, wherein the shared hardware resource data comprises management and control data of a server.
  11. The method according to claim 10, wherein the management and control data of the server comprises the temperature, voltage, manufacturer, system version, and power supply of the server.
  12. The method for synchronizing data among multiple nodes according to claim 4, further comprising, after the step of the node generating an alarm state in response to the number of failures being greater than the preset number:
    the node continuing to initiate synchronization of the shared hardware resource data with the designated node in order to perform a repair; and
    in response to the repair succeeding, clearing the alarm state and recording this in a log.
  13. The method according to claim 6, wherein the step of monitoring the presence state of the node and, in response to the node not being present, clearing the synchronization flag and the shared hardware resource data of the corresponding node comprises:
    monitoring whether the node is in its slot;
    in response to the node being in the slot, determining that the presence state of the node is present; and
    in response to the node not being in the slot, determining that the presence state of the node is not present.
  14. The method according to claim 13, further comprising, after the step of determining that the presence state of the node is present in response to the node being in the slot:
    in response to the node being present, performing synchronization of the shared hardware resource data.
  15. The method according to any one of claims 1-7, further comprising, after the step of sending a completion flag to the designated node in response to the synchronization being complete:
    the node exiting the check of whether synchronization of the shared hardware resource data of the designated node is complete, and waiting for the next triggered check.
  16. The method according to claim 3, further comprising, after the step of determining whether the number of failures is less than a preset number:
    in response to the number of failures being greater than the preset number, disconnecting the node whose synchronization failed.
  17. A system for synchronizing data among multiple nodes, comprising:
    a sending module configured such that, in response to a node in a management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;
    a synchronization module configured such that the designated node, after receiving the data request command, obtains shared hardware resource data and sends it to the node;
    a checking module configured such that the node receives the shared hardware resource data sent by the designated node and checks at the node whether synchronization of the shared hardware resource data of the designated node is complete; and
    a completion module configured to send a completion flag to the designated node in response to the synchronization being complete.
  18. The system according to claim 17, wherein the checking module comprises:
    a checking submodule configured to check at the node whether the shared hardware resource data of the designated node has been saved;
    a synchronization completion submodule configured to determine that the synchronization is complete in response to the shared hardware resource data of the designated node having been saved; and
    a synchronization failure submodule configured to determine that the synchronization has failed in response to the shared hardware resource data of the designated node not having been saved.
  19. A computer device, comprising:
    at least one processor; and
    at least one memory for storing computer-readable instructions,
    wherein the at least one processor executes the computer-readable instructions to implement the steps of the method according to any one of claims 1-16.
  20. A non-volatile readable storage medium, wherein the non-volatile readable storage medium stores computer-readable instructions which, when executed by at least one processor, cause the at least one processor to perform the steps of the method according to any one of claims 1-16.
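
The request, check, retry, alarm, and heartbeat-clearing behavior recited in claims 1 to 5 and 12 can be illustrated with a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the applicant's implementation: the names `Node`, `MAX_FAILURES`, `fetch`, `sync_from`, and `on_peer_lost` are hypothetical, the network transport is replaced by a plain function call, and the case where the failure count equals the preset number (which the claims leave open) is treated here as exceeding it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

MAX_FAILURES = 3  # hypothetical "preset number"; the claims do not give a value


@dataclass
class Node:
    name: str
    local_data: dict                                            # this node's shared hardware resource data
    peer_data: Dict[str, dict] = field(default_factory=dict)    # data saved from other nodes
    sync_flags: Dict[str, bool] = field(default_factory=dict)   # per-peer synchronization flag
    completion_flags: Set[str] = field(default_factory=set)     # peers that reported completion
    failures: Dict[str, int] = field(default_factory=dict)      # per-peer failure count
    alarm: bool = False

    def handle_request(self) -> dict:
        """Designated-node side: answer a data request with a copy of the data."""
        return dict(self.local_data)

    def sync_from(self, peer: "Node",
                  fetch: Optional[Callable[["Node"], Optional[dict]]] = None) -> bool:
        """Requesting-node side: request, save, check, and retry up to MAX_FAILURES.

        `fetch` stands in for the transport (assumption); it returns None when
        the reply is lost, which lets a caller simulate a failed synchronization.
        """
        fetch = fetch or (lambda p: p.handle_request())
        self.failures[peer.name] = 0
        while self.failures[peer.name] < MAX_FAILURES:
            data = fetch(peer)                        # send the data request command
            if data is not None:
                self.peer_data[peer.name] = data      # data saved, so the check passes
                self.sync_flags[peer.name] = True
                peer.completion_flags.add(self.name)  # send the completion flag
                self.alarm = False                    # a successful repair clears the alarm
                return True
            self.failures[peer.name] += 1             # record the failure, then retry
        self.alarm = True                             # retries exhausted: raise the alarm state
        return False

    def on_peer_lost(self, peer_name: str) -> None:
        """Heartbeat lost or node absent: clear the peer's flag and saved data."""
        self.sync_flags.pop(peer_name, None)
        self.peer_data.pop(peer_name, None)
```

In this sketch, calling `a.sync_from(b)` on two `Node` objects copies `b.local_data` into `a.peer_data` and records `a` in `b.completion_flags`, while passing `fetch=lambda p: None` exercises the retry path until the alarm is set. Propagating the alarm state to the remaining nodes (claim 4) is omitted for brevity.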
PCT/CN2022/119470 2021-09-19 2022-09-16 Method for data synchronisation between multiple nodes, and system, device, and storage medium WO2023041073A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111102440.4 2021-09-19
CN202111102440.4A CN113890880A (en) 2021-09-19 2021-09-19 Method, system, equipment and storage medium for data synchronization among multiple nodes

Publications (1)

Publication Number Publication Date
WO2023041073A1 (en) 2023-03-23

Family

ID=79010059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/119470 WO2023041073A1 (en) 2021-09-19 2022-09-16 Method for data synchronisation between multiple nodes, and system, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113890880A (en)
WO (1) WO2023041073A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113890880A (en) * 2021-09-19 2022-01-04 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium for data synchronization among multiple nodes

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160092677A1 (en) * 2014-09-30 2016-03-31 Amazon Technologies, Inc. Allocation of shared system resources
CN106407264A (en) * 2016-08-25 2017-02-15 成都索贝数码科技股份有限公司 High-availability and high-consistency database cluster system and command processing method thereof
CN109885424A (en) * 2019-01-16 2019-06-14 平安科技(深圳)有限公司 A kind of data back up method, device and computer equipment
CN110941665A (en) * 2019-10-31 2020-03-31 北京浪潮数据技术有限公司 Data synchronization method, data synchronization device and data synchronization equipment between nodes
CN111031126A (en) * 2019-12-10 2020-04-17 江苏满运软件科技有限公司 Cluster cache sharing method, system, equipment and storage medium
CN112153133A (en) * 2020-09-18 2020-12-29 苏州浪潮智能科技有限公司 Data sharing method, device and medium
CN113890880A (en) * 2021-09-19 2022-01-04 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium for data synchronization among multiple nodes

Also Published As

Publication number Publication date
CN113890880A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
US10560315B2 (en) Method and device for processing failure in at least one distributed cluster, and system
US11809291B2 (en) Method and apparatus for redundancy in active-active cluster system
US9237092B2 (en) Method, apparatus, and system for updating ring network topology information
WO2016107173A1 (en) Post-cluster brain split quorum processing method and quorum storage device and system
CN105933407B (en) method and system for realizing high availability of Redis cluster
US20140032173A1 (en) Information processing apparatus, and monitoring method
US9886358B2 (en) Information processing method, computer-readable recording medium, and information processing system
CN102394914A (en) Cluster brain-split processing method and device
WO2016180005A1 (en) Method for processing virtual machine cluster and computer system
US20130205017A1 (en) Computer failure monitoring method and device
WO2023041073A1 (en) Method for data synchronisation between multiple nodes, and system, device, and storage medium
WO2009117946A1 (en) Main-spare realizing method for dispatch servers and dispatch server
CN111752488B (en) Management method and device of storage cluster, management node and storage medium
CN110971662A (en) Two-node high-availability implementation method and device based on Ceph
JP2016066303A (en) Server device, redundant configuration server system, information taking-over program and information taking-over method
CN110661599B (en) HA implementation method, device and storage medium between main node and standby node
EP2698949B1 (en) METHOD AND SYSTEM FOR SETTING DETECTION FRAME TIMEOUT DURATION OF ETHERNET NODEs
CN112468330B (en) Method, system, equipment and medium for setting fault node
CN111510336B (en) Network equipment state management method and device
CN109445984B (en) Service recovery method, device, arbitration server and storage system
CN110569303B (en) MySQL application layer high-availability system and method suitable for various cloud environments
US11954509B2 (en) Service continuation system and service continuation method between active and standby virtual servers
WO2019105067A1 (en) Channel establishment method and base station
CN115643237B (en) Data processing system for conference
CN115134220B (en) Master-slave server switching method and device, computing equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22869439

Country of ref document: EP

Kind code of ref document: A1