WO2023041073A1

WO2023041073A1 - Method for data synchronisation between multiple nodes, and system, device, and storage medium

Info

Publication number: WO2023041073A1
Application number: PCT/CN2022/119470
Authority: WO
Inventors: 刘涛
Original assignee: 苏州浪潮智能科技有限公司
Priority date: 2021-09-19
Filing date: 2022-09-16
Publication date: 2023-03-23
Also published as: CN113890880A

Abstract

Disclosed in the present application are a method for data synchronisation, and a system, a device, and a storage medium, the method comprising: in response to a node of a management platform triggering a preset condition, sending a data request command to a specified node, the specified node being a node in the management platform other than the node triggering the preset condition; after receiving the data request command, the specified node acquiring shared hardware resource data and sending same to the node; checking whether synchronisation of the shared hardware resource data is complete; and, in response to the synchronisation being complete, sending a completion flag to the specified node.

Description

A data synchronization method, system, device and storage medium among multiple nodes

Cross References to Related Applications

This application claims priority to Chinese patent application No. 202111102440.4, filed with the China Patent Office on September 19, 2021 and entitled "A method, system, device and storage medium for data synchronization between multiple nodes", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of data management, and in particular to a data synchronization method, system, device and non-volatile readable storage medium among multiple nodes.

Background

In a server management platform, in order to ensure the stability of server products, multiple servers are usually combined into one operating system. The hardware configuration and hardware attributes of each server node in the operating system are consistent, and the operating system can access many hardware resources. The hardware resources that are accessed by all server nodes are called shared resources.

Currently, multiple (especially three or more) server nodes on a server management platform need to synchronize shared hardware resource data at power-on or after a restart or upgrade. Existing multi-node synchronization of shared hardware resource data mainly relies on one node acting as a central node, which synchronizes data to the other nodes. This approach cannot synchronize shared hardware resource data directly between arbitrary nodes. In particular, when a large amount of shared hardware resource data is synchronized, the load on the central node is high and synchronization takes too long. Moreover, a failure to synchronize shared resources while the server is in use is unacceptable for server products: the server industry requires that server products neither stop running nor lose data, which means that server products have extremely high stability requirements.

Summary of the Invention

In view of this, this application proposes a data synchronization method, system, device and non-volatile readable storage medium among multiple nodes.

Based on the above purpose, an aspect of the embodiment of the present application provides a method for synchronizing data between multiple nodes, which specifically includes the following steps:

In response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;

After the designated node receives the data request command, it obtains the shared hardware resource data and sends it to the node;

The node receives the shared hardware resource data sent by the specified node, and checks at the node whether the shared hardware resource data of the specified node is synchronized; and

In response to completion of the synchronization, a completion flag is sent to the specified node.

In some implementation manners, the node checks whether the shared hardware resource data of the specified node is synchronized, including:

The node checks whether it has saved the shared hardware resource data of the specified node, determines that the synchronization is complete in response to the shared hardware resource data of the specified node having been saved, and determines that the synchronization has failed in response to the shared hardware resource data of the specified node not having been saved.

In some embodiments, the method further comprises:

In response to a synchronization failure, record the number of failures of the node where the synchronization failed, and determine whether the number of failures is less than a preset number of times; and

In response to the number of failures being less than the preset number of times, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node where the synchronization failed.

In some embodiments, the method further includes: in response to the number of failures being greater than a preset number, the node generates an alarm status, and synchronizes the alarm status to other nodes in the management platform.

In some embodiments, the method further includes: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the loss of the heartbeat status between nodes.

In some embodiments, the method further includes: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.

In some implementations, the trigger preset condition includes: any one of node reset start, power-on start, and synchronization failure.

In some embodiments, the step in which, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform includes:

In response to a node in the management platform triggering a preset condition, the application layer module of the node acquires a data request command; and

The data synchronization module of the node sends the data request command to the specified node in the form of broadcast.

In some embodiments, after receiving the data request command, the specified node obtains the shared hardware resource data and sends it to the node, including:

After receiving the data request command, the data synchronization module of the designated node transmits the data request command to the application layer module of the designated node; and

After the application layer module of the designated node obtains the data request command, it obtains the shared hardware resource data and sends it to the node.

In some implementations, the shared hardware resource data includes management and control data of the server.

In some implementations, the management and control data of the server includes temperature, voltage, manufacturer, system version and power supply of the server.

In some implementations, after the node generates an alarm state in response to the number of failures being greater than the preset number of times, the step further includes:

The node continues to initiate the synchronization of shared hardware resource data to the designated node for repair; and

In response to a successful repair, the alert state is cleared and logged.

In some embodiments, the step of monitoring the presence status of the node and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node includes:

Monitor whether the node is in the slot;

Responsive to the node being in the slot, determining the presence status of the node as being present; and

In response to the node being out of the slot, the presence status of the node is determined to be absent.

In some embodiments, after the step of determining, in response to the node being in the slot, that the presence status of the node is present, the method further includes:

Synchronization of shared hardware resource data occurs in response to node presence.

In some implementations, after the step of sending a completion flag to the specified node in response to the completion of the synchronization, the method further includes:

The node exits the check of whether the shared hardware resource data of the specified node is synchronized, and waits for the next trigger check.

In some embodiments, after the step of judging whether the number of failures is less than the preset number of times, it also includes:

In response to the number of times of failure being greater than the preset number of times, the node that fails to synchronize is disconnected.

Another aspect of the embodiment of the present application also provides a data synchronization system between multiple nodes, including:

A sending module, configured so that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;

Synchronization module, the synchronization module is configured to obtain shared hardware resource data and send it to the node after the specified node receives the data request command;

A check module, the check module is configured for the node to receive the shared hardware resource data sent by the specified node, and checks at the node whether the shared hardware resource data of the specified node is synchronized; and

A completion module configured to send a completion flag to a designated node in response to synchronization completion.

In some embodiments, the inspection module includes:

The check submodule is used to check whether to save the shared hardware resource data of the specified node on the node;

A synchronization completion submodule, configured to complete the synchronization in response to saving the shared hardware resource data of the designated node; and

The synchronization failure sub-module is configured to fail the synchronization in response to not saving the shared hardware resource data of the specified node.

In yet another aspect of the embodiments of the present application, a computer device is also provided, including: at least one processor; and at least one memory storing computer-readable instructions, wherein the at least one processor executes the computer-readable instructions to implement the steps of the data synchronization method among multiple nodes of any embodiment.

In yet another aspect of the embodiments of the present application, a non-volatile readable storage medium is also provided. The non-volatile readable storage medium stores computer-readable instructions. When the computer-readable instructions are executed by at least one processor, the at least one processor executes the steps of the method for synchronizing data between multiple nodes in any embodiment.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features and advantages of the application will be apparent from the description, drawings, and claims.

Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other embodiments from these drawings without creative effort.

FIG. 1 is a block diagram of a data synchronization method between nodes provided by one or more embodiments of the present application;

FIG. 2 is a schematic diagram of a data synchronization system between multiple nodes provided by one or more embodiments of the present application;

FIG. 3 is a schematic diagram of a multi-node interconnection structure provided by one or more embodiments of the present application;

FIG. 4 is a schematic structural diagram of a computer device provided by one or more embodiments of the present application;

FIG. 5 is a schematic structural diagram of a computer-readable storage medium provided by one or more embodiments of the present application.

Detailed Description

In order to make the purpose, technical solution and advantages of the present application clearer, the embodiments of the present application will be further described in detail below in combination with specific embodiments and with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of this application serve to distinguish two entities or parameters that share the same name but are not the same. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application; this will not be repeated in the subsequent embodiments.

Based on the above purpose, the first aspect of the embodiments of the present application proposes an embodiment of a data synchronization method among multiple nodes. As shown in Figure 1, it includes the following steps:

Step S101, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;

Step S103, after receiving the data request command, the designated node obtains the shared hardware resource data and sends it to the node;

Step S105, the node receives the shared hardware resource data sent by the designated node, and checks at the node whether the shared hardware resource data of the designated node is synchronized;

Step S107, in response to the completion of the synchronization, send a completion flag to the designated node.

Specifically, each node is a server, and each server includes a data synchronization module and an application layer module. The designated nodes can be all nodes other than the node that triggered the preset condition, or one or more of those nodes. Shared hardware resource data refers to server management and control data, such as server temperature, voltage, manufacturer, system version, power supply, and other information.
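As an illustration of what one node's shared hardware resource data might look like, the following sketch models the fields named above as a Python dataclass. The class name, field names, units and types are assumptions made for illustration only; the application does not define a data layout.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SharedHardwareResourceData:
    """Hypothetical record of one node's management and control data."""
    node_id: int            # assumed identifier of the node that owns the data
    temperature_c: float    # server temperature, assumed to be in degrees Celsius
    voltage_v: float        # supply voltage, assumed to be in volts
    manufacturer: str
    system_version: str
    power_supply: str       # assumed free-form power supply description


record = SharedHardwareResourceData(
    node_id=2,
    temperature_c=41.5,
    voltage_v=12.0,
    manufacturer="ExampleVendor",
    system_version="1.0.0",
    power_supply="PSU-A",
)
payload = asdict(record)  # plain dict, convenient for sending between nodes
```

A frozen dataclass is used here so that a snapshot received from another node cannot be modified by accident after it has been saved.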

This embodiment can quickly synchronize the shared hardware resource data of any node to the current node, realizing the consistency of multi-node data, and the data synchronization speed is fast and the stability is good.
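Steps S101 to S107 can be sketched as a small in-memory simulation. This is a minimal sketch, not the implementation described in the application: the `Node` class, its attributes and the `synchronize` function are hypothetical names, and direct method calls stand in for whatever transport the nodes actually use.

```python
class Node:
    """Hypothetical in-memory stand-in for one server node."""

    def __init__(self, node_id, local_data):
        self.node_id = node_id
        self.local_data = local_data      # this node's own shared hardware resource data
        self.peer_data = {}               # data synchronized from other nodes
        self.completion_flags = set()     # requesters that confirmed completion

    def handle_data_request(self):
        # S103: the designated node obtains its shared data and returns it.
        return dict(self.local_data)

    def receive_completion_flag(self, requester_id):
        self.completion_flags.add(requester_id)


def synchronize(requester, designated_nodes):
    """Run S101 to S107 once; return ids of nodes whose data was not saved."""
    failed = []
    for peer in designated_nodes:
        # S101: send the data request command; S103: the peer replies with its data.
        data = peer.handle_data_request()
        if data:
            requester.peer_data[peer.node_id] = data
        # S105: check whether the peer's data has been saved on the requester.
        if peer.node_id in requester.peer_data:
            # S107: send the completion flag to the designated node.
            peer.receive_completion_flag(requester.node_id)
        else:
            failed.append(peer.node_id)
    return failed


nodes = {i: Node(i, {"temperature_c": 40.0 + i}) for i in range(1, 5)}
failed = synchronize(nodes[1], [nodes[2], nodes[3], nodes[4]])
```

In this sketch every node can act as the requester, which reflects the point that no central node is required.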

In some implementation manners, checking at the node whether the shared hardware resource data of the specified node is synchronized includes:

The node checks whether it has saved the shared hardware resource data of the specified node. If the shared hardware resource data of the specified node has been saved, the synchronization is complete; if it has not been saved, the synchronization has failed.

In some embodiments, the method further comprises:

In response to a synchronization failure, record the number of failures of the node where the synchronization failed, and determine whether the number of failures is less than a preset number of times;

By initiating data synchronization multiple times, the stability of data synchronization between multiple nodes is guaranteed.
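The retry behaviour can be sketched as a loop around a single synchronization attempt. The function name `sync_with_retry`, the `request_once` callable and the default limit of 3 are assumptions for illustration. The text distinguishes "less than" and "greater than" the preset number; stopping exactly when the count reaches the limit is a choice made in this sketch.

```python
def sync_with_retry(request_once, node_id, max_failures=3):
    """Retry synchronization with one node until success or the limit is hit.

    request_once(node_id) is a hypothetical callable returning True when the
    node's data was received and saved. Returns (succeeded, failure_count).
    """
    failures = 0
    while failures < max_failures:
        if request_once(node_id):
            return True, failures
        failures += 1           # record the failure for this node
    return False, failures      # limit reached; the caller may raise an alarm


attempts = []


def flaky_request(node_id):
    # Simulated peer that fails twice and then answers.
    attempts.append(node_id)
    return len(attempts) >= 3


outcome = sync_with_retry(flaky_request, node_id=2, max_failures=3)
```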

By monitoring the connection status between nodes through heartbeat, it is possible to detect abnormalities in time when the heartbeat between nodes is lost, and to synchronize data in time after the connection between nodes returns to normal, ensuring the consistency of data between nodes and improving the stability of the server.
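Heartbeat monitoring of this kind can be sketched with a timeout per peer. The class `HeartbeatMonitor`, the timeout value and the explicit `now` timestamps are assumptions for illustration; the application does not specify a heartbeat interval or how loss is detected.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per peer and clears state when one is lost."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}      # peer id -> time of last heartbeat
        self.sync_flags = {}     # peer id -> True once synchronized
        self.peer_data = {}      # peer id -> shared hardware resource data

    def beat(self, peer_id, now):
        self.last_seen[peer_id] = now

    def mark_synchronized(self, peer_id, data):
        self.sync_flags[peer_id] = True
        self.peer_data[peer_id] = data

    def expire(self, now):
        """Clear flag and data for peers whose heartbeat is lost; return them."""
        lost = [p for p, t in self.last_seen.items() if now - t > self.timeout_s]
        for peer_id in lost:
            self.sync_flags.pop(peer_id, None)
            self.peer_data.pop(peer_id, None)
            del self.last_seen[peer_id]
        return lost


monitor = HeartbeatMonitor(timeout_s=5.0)
for peer in (2, 3):
    monitor.beat(peer, now=0.0)
    monitor.mark_synchronized(peer, {"voltage_v": 12.0})
monitor.beat(3, now=4.0)          # only node 3 keeps sending heartbeats
lost = monitor.expire(now=6.0)    # node 2 has been silent for 6 s
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real monitor would more likely read a monotonic clock.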

Monitoring the presence status means monitoring whether the node is in its slot. By monitoring the presence status of each node, a node that is not in position can be found, and data can be synchronized promptly after the node is reinserted into its slot, ensuring the consistency of data between nodes and improving server stability.
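Presence monitoring can be sketched as a polling step that compares the current slot readings with the previous ones. The function `update_presence` and the boolean slot readings are hypothetical; how a slot is actually sensed is not described in the application.

```python
def update_presence(slot_occupied, previous, sync_flags, peer_data):
    """Apply slot readings; return ids that need a fresh synchronization.

    slot_occupied: node id -> bool from a hypothetical slot-detect reading
    previous: node id -> bool presence status from the last poll (mutated)
    """
    needs_sync = []
    for node_id, in_slot in slot_occupied.items():
        if not in_slot:
            # Absent: clear the synchronization flag and the stored data.
            sync_flags.pop(node_id, None)
            peer_data.pop(node_id, None)
        elif not previous.get(node_id, False):
            # Newly present (reinserted): schedule synchronization.
            needs_sync.append(node_id)
        previous[node_id] = in_slot
    return needs_sync


previous = {2: True, 3: True}
flags = {2: True, 3: True}
data = {2: {"voltage_v": 12.0}, 3: {"voltage_v": 11.9}}
first = update_presence({2: False, 3: True}, previous, flags, data)   # node 2 pulled
second = update_presence({2: True, 3: True}, previous, flags, data)   # node 2 reinserted
```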

Multiple implementations of the present application will be described below through specific examples.

Assume that there are 4 server nodes interconnected to form an operating system in the current server management platform, and the schematic diagram of the interconnection of the 4 nodes is shown in Figure 3.

Node 1 has just been reset and started. After node 1 is reset, the application layer module of node 1 initiates a data request command, and the data synchronization module of node 1 broadcasts the data request command to all other nodes in the frame, that is, nodes 2/3/4;

After the data synchronization modules of nodes 2/3/4 receive the data request command, they pass it to their application layer modules. After receiving the command, the application layer modules of nodes 2/3/4 obtain their own shared hardware resource data and then synchronize it to node 1 through their respective data synchronization modules;

After node 1 receives the shared hardware resource data synchronized by nodes 2/3/4, it checks whether it has saved the data synchronized by the other nodes. In response to node 1 having saved the data synchronized by nodes 2/3/4, node 1 sends a completion flag to nodes 2/3/4 and exits the check, waiting for the next triggered check. In response to the data of one or more nodes being missing, for example the shared hardware resource data of node 2, the synchronization is not complete, and the synchronization of the shared hardware resource data of node 2 has failed.
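The four-node example, including the split between the application layer module and the data synchronization module, can be sketched as follows. All names (`ServerNode`, the `app_` and `sync_` method prefixes, the shared `bus` dictionary, the `responsive` switch that simulates node 2 not answering) are assumptions for illustration only.

```python
class ServerNode:
    """Hypothetical node with an application layer and a data sync module."""

    def __init__(self, node_id, bus, responsive=True):
        self.node_id = node_id
        self.bus = bus                  # shared dict: node id -> ServerNode
        self.responsive = responsive    # False simulates a peer that never answers
        self.saved = {}                 # peer id -> shared hardware resource data
        self.completed_for = []         # requesters that sent a completion flag
        bus[node_id] = self

    # --- application layer module ---
    def app_collect_data(self):
        return {"node": self.node_id, "system_version": "1.0"}

    def app_request_sync(self):
        """Initiate a request, then check which peers' data was saved."""
        self.sync_broadcast_request()
        peers = [n for n in self.bus if n != self.node_id]
        for n in peers:
            if n in self.saved:
                self.bus[n].completed_for.append(self.node_id)  # completion flag
        return [n for n in peers if n not in self.saved]

    # --- data synchronization module ---
    def sync_broadcast_request(self):
        for peer_id, peer in self.bus.items():
            if peer_id != self.node_id:
                peer.sync_on_request(self.node_id)

    def sync_on_request(self, requester_id):
        # Pass the request up to the application layer, then send the data back.
        if self.responsive:
            self.bus[requester_id].saved[self.node_id] = self.app_collect_data()


bus = {}
node1 = ServerNode(1, bus)
ServerNode(2, bus, responsive=False)   # node 2's data will be missing
ServerNode(3, bus)
ServerNode(4, bus)
missing = node1.app_request_sync()
```

Here `missing` lists the nodes whose synchronization failed, which is the input to the follow-up request that node 1 sends to node 2 alone.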

Node 1 then separately initiates a data request command to node 2 and repeats the above process. In response to data synchronization still failing after multiple attempts, node 1 generates an alarm status and synchronizes the alarm status to nodes 3/4.

Furthermore, after node 1 generates the alarm, it continues to re-initiate synchronization with node 2 in an attempt to repair. In response to a successful repair, the alarm status is cleared and recorded in the log, which increases the stability of data synchronization. In response to repeated attempts still failing, node 1 reports to the system and disconnects node 2.
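The escalation path (retry, alarm, repair, disconnect) can be sketched as one function. The name `escalate`, the separate `repair_attempts` limit, the per-node alarm sets and the returned state strings are assumptions for illustration. The text does not say whether clearing the alarm is also propagated to the other nodes; this sketch assumes it is.

```python
import logging

logger = logging.getLogger("sync")


def escalate(request_once, node_id, preset_failures, repair_attempts, peers_alarm):
    """Return the final state for node_id: "synced", "repaired" or "disconnected".

    peers_alarm maps each node that tracks alarms (the requester and the
    remaining healthy peers) to the set of node ids it currently has alarmed.
    """
    for _ in range(preset_failures):
        if request_once(node_id):
            return "synced"
    # Failures reached the preset number: raise the alarm and share it.
    for alarm in peers_alarm.values():
        alarm.add(node_id)
    for _ in range(repair_attempts):
        if request_once(node_id):
            for alarm in peers_alarm.values():
                alarm.discard(node_id)
            logger.info("alarm for node %s cleared after repair", node_id)
            return "repaired"
    logger.error("node %s disconnected after repeated failures", node_id)
    return "disconnected"


results = iter([False, False, False, False, True])
alarms = {1: set(), 3: set(), 4: set()}
state = escalate(lambda _id: next(results), 2,
                 preset_failures=3, repair_attempts=2, peers_alarm=alarms)
```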

Furthermore, the stability of nodes 1/2/3/4 is monitored during operation through heartbeat or presence status. If any node restarts, is unplugged, or is upgraded, the synchronization flag and the corresponding shared hardware resource data of that node are cleared, either upon detecting loss of its heartbeat or upon detecting that the node is absent. After the node resumes normal operation, synchronization of the shared hardware resource data is re-initiated.

Through the above scheme, data can be synchronized between any of the nodes. This matters most when the server must manage the data status of the system to support business scenarios, where it is particularly important for every node to hold the data of the other nodes. For example, the management system provides key shared hardware resource data that determines how the cluster is established and used by the business; the server needs to obtain this data during boot so that it boots correctly and the system can provide external services.

Through the scheme of this application, data synchronization between any nodes is realized, and synchronization can be performed by broadcast, multicast or unicast according to the state of the nodes, so that every node can save the shared hardware resource data of the other nodes. This ensures the consistency of data between nodes, with fast data synchronization and good stability.
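One way to pick between broadcast, multicast and unicast is to compare the set of nodes that still need synchronization with the set of all peers. The mapping below is an assumption made for illustration; the application states only that the mode depends on the state of the nodes and does not give a selection rule. The function name `choose_cast_mode` is hypothetical.

```python
def choose_cast_mode(all_peers, targets):
    """Pick a transmission mode for the data request command.

    all_peers: ids of every other node in the management platform
    targets: ids of the nodes whose data still has to be synchronized
    """
    targets = set(targets) & set(all_peers)
    if not targets:
        return None               # nothing to request
    if targets == set(all_peers):
        return "broadcast"        # e.g. after reset or power-on
    if len(targets) == 1:
        return "unicast"          # e.g. retrying a single failed node
    return "multicast"            # several, but not all, peers


after_reset = choose_cast_mode([2, 3, 4], [2, 3, 4])
retry_one = choose_cast_mode([2, 3, 4], [2])
retry_some = choose_cast_mode([2, 3, 4], [2, 4])
```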

Based on the same application concept, according to another aspect of the present application, as shown in Figure 2, the embodiment of the present application also provides a data synchronization system between multiple nodes, including:

The sending module 110, configured so that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;

Synchronization module 120, the synchronization module 120 is configured to obtain the shared hardware resource data and send it to the node after the specified node receives the data request command;

Checking module 130, the checking module 130 is configured for the node to receive the shared hardware resource data sent by the specified node, and check whether the shared hardware resource data of the specified node is synchronized at the node;

The completion module 140 is configured to send a completion flag to the specified node in response to the completion of the synchronization.

Based on the same application idea, according to another aspect of the present application, as shown in FIG. 4, an embodiment of the present application also provides a computer device 20, which includes at least one processor 210 and at least one memory 220. The memory 220 stores computer-readable instructions 221 executable by the processor, and the processor 210 implements the following method steps when executing the computer-readable instructions 221:

The node receives the shared hardware resource data sent by the specified node, and checks whether the shared hardware resource data of the specified node is synchronized at the node;

In some embodiments, the method steps further include:

In some embodiments, the method steps further include: in response to the number of failures being greater than a preset number, the node generates an alarm status, and synchronizes the alarm status to other nodes in the management platform.

In some embodiments, the method steps further include: monitoring the connection status between nodes through heartbeat, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to loss of the heartbeat status between nodes.

In some embodiments, the method steps further include: monitoring the presence status of the node, and clearing the synchronization flag and the shared hardware resource data of the corresponding node in response to the absence of the node.

Based on the same application concept, according to another aspect of the present application, as shown in FIG. 5, the embodiment of the present application also provides a non-volatile readable storage medium 30, which stores computer-readable instructions 310 that, when executed by a processor, perform a method of:

In some embodiments, the method further comprises:

Finally, it should be noted that those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that instruct related hardware, and the instructions can be stored in a non-volatile readable storage medium. When executed, the computer-readable instructions may include the processes of the embodiments of the above-mentioned methods. The storage medium of the computer-readable instructions may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM). The above embodiments of computer-readable instructions can achieve the same or a similar effect as any of the corresponding method embodiments described above.

Those of skill would also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends upon the particular application and design constraints imposed on the overall system. Those skilled in the art may implement the functions in various ways for each specific application, but such implementation decisions should not be interpreted as causing a departure from the scope disclosed in the embodiments of the present application.

The above are the exemplary embodiments disclosed in the present application, but it should be noted that various changes and modifications can be made without departing from the scope of the embodiments disclosed in the present application defined by the claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. In addition, although the elements disclosed in the embodiments of the present application may be described or required in an individual form, they may also be understood as plural unless explicitly limited to a singular number.

It should be understood that as used herein, the singular form "a" and "an" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The serial numbers of the embodiments disclosed in the above-mentioned embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.

Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by hardware, or by computer-readable instructions that instruct related hardware. The computer-readable instructions can be stored in a non-volatile readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.

Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only, and is not intended to imply that the scope (including the claims) disclosed by the embodiments of the present application is limited to these examples. Under the idea of the embodiments of the present application, the technical features in the above embodiments or in different embodiments can also be combined, and there are many other variations of the different aspects of the above embodiments, which are not described in detail for the sake of brevity. Therefore, within the spirit and principle of the embodiments of the present application, any omissions, modifications, equivalent replacements, improvements, etc., shall be included in the protection scope of the embodiments of the present application.

Claims

A method for synchronizing data between multiple nodes, comprising:

In response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node that triggered the preset condition;

After the designated node receives the data request command, it obtains the shared hardware resource data and sends it to the node;

The node receives the shared hardware resource data sent by the specified node, and checks at the node whether the shared hardware resource data of the specified node is synchronized; and

In response to completion of the synchronization, a completion flag is sent to the designated node.
The method according to claim 1, wherein the step of checking at the node whether the shared hardware resource data of the designated node is synchronized includes:

Checking at the node whether the shared hardware resource data of the specified node has been saved;

determining that synchronization is complete in response to saving the shared hardware resource data for the designated node; and

In response to the shared hardware resource data of the designated node not being saved, it is determined that the synchronization fails.
The method according to claim 1, further comprising:

In response to a synchronization failure, record the number of failures of the node whose synchronization failed, and determine whether the number of failures is less than a preset number of times; and

In response to the number of failures being less than the preset number of times, returning to the step performed in response to a node in the management platform triggering a preset condition, so as to re-initiate synchronization with the node where synchronization failed.
The method according to claim 3, further comprising:

In response to the failure times being greater than the preset times, the node generates an alarm status and synchronizes the alarm status to other nodes in the management platform.
The method according to claim 1, further comprising:

The connection state between the nodes is monitored through the heartbeat, and the synchronization flag and the shared hardware resource data of the corresponding node are cleared in response to the loss of the heartbeat state between the nodes.
The method according to claim 1, further comprising:

Monitoring the presence status of the node, and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the absence of the node.
The method according to claim 1, wherein the trigger preset condition includes: any one of the node reset start, power-on start, and synchronization failure.
The method according to any one of claims 1-7, wherein the step in which, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform comprises:

In response to the node in the management platform triggering a preset condition, the application layer module of the node acquires the data request command; and

The data synchronization module of the node sends the data request command to the designated node in a broadcast form.
The method according to any one of claims 1-7, wherein after the designated node receives the data request command, the step of obtaining shared hardware resource data and sending it to the node includes:

After receiving the data request command, the data synchronization module of the designated node transmits the data request command to the application layer module of the designated node; and

After obtaining the data request command, the application layer module of the designated node obtains the shared hardware resource data and sends it to the node.
The method according to any one of claims 1-7, wherein the shared hardware resource data includes server management and control data.
The method according to claim 10, wherein the server management and control data includes the temperature, voltage, manufacturer, system version and power supply of the server.
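One possible in-memory shape for such a record is sketched below in Python. The field names, types, and units are assumptions chosen for illustration; the claims list only the five kinds of data, not their encoding.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ManagementControlData:
    """Hypothetical record of the five items named in the claim."""
    temperature_c: float   # assumed unit: degrees Celsius
    voltage_v: float       # assumed unit: volts
    manufacturer: str
    system_version: str
    power_supply: str      # assumed free-form status string


def to_payload(record: ManagementControlData) -> dict:
    """Convert the record to a plain dict suitable for sending to a peer."""
    return asdict(record)
```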
The method for synchronizing data between multiple nodes according to claim 4, wherein, after the step of the node generating an alarm state in response to the number of failures being greater than the preset number of times, the method further comprises:

The node continues to initiate synchronization of the shared hardware resource data to the designated node for repair; and

In response to the repair succeeding, the alarm state is cleared and the event is recorded in a log.
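The alarm and repair behaviour can be sketched as a small state machine. The sketch below is an assumption-laden illustration, not the claimed implementation: `SyncSupervisor`, the `try_sync` and `notify_peers` callbacks, the default threshold of 3, and the choice to notify peers only when the alarm is raised are all hypothetical.

```python
import logging
from typing import Callable, List

log = logging.getLogger("sync")


class SyncSupervisor:
    """Counts failed synchronization attempts and manages the alarm state."""

    def __init__(self, try_sync: Callable[[], bool],
                 notify_peers: Callable[[str], None],
                 max_failures: int = 3):  # preset number of times (assumed value)
        self.try_sync = try_sync
        self.notify_peers = notify_peers
        self.max_failures = max_failures
        self.failures = 0
        self.alarm = False
        self.history: List[str] = []

    def step(self) -> bool:
        """Run one synchronization attempt and update the alarm state."""
        if self.try_sync():
            self.failures = 0
            if self.alarm:
                # Repair succeeded: clear the alarm and record the event.
                self.alarm = False
                self.history.append("alarm_cleared")
                log.info("synchronization repaired; alarm cleared")
            return True
        self.failures += 1
        if self.failures > self.max_failures and not self.alarm:
            # Threshold exceeded: raise the alarm and tell the other nodes.
            self.alarm = True
            self.history.append("alarm_raised")
            self.notify_peers("ALARM")
        return False
```

Calling `step` repeatedly models the node continuing to initiate synchronization while the alarm is active; the first successful attempt clears the alarm.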
The method according to claim 6, wherein the step of monitoring the presence status of the node and clearing the synchronization flag and shared hardware resource data of the corresponding node in response to the absence of the node comprises:

monitoring whether the node is in the slot;

In response to the node being in the slot, determining the presence status of the node to be present; and

In response to the node not being in the slot, it is determined that the presence status of the node is not present.
The method according to claim 13, wherein, after the step of determining the presence status of the node to be present in response to the node being in the slot, the method further comprises:

Synchronization of shared hardware resource data is performed in response to the node being present.
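The slot-presence handling of the preceding claims can be sketched as a single polling callback. The names `SlotPeer`, `on_slot_poll`, and `fetch_shared_data` are hypothetical, and skipping the fetch when the peer is already synchronized is an added assumption rather than something the claims state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotPeer:
    """Local view of one peer node occupying a chassis slot (hypothetical)."""
    node_id: str
    present: bool = False
    sync_flag: bool = False
    shared_data: Optional[dict] = None


def on_slot_poll(peer: SlotPeer, in_slot: bool,
                 fetch_shared_data: Callable[[str], dict]) -> None:
    """Update presence from one slot reading and react to it."""
    peer.present = in_slot
    if not in_slot:
        # Node absent: clear its synchronization flag and shared data.
        peer.sync_flag = False
        peer.shared_data = None
    elif not peer.sync_flag:
        # Node present and not yet synchronized: synchronize its shared data.
        peer.shared_data = fetch_shared_data(peer.node_id)
        peer.sync_flag = True
```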
The method according to any one of claims 1-7, wherein after the step of sending a completion flag to the specified node in response to completion of the synchronization, further comprising:

The node stops checking whether the shared hardware resource data of the designated node is synchronized, and waits for the next check to be triggered.
The method according to claim 3, wherein after the step of judging whether the number of failures is less than a preset number of times, further comprising:

In response to the number of failures being greater than the preset number of times, the node that fails to synchronize is disconnected.
A data synchronization system between multiple nodes, comprising:

A sending module, configured so that, in response to a node in the management platform triggering a preset condition, the node sends a data request command to a designated node in the management platform, wherein the designated node is a node in the management platform other than the node triggering the preset condition;

a synchronization module, the synchronization module is configured to obtain shared hardware resource data and send it to the node after the designated node receives the data request command;

A check module, configured so that the node receives the shared hardware resource data sent by the designated node and checks, at the node, whether the shared hardware resource data of the designated node is synchronized; and

A completion module configured to send a completion flag to the designated node in response to synchronization completion.
The system according to claim 17, wherein the checking module comprises:

A checking submodule, configured to check whether the shared hardware resource data of the designated node has been saved at the node;

a synchronization completion submodule, configured to determine that synchronization is complete in response to saving the shared hardware resource data of the designated node; and

A synchronization failure submodule, configured to determine that synchronization has failed in response to the shared hardware resource data of the designated node not being saved.
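The check performed by these submodules, together with the completion flag sent on success, can be sketched as follows. The function names and the message layout are hypothetical, and modelling "saved at the node" as a key being present in a local dictionary is an assumption.

```python
from typing import Dict, Optional


def check_synchronized(store: Dict[str, dict], designated_id: str) -> bool:
    """Synchronization counts as complete if the peer's data is saved locally."""
    return designated_id in store


def build_completion_flag(node_id: str, designated_id: str) -> dict:
    """Message the node sends back to the designated node on success."""
    return {"type": "SYNC_COMPLETE", "from": node_id, "to": designated_id}


def check_and_acknowledge(node_id: str, store: Dict[str, dict],
                          designated_id: str) -> Optional[dict]:
    """Return a completion flag on success, or None on synchronization failure."""
    if check_synchronized(store, designated_id):
        return build_completion_flag(node_id, designated_id)
    return None
```

A `None` result corresponds to the synchronization failure branch, which a caller could feed into its failure counter.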
A computer device comprising:

at least one processor; and

at least one memory for storing computer readable instructions,

wherein the at least one processor executes the computer-readable instructions to implement the steps of the method according to any one of claims 1-16.
A non-volatile readable storage medium, wherein the non-volatile readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by at least one processor, the at least one processor executes the steps of the method according to any one of claims 1-16.