CN112214466A - Distributed cluster system, data writing method, electronic equipment and storage device

Distributed cluster system, data writing method, electronic equipment and storage device

Info

Publication number
CN112214466A
Authority
CN
China
Prior art keywords
data
node
written
nodes
data node
Prior art date
Legal status
Pending
Application number
CN201910630685.0A
Other languages
Chinese (zh)
Inventor
关超
卜辉
Current Assignee
Hytera Communications Corp Ltd
Original Assignee
Hytera Communications Corp Ltd
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2021-01-12
Application filed by Hytera Communications Corp Ltd
Priority to CN201910630685.0A
Publication of CN112214466A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709: Error or fault processing not based on redundancy, the processing taking place in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots

Abstract

The invention discloses a distributed cluster system, a data writing method, an electronic device and a storage apparatus. The distributed cluster system comprises a management node and a plurality of data nodes which are connected with one another. The management node is configured to: send a first write instruction containing data to be written to all of the data nodes; determine which data nodes are normal data nodes that have written the data to be written and which are failed data nodes that have not; and send node information of the failed data nodes to the normal data nodes. Each normal data node is configured to: receive the first write instruction and write the data to be written; and, upon receiving the node information of a failed data node, send a second write instruction containing the data to be written to that failed data node, so as to instruct the failed data node receiving the second write instruction to write the data to be written. In this way, the data writing process of the distributed cluster system is prevented from depending excessively on network conditions, thereby ensuring high availability of the distributed cluster system.

Description

Distributed cluster system, data writing method, electronic equipment and storage device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a distributed cluster system, a data writing method, an electronic device, and a storage apparatus.
Background
In practical applications of a distributed cluster, data often has to be written to multiple files at the same time in order to keep them consistent, for example multiple backup files or multiple configuration files. Even when such data is stored in an existing distributed file system or database, the network addresses of the other nodes of the distributed file system or database still have to be configured in a local file on each node, and every copy of the address has to be modified whenever it changes.
To guarantee consistency when writing multiple files, a complete and strict check is usually applied, that is, the write is considered successful only if every node executes it successfully; however, when an individual node has a network problem, the write can never complete. Alternatively, each file is written manually and each node is started only after its configuration is finished, which makes the writing workload excessive.
How to remove the excessive dependence of the data writing process of a distributed cluster on network conditions is therefore a technical problem that currently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention mainly aims to provide a distributed cluster system, a data writing method, an electronic device and a storage apparatus, so as to solve the problem that the data writing process of a distributed cluster system depends excessively on network conditions and to ensure high availability of the distributed cluster system.
In order to solve the above technical problem, one technical solution adopted by the present invention is to provide a distributed cluster system, where the distributed cluster system includes a management node and a plurality of data nodes that are connected to each other. The management node is configured to: send a first write instruction containing data to be written to all the data nodes, and determine which of the data nodes are normal data nodes that have written the data to be written and which are failed data nodes that have not written the data to be written; and send node information of the failed data nodes to the normal data nodes. Each normal data node is configured to: receive the first write instruction sent by the management node, and write the data to be written in response to the first write instruction; and receive the node information of a failed data node sent by the management node, determine the failed data node based on the node information, and send a second write instruction containing the data to be written to the failed data node, so as to instruct the failed data node receiving the second write instruction to write the data to be written.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a data writing method applied to a distributed cluster system composed of a management node and a plurality of data nodes. The method includes: the management node sends a first write instruction containing data to be written to all the data nodes; determines, based on the execution result of the first write instruction, which of the data nodes are normal data nodes that have written the data to be written and which are failed data nodes that have not written the data to be written; and sends node information of the failed data nodes to the normal data nodes, so that the normal data nodes instruct the failed data nodes to write the data to be written again.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide an electronic device, which includes a communication circuit, a memory and a processor coupled to one another. The communication circuit is used for communicating with each data node; the memory is used for storing program data; and the processor executes the program data to implement the method described above. Alternatively, the electronic device is a management node in the distributed cluster system described above.
In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a storage apparatus, which stores program data that can be executed to implement the method described above.
The invention has the following beneficial effects. Unlike the prior art, in the distributed cluster system of the present application the management node sends a first write instruction containing data to be written to all data nodes and determines which data nodes are normal data nodes that have written the data and which are failed data nodes that have not: a normal data node receives the first write instruction and writes the data to be written in response to it, while a failed data node cannot. The management node then sends node information of the failed data nodes to the normal data nodes; a normal data node receives this node information, determines the failed data nodes from it, and sends a second write instruction containing the data to be written to the failed data nodes, so that a failed data node receiving the second write instruction writes the data to be written. In this way, when writing through the management node fails for some data nodes, the data can still be written to those nodes again through other normal data nodes. This avoids the situation in which some failed data nodes cannot write the data because of a network gate or a network fault and the data writing process of the distributed cluster system therefore cannot complete, solves the problem that the data writing process of the distributed cluster system depends excessively on network conditions, and ensures high availability of the distributed cluster system.
Drawings
FIG. 1 is a schematic structural diagram of an embodiment of a distributed cluster system provided in the present invention;
FIG. 2 is a schematic diagram of a working principle of a first application scenario of the distributed cluster system provided by the present invention;
FIG. 3 is a schematic diagram illustrating a second application scenario of the distributed cluster system provided in the present invention;
FIG. 4 is a schematic diagram illustrating an operation principle of a third application scenario of the distributed cluster system provided in the present invention;
FIG. 5 is a flowchart illustrating a data writing method according to a first embodiment of the present invention;
FIG. 6 is a flowchart illustrating a data writing method according to a second embodiment of the present invention;
FIG. 7 is a flowchart illustrating a data writing method according to a third embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention;
FIG. 9 is a schematic structural diagram of an embodiment of a storage device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an embodiment of a distributed cluster system according to the present invention. The distributed cluster system in the present application includes a management node 10 and a plurality of data nodes 12. The management node 10 is connected to each data node 12, and the data nodes 12 are also connected to one another. The number of data nodes 12 may be set according to actual needs, for example 4 or 5, and is not limited here. In the present application, the management node 10 is configured to: send a first write instruction containing data to be written to all the data nodes 12, and determine which of the data nodes 12 are normal data nodes that have written the data to be written and which are failed data nodes that have not written the data to be written; and send node information of the failed data nodes to the normal data nodes. A normal data node is configured to: receive the first write instruction sent by the management node 10, and write the data to be written in response to the first write instruction; and receive the node information of a failed data node sent by the management node 10, determine the failed data node based on the node information, and send a second write instruction containing the data to be written to the failed data node, so as to instruct the failed data node receiving the second write instruction to write the data to be written.
In the data writing process of the distributed cluster system, the management node 10 sends a first write instruction containing the data to be written to all data nodes 12; a data node 12 that receives the first write instruction writes the data to be written in response to it. However, because of a fault in a data node 12 itself or in the network between it and the management node 10, some data nodes 12 may not receive the first write instruction, or may receive it but be unable to respond to it, so that they cannot write the data to be written. The data nodes 12 that have not written the data to be written are determined to be failed data nodes, and the other data nodes 12, which received the first write instruction and wrote the data to be written in response to it, are determined to be normal data nodes. If a failed data node failed to receive the first write instruction because of a network problem between it and the management node 10, the management node 10 can send node information of the failed data node to the normal data nodes. After receiving this node information, a normal data node can determine the failed data node from it and send a second write instruction containing the data to be written to the failed data node; if the network link between the failed data node and the normal data node is normal, the failed data node can receive the second write instruction and write the data to be written in response to it.
In this embodiment, when data is written to all the data nodes 12 through the management node 10, some failed data nodes may fail to receive the first write instruction because of a network gate or a network fault between them and the management node 10. The system then attempts to write the data to those failed data nodes again through other normal data nodes that have already written the data to be written. This avoids the situation in which the data writing process of the distributed cluster system cannot complete because some failed data nodes cannot write data due to a network gate or a network fault between them and the management node 10, solves the problem that the data writing process of the distributed cluster system depends excessively on network conditions, and ensures high availability of the distributed cluster system.
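To make the two-round interaction easier to follow, the following Python sketch models it under simplifying assumptions (single-threaded execution, boolean link flags). All class, field and function names here (DataNode, ManagementNode, peer_relay and so on) are invented for this illustration and are not taken from the patent or from any real library.

```python
# Minimal sketch of the write-and-relay scheme described above (assumed names).
from dataclasses import dataclass, field


@dataclass
class DataNode:
    address: str
    is_up: bool = True                     # the node itself may be down
    reachable_from_manager: bool = True    # models a network gate / network fault
    storage: list = field(default_factory=list)

    def write(self, data: str) -> bool:
        """Write locally; True plays the role of a write success message."""
        if not self.is_up:
            return False
        self.storage.append(data)
        return True


def peer_relay(source: "DataNode", target: "DataNode", data: str) -> bool:
    """Second write instruction sent from a normal node to a failed node.
    Simplifying assumption: the inter-node link works whenever the target is up."""
    return target.write(data)


class ManagementNode:
    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)

    def write_to_cluster(self, data: str):
        # Round 1: first write instruction to every data node.
        normal = [n for n in self.data_nodes
                  if n.reachable_from_manager and n.write(data)]
        failed = [n for n in self.data_nodes if n not in normal]

        # Round 2: node information of the failed nodes goes to the normal
        # nodes, which relay a second write instruction over their own links.
        still_failed = [t for t in failed
                        if not any(peer_relay(s, t, data) for s in normal)]
        return [n for n in self.data_nodes if n not in still_failed], still_failed
```

Constructing five DataNode instances and marking two of them with reachable_from_manager=False reproduces, in miniature, the first application scenario described below.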
Referring to fig. 2, fig. 2 is a schematic diagram of the working principle of a first application scenario of the distributed cluster system provided by the present invention. In this application scenario, the distributed cluster system includes a management node 20 and data nodes 220, 222, 224, 226 and 228. When data needs to be written, the management node 20 sends a first write instruction containing the data to be written to the data nodes 220, 222, 224, 226 and 228. Because the links between the management node 20 and the data nodes 220, 224 and 226 are normal, these three data nodes all receive the first write instruction and write the data to be written in response to it; the data nodes 220, 224 and 226 are therefore confirmed to be normal data nodes. Because there is a network gate between the data node 222 and the management node 20 and a network fault between the data node 228 and the management node 20, neither the data node 222 nor the data node 228 receives the first write instruction, and neither can write the data to be written; the data nodes 222 and 228 are therefore confirmed to be failed data nodes. The management node 20 then sends the node information of the failed data nodes 222 and 228 to the normal data nodes 220, 224 and 226. After receiving this node information, the data nodes 220, 224 and 226 determine that the data nodes 222 and 228 are failed data nodes, and each sends a second write instruction containing the data to be written to the data nodes 222 and 228. Because the links between the data node 222 and the data nodes 220, 224 and 226 are all normal, the data node 222 receives the second write instructions sent by the data nodes 220, 224 and 226 and writes the data to be written in response. Because the links between the data node 228 and the data nodes 220 and 226 are normal while the link between the data node 228 and the data node 224 has a fault, the data node 228 receives the second write instructions sent by the data nodes 220 and 226 and also writes the data to be written in response. As a result, the data nodes 220, 222, 224, 226 and 228 all successfully write the data to be written.
It can be understood that the data nodes 222 and 228 each receive second write instructions from two or more normal data nodes. They may simply respond to each second write instruction and write the data to be written several times; writing the same data again does not affect the data itself. Alternatively, after responding to the first second write instruction and writing the data to be written, a data node that subsequently receives another second write instruction may compare the data to be written contained in it with the data it already holds: if the data is the same, it does not respond to that instruction, and if the data is different, it writes the data to be written in response. In other words, as long as any one of the data nodes 220, 224 and 226 successfully writes the data to be written into the data node 222 or 228, the data write to the data node 222 or 228 succeeds.
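As a small illustration of that comparison step, the helper below (a hypothetical name, building on the DataNode sketch above) applies a second write instruction idempotently: it writes only when the incoming data is not already present and simply acknowledges repeated instructions. This is one possible implementation, not behaviour mandated by the patent.

```python
# Hypothetical idempotent handling of repeated second write instructions,
# reusing the DataNode sketch above.
def handle_second_write(node: "DataNode", data: str) -> bool:
    if not node.is_up:
        return False              # node still unavailable: write fails
    if data in node.storage:
        return True               # same data already written: acknowledge only
    node.storage.append(data)     # different data: write it
    return True
```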
With continued reference to fig. 1, in an embodiment, after sending the node information of the failed data nodes to the normal data nodes, the management node 10 is further configured to: update the normal data nodes and the failed data nodes among the data nodes 12, and judge whether the updated normal data nodes satisfy a preset condition; and if so, determine that the data writing of the distributed cluster system is successful.
It can be understood that after the management node 10 sends the node information of the failed data nodes to the normal data nodes, a normal data node receives the node information, determines the failed data nodes based on it, and sends a second write instruction containing the data to be written to the failed data nodes, so as to instruct a failed data node receiving the second write instruction to write the data to be written. At this point some failed data nodes may receive the second write instruction and write the data to be written, and should in fact be regarded as normal data nodes, while other failed data nodes cannot receive the second write instruction and therefore remain failed data nodes. The management node 10 therefore updates and re-confirms the normal data nodes and the failed data nodes among the data nodes 12 and then judges whether the updated normal data nodes satisfy the preset condition; when the updated normal data nodes satisfy the preset condition, it is determined that the data writing of the distributed cluster system is successful. In other words, as long as the updated normal data nodes satisfy the preset condition, the data writing process of the distributed cluster system of the present application is successful even if some data nodes 12 remain failed data nodes, which ensures high availability of the distributed cluster system.
As one implementation, the preset condition in the present application may be that all preset important data nodes among the plurality of data nodes 12 are normal data nodes. Among all the data nodes 12 of the distributed cluster system there may be data nodes 12 of high importance, and these may be designated in advance as preset important data nodes. When data of the distributed cluster system is written, the data writing of the distributed cluster system can be regarded as successful only if the data is successfully written to these highly important data nodes 12, whereas whether the data is successfully written to the other, less important data nodes 12 does not directly determine whether the data writing of the distributed cluster system is successful. Therefore, when the updated normal data nodes satisfy the preset condition that all preset important data nodes are normal data nodes, all preset important data nodes have written the data to be written, and the data writing of the distributed cluster system can be determined to be successful regardless of whether any other data node 12 that is not a preset important data node is a failed data node.
Further, the failed data node is configured to: after its own fault is eliminated, acquire the current data in a preset important data node and update the current data into its own node. When the updated normal data nodes satisfy the preset condition that all preset important data nodes are normal data nodes and the data writing of the distributed cluster system is determined to be successful, there may still be failed data nodes among the other data nodes 12 that are not preset important data nodes; such a failed data node may be caused by a fault of the node itself or by a fault in the links between it and the management node 10 and the other data nodes 12. Whatever the cause, once its fault has been eliminated the failed data node still needs to write the data to be written. The failed data node whose fault has been eliminated therefore acquires the current data in a preset important data node and updates that data into its own node. Since the preset important data node has already written the data to be written, updating the current data of the preset important data node into its own node ensures that the recovered data node also completes the write of the data to be written.
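The preset condition and the recovery step just described can be sketched as follows, continuing the illustrative classes above. The extra boolean field is_important and the helper names are assumptions made for this sketch only, not part of the patent.

```python
# Sketch of the "preset important data node" variant, assuming each DataNode
# from the earlier sketch has been given an extra boolean field `is_important`.
def write_succeeded_important(all_nodes, normal_nodes) -> bool:
    """Preset condition: every preset important data node is a normal data node."""
    return all(n in normal_nodes for n in all_nodes if n.is_important)


def recover_from_important(node: "DataNode", all_nodes) -> None:
    """After its own fault is eliminated, a failed node copies the current data
    of a preset important data node into its own node."""
    node.is_up = True                                           # fault eliminated
    source = next(n for n in all_nodes if n.is_important and n is not node)
    node.storage = list(source.storage)                         # update current data
```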
Referring to fig. 3, fig. 3 is a schematic diagram of the working principle of a second application scenario of the distributed cluster system provided by the present invention. In this application scenario, the distributed cluster system includes a management node 30 and data nodes 320, 322, 324, 326 and 328, where the data nodes 320 and 324 are preset important data nodes and the data nodes 322, 326 and 328 are non-important data nodes. When data needs to be written, the management node 30 sends a first write instruction containing the data to be written to the data nodes 320, 322, 324, 326 and 328. Because the links between the management node 30 and the data nodes 320, 322, 324 and 326 are normal and these nodes have no faults of their own, the data nodes 320, 322, 324 and 326 all receive and respond to the first write instruction and write the data to be written; they are therefore confirmed to be normal data nodes. Because there is a network fault between the data node 328 and the management node 30 and the data node 328 itself is down, the data node 328 cannot receive the first write instruction and cannot write the data to be written; it is therefore determined to be a failed data node. The management node 30 then sends the node information of the data node 328 to the normal data nodes 320, 322, 324 and 326, each of which sends a second write instruction containing the data to be written to the data node 328. Since the data node 328 is down, it still cannot receive these second write instructions and cannot write the data to be written. Although the data node 328 cannot write the data to be written, that is, the distributed cluster system has a failed data node, the preset important data nodes 320 and 324 are normal data nodes and have already written the data to be written, so the data writing of the distributed cluster system can be determined to be successful. After the downtime fault of the data node 328 is eliminated, the data node 328 acquires the current data in the data node 320 and/or the data node 324, which serve as preset important data nodes, and updates that data into its own node. Since the data nodes 320 and 324 have already written the data to be written, this ensures that the data node 328 also ends up with the data to be written.
Referring to fig. 1, as another possible implementation, each data node 12 of the present application is preset with a corresponding weight value, and the preset condition may be that the sum of the weight values of all normal data nodes is greater than a preset threshold. Among all the data nodes 12 of the distributed cluster system, the importance of each data node 12 may differ, so each data node 12 can be assigned a weight value in advance according to its importance. When data of the distributed cluster system is written, the data writing of the distributed cluster system is determined to be successful only when the sum of the weight values of all the data nodes 12 into which the data has been successfully written, that is, all the normal data nodes, is greater than the preset threshold; the data nodes 12 into which the data has not been written, that is, the failed data nodes, do not affect whether the data writing of the distributed cluster system succeeds.
Further, the failed data node is configured to: after its own fault is eliminated, acquire the current data in each data node 12, select the current data held by a set of data nodes 12 that have the same current data and whose weight values sum to more than the preset threshold, and update the selected current data into its own node. When the updated normal data nodes satisfy the preset condition that the sum of the weight values of all normal data nodes is greater than the preset threshold and the data writing of the distributed cluster system is determined to be successful, there may still be failed data nodes, caused either by faults of the nodes themselves or by faults in the links between them and the management node 10 and the other data nodes 12. Whatever the cause, once its fault has been eliminated the failed data node still needs to write the data to be written. The recovered data node therefore acquires the current data in each data node 12 and selects the current data held by data nodes 12 whose current data is identical and whose weight values sum to more than the preset threshold: when several data nodes 12 hold the same current data and the sum of their weight values exceeds the preset threshold, these data nodes 12 have written the data to be written, so copying their current data into its own node ensures that the recovered data node also completes the write of the data to be written.
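A corresponding sketch of this weighted variant is given below, again reusing the DataNode sketch and assuming an extra numeric field weight on each node; the function names are illustrative assumptions only.

```python
# Sketch of the weighted variant, assuming each DataNode also carries a
# numeric `weight` field.
from collections import defaultdict


def write_succeeded_weighted(normal_nodes, threshold: float) -> bool:
    """Preset condition: the weights of all normal data nodes sum above the threshold."""
    return sum(n.weight for n in normal_nodes) > threshold


def recover_by_weight(node: "DataNode", all_nodes, threshold: float) -> None:
    """After its fault is eliminated, a failed node adopts the data shared by a
    group of nodes whose identical data carries a combined weight above the threshold."""
    node.is_up = True
    weight_per_version = defaultdict(float)
    for peer in all_nodes:
        if peer is not node and peer.is_up:
            weight_per_version[tuple(peer.storage)] += peer.weight
    for version, total_weight in weight_per_version.items():
        if total_weight > threshold:
            node.storage = list(version)   # update the selected current data into own node
            break
```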
Referring to fig. 4, fig. 4 is a schematic diagram of the working principle of a third application scenario of the distributed cluster system provided by the present invention. In this application scenario, the distributed cluster system includes a management node 40 and data nodes 420, 422, 424, 426 and 428, and each data node is preset with a corresponding weight value according to its importance: the weight of the data node 420 is 4.2, the weight of the data node 422 is 3.5, the weight of the data node 424 is 4.5, the weight of the data node 426 is 3.7 and the weight of the data node 428 is 4.1. The preset threshold may be set to half the sum of the weight values of all the data nodes, that is, (4.2 + 3.5 + 4.5 + 3.7 + 4.1) / 2 = 10. When data needs to be written, the management node 40 sends a first write instruction containing the data to be written to the data nodes 420, 422, 424, 426 and 428. Because the links between the management node 40 and the data nodes 422, 424 and 426 are normal and these nodes have no faults of their own, the data nodes 422, 424 and 426 receive and respond to the first write instruction and write the data to be written; they are therefore confirmed to be normal data nodes. Because there is a network fault between each of the data nodes 420 and 428 and the management node 40, and the data nodes 420 and 428 are down, neither of them can receive the first write instruction or write the data to be written; the data nodes 420 and 428 are therefore confirmed to be failed data nodes. The management node 40 then sends the node information of the data nodes 420 and 428 to the normal data nodes 422, 424 and 426, each of which sends a second write instruction containing the data to be written to the data nodes 420 and 428 in an attempt to write the data to them; however, since the data nodes 420 and 428 are down, the data to be written still cannot be written. Although the data nodes 420 and 428 cannot write the data to be written, that is, the distributed cluster system has failed data nodes, the data nodes 422, 424 and 426 are normal data nodes and the sum of their weight values is 3.5 + 4.5 + 3.7 = 11.7, which is greater than the preset threshold of 10, so the data writing of the distributed cluster system can be determined to be successful.
Then, when the downtime fault of the data node 420 is eliminated, the data node 420 attempts to acquire the current data in the data nodes 422, 424, 426 and 428; since the data node 428 is still down, the data node 420 actually acquires the current data in the data nodes 422, 424 and 426. Because the data nodes 422, 424 and 426 are normal data nodes and have all written the data to be written, they hold the same current data, and the sum of their weight values, 11.7, is greater than the preset threshold of 10. The data node 420 therefore updates the identical current data of the data nodes 422, 424 and 426 into its own node, which guarantees that the data node 420 has written the data to be written.
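The numbers of this scenario can be plugged directly into the weighted sketch above; the snippet below is only a hypothetical usage example built on the earlier illustrative classes.

```python
# Hypothetical check of this scenario with the weighted sketch above.
nodes = [DataNode(addr) for addr in ("420", "422", "424", "426", "428")]
for node, w in zip(nodes, (4.2, 3.5, 4.5, 3.7, 4.1)):
    node.weight = w                                    # assumed extra field
threshold = sum(n.weight for n in nodes) / 2           # (4.2+3.5+4.5+3.7+4.1)/2 = 10.0

normal = nodes[1:4]                                    # 422, 424 and 426 wrote the data
print(write_succeeded_weighted(normal, threshold))     # 3.5+4.5+3.7 = 11.7 > 10 -> True
```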
Continuing to refer to fig. 1, in another embodiment, the node information of a failed data node includes the address information of the failed data node. Because the node information includes the address information, a normal data node that receives the node information of a failed data node from the management node 10 can determine the failed data node based on the node information and send a second write instruction containing the data to be written to that failed data node.
As one implementation, when the management node 10 determines which of the plurality of data nodes 12 are normal data nodes that have written the data to be written and which are failed data nodes that have not, it specifically: receives a first write success message for the data to be written fed back by a data node 12, determines the data node 12 that sent the first write success message to be a normal data node, and determines the remaining data nodes 12 to be failed data nodes. When the management node 10 updates the normal data nodes and the failed data nodes among the data nodes 12, it specifically: receives a second write success message for the data to be written fed back by a failed data node, and updates the failed data node that sent the second write success message to a normal data node.
It can be understood that after the management node 10 sends the first write instruction containing the data to be written to all the data nodes 12, some data nodes 12 receive the first write instruction and write the data to be written in response to it; these data nodes 12, having successfully written the data, send a first write success message to the management node 10 to report the successful write, so the management node 10 can determine the data nodes 12 that sent a first write success message to be normal data nodes and the remaining data nodes 12 to be failed data nodes. After the management node 10 sends the node information of the failed data nodes to the normal data nodes, a normal data node receives the node information, determines the failed data nodes based on it, and sends a second write instruction containing the data to be written to the failed data nodes, so as to instruct a failed data node receiving the second write instruction to write the data to be written. At this point, some of the data nodes 12 originally determined to be failed data nodes receive the second write instruction and write the data to be written; such a data node 12 sends a second write success message to the normal data node, which forwards it to the management node 10 to report the successful write, and the management node 10 then updates that data node 12, originally determined to be a failed data node, to a normal data node.
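One way to picture this bookkeeping: the management node records which nodes' first or (forwarded) second write success messages it has received and then tests the preset condition on the updated set of normal nodes. The sketch below is an assumption about one possible shape of that logic, using the illustrative helpers defined earlier.

```python
# Sketch of status updating driven by write success messages (names assumed).
def update_and_check(all_nodes, first_acks, second_acks, condition) -> bool:
    """first_acks / second_acks: collections (e.g. lists) of nodes whose first /
    second write success message reached the management node (second ones are
    forwarded by a normal data node). `condition` is a callable such as
    write_succeeded_important; for the weighted variant use, for example,
    lambda _all, normal: write_succeeded_weighted(normal, threshold)."""
    normal = [n for n in all_nodes if n in first_acks or n in second_acks]
    return condition(all_nodes, normal)
```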
Referring to fig. 5, fig. 5 is a flowchart illustrating a data writing method according to a first embodiment of the present invention. The data writing method provided by the application is applied to a distributed cluster system consisting of a management node and a plurality of data nodes, and comprises the following steps:
S501: The management node sends a first write instruction containing data to be written to all the data nodes.
S502: Based on the execution result of the first write instruction, the data nodes are determined to be normal data nodes that have written the data to be written or failed data nodes that have not written the data to be written.
S503: The node information of the failed data nodes is sent to the normal data nodes, so that the normal data nodes instruct the failed data nodes to write the data to be written again.
In this embodiment, the management node sends a first write instruction containing data to be written to all data nodes in order to write the data to all of them. Some data nodes receive the first write instruction and write the data to be written in response to it, while other data nodes cannot receive or respond to the first write instruction because of faults of their own or network faults between them and the management node. Based on whether the first write instruction was responded to and executed, the management node determines each data node to be either a normal data node that has written the data to be written or a failed data node that has not. The management node then sends the node information of the failed data nodes to the normal data nodes, and the normal data nodes, which have already written the data to be written, attempt to write the data to the failed data nodes again. This avoids the situation in which the data writing process of the distributed cluster system cannot complete because a network gate or network fault between the management node and some failed data nodes prevents those nodes from writing data, thereby solving the problem that the data writing process of the distributed cluster system depends excessively on network conditions and ensuring high availability of the distributed cluster system.
Referring to fig. 6, fig. 6 is a flowchart illustrating a data writing method according to a second embodiment of the present invention. The data writing method in this embodiment includes the following steps:
S601: The management node sends a first write instruction containing data to be written to all the data nodes.
S602: Based on the execution result of the first write instruction, the data nodes are determined to be normal data nodes that have written the data to be written or failed data nodes that have not written the data to be written.
S603: The node information of the failed data nodes is sent to the normal data nodes, so that the normal data nodes instruct the failed data nodes to write the data to be written again.
The difference from the first embodiment of the data writing method of the present application is that the data writing method in the present embodiment further includes the steps of:
S604: The normal data nodes and the failed data nodes among the data nodes are updated, and it is judged whether the updated normal data nodes satisfy a preset condition. If not, the data writing of the distributed cluster system is not yet successful.
S605: If so, it is determined that the data writing of the distributed cluster system is successful.
It can be understood that after a normal data node receives the node information of the failed data nodes sent by the management node, determines the failed data nodes based on the node information and sends a second write instruction containing the data to be written to the failed data nodes, some failed data nodes may receive the second write instruction and write the data to be written; these should in fact be regarded as normal data nodes. Therefore, in this embodiment the management node further updates and re-confirms the normal data nodes and the failed data nodes among the data nodes and then judges whether the updated normal data nodes satisfy the preset condition; when the updated normal data nodes satisfy the preset condition, it is determined that the data writing of the distributed cluster system is successful. Thus, when the updated normal data nodes satisfy the preset condition, the data writing process of the distributed cluster system can be determined to be successful even if some data nodes remain failed data nodes, which ensures high availability of the distributed cluster system.
As one implementation, the preset condition may be that all preset important data nodes among the plurality of data nodes are normal data nodes. When the updated normal data nodes satisfy this condition, all preset important data nodes have written the data to be written, and the data writing of the distributed cluster system can be determined to be successful regardless of whether any other data node that is not a preset important data node is a failed data node.
As another possible implementation, each data node is preset with a corresponding weight value, and the preset condition may be that the sum of the weight values of all normal data nodes is greater than a preset threshold. Since the importance of each data node of the distributed cluster system may differ, each data node can be assigned a weight value in advance according to its importance. The data writing of the distributed cluster system is determined to be successful only when the sum of the weight values of all the data nodes into which the data has been successfully written, that is, all the normal data nodes, is greater than the preset threshold; the data nodes into which the data has not been written, that is, the failed data nodes, do not affect whether the data writing succeeds.
In an embodiment, the node information of a failed data node includes the address information of the failed data node, so that after receiving the node information from the management node, a normal data node can determine the failed data node based on it and send a second write instruction containing the data to be written to that failed data node.
Referring to fig. 7, fig. 7 is a flowchart illustrating a data writing method according to a third embodiment of the present invention. The data writing method in this embodiment includes the following steps:
S701: The management node sends a first write instruction containing data to be written to all the data nodes.
S702: A first write success message for the data to be written fed back by a data node is received, the data node that sent the first write success message is determined to be a normal data node, and the remaining data nodes are determined to be failed data nodes.
S703: The node information of the failed data nodes is sent to the normal data nodes, so that the normal data nodes instruct the failed data nodes to write the data to be written again.
S704: A second write success message for the data to be written fed back by a failed data node is received, the failed data node that sent the second write success message is updated to a normal data node, and it is judged whether the updated normal data nodes satisfy a preset condition.
S705: If so, it is determined that the data writing of the distributed cluster system is successful.
In this embodiment, the management node determines the data nodes that sent a first write success message to be normal data nodes and the other data nodes to be failed data nodes, and updates the data nodes that were originally determined to be failed data nodes but sent a second write success message to normal data nodes, so that the normal data nodes and the failed data nodes among all the data nodes can be identified accurately. Consequently, when the updated normal data nodes satisfy the preset condition, the data writing process of the distributed cluster system can be determined to be successful even if some data nodes remain failed data nodes, which ensures high availability of the distributed cluster system.
For specific details of the embodiments of the data writing method of the present application, which is applied to a distributed cluster system composed of a management node and a plurality of data nodes, reference is made to the detailed description in the embodiments of the distributed cluster system above.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 80 in the present application includes a communication circuit 800, a memory 802 and a processor 804 that are coupled to one another; the communication circuit 800 is used for communicating with each data node; the memory 802 is used for storing program data; and the processor 804 executes the program data to implement any of the data writing methods described above.
In another embodiment, the electronic device 80 of the present application is a management node 10 in the distributed cluster system.
For details of the embodiment of the electronic device 80 of the present application, please refer to the detailed description in the above embodiments of the distributed cluster system and the data writing method.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a storage device according to an embodiment of the invention. The storage device 90 in the present application stores program data 900, and the program data 900 can be executed to implement the data writing methods described above. The storage device 90 may be a storage chip in a server, a readable and writable storage medium such as an SD card, or a server itself.
In the several embodiments provided in the present application, it should be understood that the disclosed distributed cluster system and data writing method, electronic device, and storage apparatus may be implemented in other ways. For example, the above-described device architecture implementations are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structural or procedural modifications made on the basis of the present specification and drawings, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the present invention.

Claims (14)

1. A distributed cluster system, comprising a management node and a plurality of data nodes connected to each other;
the management node is configured to: sending a first writing instruction containing data to be written to all the data nodes, and determining the data nodes as normal data nodes in which the data to be written is written and fault data nodes in which the data to be written is not written; sending the node information of the fault data node to the normal data node;
the normal data node is used for: receiving the first writing instruction sent by the management node, and responding to the first writing instruction to write the data to be written; and receiving node information of the fault data node sent by the management node, determining the fault data node based on the node information, and sending a second write-in instruction containing data to be written to the fault data node so as to indicate the fault data node receiving the second write-in instruction to write in the data to be written.
2. The distributed cluster system of claim 1, wherein after performing the sending of the node information of the failed data node to the normal data node, the management node is further configured to: updating normal data nodes and fault data nodes in the data nodes, and judging whether the updated normal data nodes meet preset conditions or not; and if so, determining that the data writing of the distributed cluster system is successful.
3. The distributed cluster system of claim 2,
the preset condition is that all preset important data nodes in the plurality of data nodes are the normal data nodes.
4. The distributed cluster system of claim 3,
the failed data node is configured to: and after the self fault is eliminated, acquiring the current data in the preset important data node, and updating the current data to the self node.
5. The distributed cluster system of claim 2,
each data node is preset with a corresponding weight value; the preset condition is that the sum of the weight values of all normal data nodes is greater than a preset threshold value.
6. The distributed cluster system of claim 5,
the failed data node is configured to: and after the self fault is eliminated, acquiring the current data in each data node, selecting the current data of the data nodes with the same current data and the sum of the weight values larger than the preset threshold value, and updating the selected current data into the self node.
7. The distributed cluster system of claim 2, wherein the node information of the failed data node includes address information of the failed data node; and/or
When the management node executes the determination of the plurality of data nodes as a normal data node in which the data to be written has been written and a fault data node in which the data to be written has not been written, the management node includes:
receiving a first write success message for the data to be written, which is fed back by the data node, determining the data node sending the first write success message as the normal data node, and determining the other data nodes as the fault data node;
when the management node performs the updating of the normal data node and the failed data node, the management node includes:
and receiving a second write success message of the data to be written, which is fed back by the fault data node, and updating the fault data node sending the second write success message into the normal data node.
8. A data writing method is applied to a distributed cluster system consisting of a management node and a plurality of data nodes, and comprises the following steps:
the management node sends a first write-in instruction containing data to be written to all the data nodes;
determining the plurality of data nodes as normal data nodes in which the data to be written has been written and fault data nodes in which the data to be written has not been written based on an execution result of the first write instruction;
and sending the node information of the fault data node to the normal data node, so that the normal data node indicates the fault data node to rewrite the data to be written.
9. The data writing method according to claim 8, wherein after the sending of the node information of the failed data node to the normal data node so that the normal data node instructs the failed data node to rewrite the data to be written, the method includes:
updating normal data nodes and fault data nodes in the data nodes, and judging whether the updated normal data nodes meet preset conditions or not;
and if so, determining that the data writing of the distributed cluster system is successful.
10. The data writing method according to claim 9, wherein the predetermined condition is that all predetermined important data nodes of the plurality of data nodes are the normal data nodes.
11. The data writing method of claim 9, wherein each of the data nodes is preset with a corresponding weight value; the preset condition is that the sum of the weight values of all normal data nodes is greater than a preset threshold value.
12. The data writing method according to claim 9, wherein the node information of the failed data nodes includes address information of the failed data nodes; and/or
the determining, based on the execution result of the first write instruction, which of the plurality of data nodes are normal data nodes to which the data to be written has been written and which are failed data nodes to which the data to be written has not been written specifically comprises:
receiving a first write success message for the data to be written fed back by a data node, determining the data node that sent the first write success message to be a normal data node, and determining the remaining data nodes to be failed data nodes;
and the updating of the normal data nodes and the failed data nodes among the data nodes specifically comprises:
receiving a second write success message for the data to be written fed back by a failed data node, and re-marking the failed data node that sent the second write success message as a normal data node.
13. An electronic device, comprising a communication circuit, a memory, and a processor coupled to one another, wherein the communication circuit is configured to communicate with each data node, the memory is configured to store program data, and the processor executes the program data to implement the method according to any one of claims 8 to 12;
or the electronic device is a management node in the distributed cluster system of any one of claims 1 to 7.
14. A storage device storing program data executable to implement a method according to any one of claims 8 to 12.
CN201910630685.0A 2019-07-12 2019-07-12 Distributed cluster system, data writing method, electronic equipment and storage device Pending CN112214466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910630685.0A CN112214466A (en) 2019-07-12 2019-07-12 Distributed cluster system, data writing method, electronic equipment and storage device

Publications (1)

Publication Number Publication Date
CN112214466A (en) 2021-01-12

Family

ID=74047840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910630685.0A Pending CN112214466A (en) 2019-07-12 2019-07-12 Distributed cluster system, data writing method, electronic equipment and storage device

Country Status (1)

Country Link
CN (1) CN112214466A (en)

Patent Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013000594A1 (en) * 2011-06-30 2013-01-03 Continental Automotive Gmbh Method and system for transmitting data in a motor vehicle
CN102945139A (en) * 2011-09-12 2013-02-27 微软公司 Storage device drivers and cluster participation
US9280381B1 (en) * 2012-03-30 2016-03-08 Emc Corporation Execution framework for a distributed file system
WO2014026025A1 (en) * 2012-08-08 2014-02-13 Netapp, Inc. Synchronous local and cross-site failover in clustered storage systems
CN102867035A (en) * 2012-08-28 2013-01-09 浪潮(北京)电子信息产业有限公司 High-availability method and device of distributed document system cluster
CN102857565A (en) * 2012-09-03 2013-01-02 重庆邮电大学 Intelligent clothes trying-on system based on cloud computing
CN103064635A (en) * 2012-12-19 2013-04-24 华为技术有限公司 Distributed storage method and device
CN103973470A (en) * 2013-01-31 2014-08-06 国际商业机器公司 Cluster management method and equipment for shared-nothing cluster
CN103475732A (en) * 2013-09-25 2013-12-25 浪潮电子信息产业股份有限公司 Distributed file system data volume deployment method based on virtual address pool
WO2015088324A2 (en) * 2013-12-09 2015-06-18 Mimos Berhad System and method for managing a faulty node in a distributed computing system
CN104954999A (en) * 2014-03-26 2015-09-30 海能达通信股份有限公司 Method and terminal for communication based on distributed cluster communication system
WO2015167427A2 (en) * 2014-04-28 2015-11-05 Hewlett Packard Development Company, L.P. Data distribution based on network information
CN104484470A (en) * 2014-12-31 2015-04-01 天津南大通用数据技术股份有限公司 Database cluster meta data management method
CN104679611A (en) * 2015-03-05 2015-06-03 浙江宇视科技有限公司 Data resource copying method and device
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system
WO2017097059A1 (en) * 2015-12-07 2017-06-15 中兴通讯股份有限公司 Distributed database system and self-adaptation method therefor
CN106911728A (en) * 2015-12-22 2017-06-30 华为技术服务有限公司 The choosing method and device of host node in distributed system
CN107092437A (en) * 2016-02-17 2017-08-25 杭州海康威视数字技术股份有限公司 Data write-in, read method and device, cloud storage system
CN105915391A (en) * 2016-06-08 2016-08-31 国电南瑞科技股份有限公司 Distributed key value storage method possessing self-recovery function based on one-phase submission
CN106559263A (en) * 2016-11-17 2017-04-05 杭州沃趣科技股份有限公司 A kind of improved distributed consensus algorithm
CN107026762A (en) * 2017-05-24 2017-08-08 郑州云海信息技术有限公司 A kind of disaster tolerance system and method based on distributed type assemblies
CN107295080A (en) * 2017-06-19 2017-10-24 北京百度网讯科技有限公司 Date storage method and server applied to distributed server cluster
CN107562913A (en) * 2017-09-12 2018-01-09 郑州云海信息技术有限公司 The date storage method and device of a kind of distributed file system
CN107947976A (en) * 2017-11-20 2018-04-20 新华三云计算技术有限公司 Malfunctioning node partition method and group system
CN107943421A (en) * 2017-11-30 2018-04-20 成都华为技术有限公司 A kind of subregion partitioning method and device based on distributed memory system
CN108134712A (en) * 2017-12-19 2018-06-08 海能达通信股份有限公司 A kind of processing method, device and the equipment of distributed type assemblies fissure
CN108959549A (en) * 2018-06-29 2018-12-07 北京奇虎科技有限公司 Method for writing data, calculates equipment and computer storage medium at device
CN109361532A (en) * 2018-09-11 2019-02-19 上海天旦网络科技发展有限公司 The high-availability system and method and computer readable storage medium of network data analysis
CN109167690A (en) * 2018-09-25 2019-01-08 郑州云海信息技术有限公司 A kind of restoration methods, device and the relevant device of the service of distributed system interior joint
CN109542338A (en) * 2018-10-19 2019-03-29 郑州云海信息技术有限公司 A kind of realization distributed memory system interior joint consistency on messaging method and device
CN109213637A (en) * 2018-11-09 2019-01-15 浪潮电子信息产业股份有限公司 Data reconstruction method, device and the medium of distributed file system clustered node
CN109656896A (en) * 2018-11-28 2019-04-19 平安科技(深圳)有限公司 Fault repairing method, device and distributed memory system and storage medium
CN109614201A (en) * 2018-12-04 2019-04-12 武汉烽火信息集成技术有限公司 The OpenStack virtual machine high-availability system of anti-fissure

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WEI DAI et al.: "A New Replica Placement Policy for Hadoop Distributed File System", 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), pages 262-267 *
XH LI et al.: "A distributed storage method for real-time database system design", Materials Science and Engineering, vol. 452, no. 3, 31 December 2018 (2018-12-31), pages 032010 *
吴小树: "Research and Implementation of High-Availability Technology for Server Clusters in an Operation and Maintenance Audit System", China Masters' Theses Full-text Database (Engineering Science and Technology II), 15 March 2018 (2018-03-15), pages 042-1990 *
赵立斌: "Research and Performance Optimization of High Availability in Distributed MongoDB Clusters", China Masters' Theses Full-text Database (Information Science and Technology), pages 138-2246 *

Similar Documents

Publication Publication Date Title
US11809291B2 (en) Method and apparatus for redundancy in active-active cluster system
CN105933407B (en) method and system for realizing high availability of Redis cluster
CN109391655B (en) Service gray level publishing method, device and system and storage medium
CN108345617B (en) Data synchronization method and device and electronic equipment
KR20010072379A (en) Fault tolerant computer system
CN102882704B (en) Link protection method in the soft reboot escalation process of a kind of ISSU and equipment
CN110580235B (en) SAS expander communication method and device
CN109582335A (en) It is a kind of without interrupt storage cluster node online upgrading method, device and equipment
WO2021004256A1 (en) Node switching method in node failure and related device
CN107948063B (en) Method for establishing aggregation link and access equipment
CN111352943A (en) Method and device for realizing data consistency, server and terminal
CN106230622A (en) A kind of cluster implementation method and device
EP4060514A1 (en) Distributed database system and data disaster backup drilling method
CN110351122B (en) Disaster recovery method, device, system and electronic equipment
CN108319522A (en) A method of reinforcing distributed memory system reliability
CN113810216A (en) Cluster fault switching method and device and electronic equipment
CN112214466A (en) Distributed cluster system, data writing method, electronic equipment and storage device
CN113596195B (en) Public IP address management method, device, main node and storage medium
US10514850B2 (en) Information processing system, server device, Information processing method, and computer program product
CN109445984B (en) Service recovery method, device, arbitration server and storage system
CN106020975A (en) Data operation method, device and system
WO2016090768A1 (en) Port attribute inheritance method and device
US10909002B2 (en) Fault tolerance method and system for virtual machine group
US11947431B1 (en) Replication data facility failure detection and failover automation
CN113890875B (en) Task allocation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination