CN112214466B

CN112214466B - Distributed cluster system, data writing method, electronic equipment and storage device

Info

Publication number: CN112214466B
Application number: CN201910630685.0A
Authority: CN
Inventors: 关超; 卜辉
Original assignee: Hytera Communications Corp Ltd
Current assignee: Hytera Communications Corp Ltd
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2024-05-14
Anticipated expiration: 2039-07-12
Also published as: CN112214466A

Abstract

The invention discloses a distributed cluster system, a data writing method, electronic equipment and a storage device. The distributed cluster system comprises a management node and a plurality of data nodes which are connected with each other; the management node is used for: transmitting a first writing instruction containing data to be written to all data nodes, and determining normal data nodes in which the data to be written are written and fault data nodes in which the data to be written are not written; and transmitting node information of the fault data node to the normal data node; the normal data node is used for: receiving a first writing instruction, and writing data to be written; and receiving node information of the fault data node, and sending a second writing instruction containing data to be written to the fault data node based on the node information so as to indicate the fault data node receiving the second writing instruction to write the data to be written. By the method, the data writing process of the distributed cluster system can be prevented from excessively depending on network conditions, so that high availability of the distributed cluster system is ensured.

Description

Distributed cluster system, data writing method, electronic equipment and storage device

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a distributed cluster system, a data writing method, an electronic device, and a storage device.

Background

In the practical application of the distributed cluster, data of multiple files is often required to be written simultaneously to ensure consistency, and the data may be written into multiple backup files, multiple configuration files, and the like. Even if such data is stored using an existing distributed file system or database, it is at least necessary to configure the network addresses of the distributed file system or other nodes of the database in the local files of each node, and this address needs to be uniformly modified when it changes.

When writing a plurality of files, in order to ensure consistency, complete and strict verification is often adopted, that is, writing of all nodes is successful, but when a network problem occurs in an individual node, writing cannot be successfully performed. Or directly writing each file manually, and starting each node after configuration is completed, wherein the workload is excessive during writing.

Therefore, how to solve the excessive dependence of the data writing process of the distributed cluster on the network situation is a technical problem that needs to be solved currently by those skilled in the art.

Disclosure of Invention

The invention mainly solves the technical problem of providing a distributed cluster system, a data writing method, electronic equipment and a storage device, so as to solve the problem of excessive dependence of the data writing process of the distributed cluster system on network conditions and ensure high availability of the distributed cluster system.

In order to solve the technical problem, the invention adopts a technical scheme that a distributed cluster system is provided, and the distributed cluster system comprises a management node and a plurality of data nodes which are connected with each other; the management node is used for: transmitting a first writing instruction containing data to be written to all the data nodes, and determining the data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written; and transmitting node information of the failed data node to the normal data node; the normal data node is configured to: receiving the first writing instruction sent by the management node, and responding to the first writing instruction to write the data to be written; and receiving node information of the fault data node sent by the management node, determining the fault data node based on the node information, and sending a second writing instruction containing data to be written to the fault data node so as to indicate the fault data node receiving the second writing instruction to write the data to be written.

In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a data writing method, where the method is applied to a distributed cluster system composed of a management node and a plurality of data nodes, and the method includes: the management node sends a first writing instruction containing data to be written to all the data nodes; based on the execution result of the first writing instruction, determining the plurality of data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written; and sending the node information of the fault data node to the normal data node so that the normal data node indicates the fault data node to rewrite the data to be written.

In order to solve the technical problem, another technical scheme adopted by the invention is to provide electronic equipment, wherein the electronic equipment comprises a communication circuit, a memory and a processor which are mutually coupled; the communication circuit is used for communicating with each data node; the memory is used for storing program data; the processor executing the program data for implementing the method as described above; or the electronic equipment is a management node in the distributed cluster system.

In order to solve the above-mentioned technical problem, another technical aspect adopted by the present invention is to provide a storage device, which stores program data that can be executed to implement the method as described above.

The beneficial effects of the application are as follows: in the distributed cluster system, the management node sends the first write instruction containing the data to be written to all the data nodes, and determines the plurality of data nodes as the normal data nodes which are written with the data to be written and the fault data nodes which are not written with the data to be written, because the normal data nodes in the data nodes can receive the first write instruction sent by the management node, the first write instruction is responded to write the data to be written, the fault data nodes cannot write the data to be written, the management node sends the node information of the fault data nodes to the normal data nodes, the normal data nodes receive the node information of the fault data nodes sent by the management node, and determine the fault data nodes based on the node information and send the second write instruction containing the data to be written to the fault data nodes, the fault data nodes which are received the second write the data to be written are indicated, when the fault data nodes fail to write the data through the management node, the fault data nodes can be written again through the other normal data nodes, the problem that the fault data nodes cannot be written into the fault data nodes is caused by the management node is solved, the distributed cluster system can not write the data to the fault cluster system, and the problem that the fault cluster system cannot write the fault cluster system is caused by the fault cluster system is high in the fault condition is solved.

Drawings

FIG. 1 is a schematic diagram illustrating a distributed cluster system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the working principle of a first application scenario of the distributed cluster system provided by the present invention;

fig. 3 is a schematic diagram of a working principle of a second application scenario of the distributed cluster system provided by the present invention;

Fig. 4 is a schematic diagram of a working principle of a third application scenario of the distributed cluster system provided by the present invention;

FIG. 5 is a flowchart of a first embodiment of a data writing method according to the present invention;

FIG. 6 is a flowchart of a second embodiment of a data writing method according to the present invention;

FIG. 7 is a flowchart of a third embodiment of a data writing method according to the present invention;

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present invention;

fig. 9 is a schematic structural diagram of an embodiment of a memory device according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of an embodiment of a distributed cluster system according to the present application. The distributed cluster system in the present application includes a management node 10 and a plurality of data nodes 12, wherein the management node 10 is connected with each data node 12, and each data node 12 is also connected with each other, and the number of the data nodes 12 is a plurality, for example, 4, 5, etc., which can be set according to actual needs, and is not limited herein. In the present application, the management node 10 is configured to: transmitting a first writing instruction containing data to be written to all the data nodes 12, and determining a plurality of data nodes 12 as normal data nodes written with the data to be written and fault data nodes not written with the data to be written; and transmitting node information of the failed data node to the normal data node. The normal data node is used for: receiving a first write instruction sent by the management node 10, responding to the first write instruction to write data to be written; and receiving node information of the fault data node sent by the management node 10, determining the fault data node based on the node information, and sending a second writing instruction containing data to be written to the fault data node to instruct the fault data node receiving the second writing instruction to write the data to be written.

In the data writing process of the distributed cluster system, a first writing instruction containing data to be written is sent to all data nodes 12 through a management node 10, the first writing instruction is received by the data nodes 12, and the data to be written is written in response to the first writing instruction; however, due to a failure of the data node 12 itself or a network reason with the management node 10, a part of the data nodes 12 may not receive the first write command or receive the first write command but cannot respond to the first write command, so that the part of the data nodes 12 cannot write the data to be written, the data nodes 12 not writing the data to be written are determined as failed data nodes, and other data nodes 12 which may receive the first write command and may respond to the first write command to write the data to be written are determined as normal data nodes. If the first write command cannot be received due to the network between the management node 10 and the failure data node, the management node 10 may send the node information of the failure data node to the normal data node, and after receiving the node information of the failure data node sent by the management node 10, the normal data node may determine the failure data node based on the node information and send a second write command containing the data to be written to the failure data node, and if the network link between the failure data node and the normal data node is normal, the failure data node may receive the second write command and respond to the second write command to write the data to be written.

In this embodiment, when data is written to all the data nodes 12 by the management node 10, the data to be written may not be written in due to the failure of the network gate or the network caused by the failure of the network gate between the data nodes and the management node 10, and the data can be written to the failed data node again through other normal data nodes in which the data to be written is already written, so that the problem that the data cannot be written in due to the failure of the network gate or the network caused by the failure of the network between the data nodes and the management node 10, and further the data writing process of the distributed cluster system cannot be completed, the problem that the data writing process of the distributed cluster system excessively depends on the network condition is solved, and the high availability of the distributed cluster system is ensured.

Referring to fig. 2, fig. 2 is a schematic diagram illustrating a working principle of a first application scenario of the distributed cluster system provided by the present invention. In the present application scenario, the distributed cluster system includes a management node 20 and a data node 220, a data node 222, a data node 224, a data node 226, and a data node 228; when data writing is needed, the management node 20 sends a first writing instruction containing data to be written to the data node 220, the data node 222, the data node 224, the data node 226 and the data node 228; since the links of the data node 220, the data node 224 and the data node 226 between the management nodes 20 are normal, the data node 220, the data node 224 and the data node 226 can each receive the first writing instruction, after receiving the first writing instruction, the data node 220, the data node 224 and the data node 226 respond to the first writing instruction and write the data to be written, and at this time, the data node 220, the data node 224 and the data node 226 are confirmed to be normal data nodes; because there is a network gate between the data node 222 and the management node 20, and there is a network failure between the data node 228 and the management node 20, neither the data node 222 nor the data node 228 can receive the first write command, and therefore the data to be written cannot be written, and at this time, the data node 222 and the data node 228 are confirmed as failed data nodes; at this time, the management node 20 may transmit the node information of the data node 222 and the data node 228, which are confirmed as the failed data node, to the data node 220, the data node 224 and the data node 226, which are confirmed as the normal data node, and the data node 220, the data node 224 and the data node 226 may determine that the data node 222 and the data node 228 are failed data nodes based on the node information after receiving the node information of the data node 222 and the data node 228 transmitted by the management node 20, so that the data node 220, the data node 224 and the data node 226 transmit the second writing instruction containing the data to be written to the data node 222 and the data node 228, respectively; since the links between the data node 220, the data node 224 and the data node 226 and the data node 222 are normal, the data node 222 may receive the second writing instruction sent by the data node 220, the data node 224 and the data node 226, respectively, and the data node 222 may respond to the second writing instruction and write the data to be written; since the links between the data node 220 and the data node 226 and the data node 228 are normal, the link between the data node 224 and the data node 228 fails, the data node 228 may receive the second write command sent by the data node 220 and the data node 226, respectively, and the data node 228 may also respond to the second write command and write the data to be written; to this end, data node 220, data node 222, data node 224, data node 226, and data node 228 may all successfully write the data to be written. It can be understood that, the data node 222 and the data node 228 both receive the second write command sent by two or more normal data nodes, and the data node 222 and the data node 228 can respond to each second write command and write the data to be written for multiple times, while the second and subsequent writing of the data to be written will not affect the data itself, or after the data node 222 and the data node 228 respond to the first and second write commands and write the data to be written, the data to be written contained in the other second write commands can be compared with the existing data of itself when the other second write commands are subsequently received, if the data to be written is the same, the other second write commands are not responded, and if the data to be written is different, the other second write commands are responded, and the data to be written is written; that is, as long as one of the data node 220, the data node 224 and the data node 226 can successfully write the data to be written to the data node 222 or the data node 228, the data writing to the data node 222 or the data node 228 is successful.

With continued reference to fig. 1, in one embodiment, after the management node 10 performs the above-mentioned sending of the node information of the failed data node to the normal data node, the management node 10 is further configured to: updating normal data nodes and fault data nodes in the data nodes 12, and judging whether the updated normal data nodes meet preset conditions or not; if yes, determining that the data of the distributed cluster system is successfully written.

It can be understood that after the management node 10 transmits the node information of the failed data node to the normal data node, the normal data node receives the node information of the failed data node, determines the failed data node based on the node information, and transmits a second write instruction containing the data to be written to the failed data node, so as to instruct the failed data node that receives the second write instruction to write the data to be written; at this time, there may be some faulty data nodes capable of receiving the second write command and writing the data to be written, and these faulty data nodes should actually be regarded as normal data nodes, while other faulty data nodes cannot receive the second write command, so they are still faulty data nodes, so the management node 10 may further update and confirm the normal data nodes and the faulty data nodes in the data nodes 12, and then determine whether the updated normal data nodes meet the preset condition; when the updated normal data node meets the preset condition, the data writing of the distributed cluster system is determined to be successful, and when the updated normal data node does not meet the preset condition, the data writing of the distributed cluster system is determined to be failed. It can be appreciated that if it is determined that the data writing of the distributed cluster system is successful, the data writing process of the distributed cluster system of the present application is successful even if there is still a part of the data nodes 12 as the failure data nodes, ensuring high availability of the distributed cluster system.

As an embodiment, the preset condition of the present application may be that all preset important data nodes in the plurality of data nodes 12 are normal data nodes. It can be understood that, among all the data nodes 12 of the distributed cluster system, there may be data nodes 12 with high importance, and these data nodes 12 with high importance may be preset as important data nodes, that is, these data nodes 12 with high importance are preset important data nodes; it can be understood that, when the data of the distributed cluster system is written, the data of the data node 12 with high importance must be written successfully, which indicates that the data of the distributed cluster system is written successfully, and whether the data of the other data nodes 12 with low importance is written successfully does not directly affect whether the data of the distributed cluster system is written successfully. Therefore, when the preset conditions met by the updated normal data nodes are all normal data nodes, since the normal data nodes have written the data to be written, it is indicated that all the preset important data nodes have written the data to be written, and whether other data nodes 12 which are not the preset important data nodes are fault data nodes or not can be determined to be successful in writing the data of the distributed cluster system.

Further, the failure data node is configured to: after the self fault is eliminated, current data in the preset important data node is acquired, and the current data is updated to the self node. It can be understood that when the preset conditions met by the updated normal data nodes are all normal data nodes, it is determined that the data writing of the distributed cluster system is successful, and at this time, there may be a case of faulty data nodes in all the data nodes 12 that are not the preset important data nodes, and the faulty data nodes at this time may be caused by self-fault reasons or due to faults in links with the management node 10 and other data nodes 12 in the distributed cluster system. It can be understood that, no matter the cause of the self-failure or the link between the failed data node and the management node 10 and other data nodes 12 has a failure, after the failure of the failed data node is removed, the failed data node still needs to write the data to be written after the failure is removed, at this time, the failed data node after the failure is removed can acquire the current data in the preset important data node and update the current data into the self-node, and because the preset important data node has already written the data to be written, the current data in the preset important data node is updated into the self-node, so that the failed data node after the failure is removed can be ensured to complete the writing process of the data to be written.

Referring to fig. 3, fig. 3 is a schematic diagram illustrating a working principle of a second application scenario of the distributed cluster system provided by the present invention. In the present application scenario, the distributed cluster system includes a management node 30 and a data node 320, a data node 322, a data node 324, a data node 326, and a data node 328, where the data node 320 and the data node 324 are preset important data nodes, and the data node 322, the data node 326, and the data node 328 are non-important data nodes; when data writing is needed, a first writing instruction containing data to be written is sent to the data node 320, the data node 322, the data node 324, the data node 326 and the data node 328 through the management node 30; since the links between the data node 320, the data node 322, the data node 324 and the data node 326 are normal and have no self-failure respectively between the management nodes 30, the data node 320, the data node 322, the data node 324 and the data node 326 can all receive and respond to the first writing instruction and write the data to be written, and at this time, the data node 320, the data node 322, the data node 324 and the data node 326 are confirmed to be normal data nodes; because there is a network failure between the data node 328 and the management node 30, and the data node 328 is down, the data node 328 cannot receive the first write command, and cannot write the data to be written, and at this time, the data node 328 confirms as a failed data node; the management node 30 may then transmit the node information of the data node 328 identified as the failed data node to the data node 320, the data node 322, the data node 324, and the data node 326 identified as the normal data node, and transmit the second write instruction containing the data to be written to the data node 328 through the data node 320, the data node 322, the data node 324, and the data node 326, respectively; since the data node 328 is down itself, it still cannot receive the second write command sent by the data node 320, the data node 322, the data node 324, and the data node 326, respectively, and thus still cannot write the data to be written. Although the data node 328 cannot write the data to be written at this time, that is, the distributed cluster system has a faulty data node, since the data node 320 and the data node 324, which are preset important data nodes, are all normal data nodes, the data to be written can be determined that the data writing of the distributed cluster system is successful at this time. After the downtime of the data node 328 is removed, the data node 328 may obtain the current data in the data node 320 and/or the data node 324 from the data node 320 and/or the data node 324 that are preset important data nodes, and update the current data in the data node 320 and/or the data node 324 to the self node, and since the data node 320 and the data node 324 have already written the data to be written, the data node 328 updates the current data in the data node 320 and/or the data node 324 to the self node, and may ensure that the data to be written is written.

With continued reference to fig. 1, as another embodiment, each data node 12 of the present application is preset with a corresponding weight value, and the preset condition may be that the sum of the weight values of all normal data nodes is greater than a preset threshold. It can be appreciated that, among all the data nodes 12 of the distributed cluster system, the importance of each data node 12 may be different, and therefore, each data node 12 may be preset with a corresponding weight value according to the importance degree thereof; when the data of the distributed cluster system is written, the data of the distributed cluster system can be written successfully only when the sum of the weight values of all the data nodes 12 with successful data writing, namely all the normal data nodes, is larger than a preset threshold value, and the data nodes 12 with other data which are not written at the moment, namely the fault data nodes, do not influence the data writing success of the distributed cluster system.

Further, the failure data node is configured to: after the own fault is removed, the current data in each data node 12 is acquired, the current data of the data nodes 12 having the same current data and the sum of the weight values is larger than the preset threshold value is selected, and the selected current data is updated to the own node. It may be appreciated that when the updated preset condition met by the normal data node is that the sum of the weight values of all the normal data nodes is greater than the preset threshold, it is determined that the data writing of the distributed cluster system is successful, and at this time, there may still be a faulty data node, where the faulty data node may be caused by a self-fault or a fault in a link with the management node 10 and other data nodes 12 in the distributed cluster system. It will be understood that, whether the failure causes itself or due to the failure of the links between the failed data node and the management node 10 and other data nodes 12, after the failure of the failed data node is removed, the failed data node still needs to write the data to be written, at this time, the failed data node after the failure is removed may acquire the current data in each data node 12, select the current data of the data nodes 12 having the same current data and the sum of the weight values is greater than the preset threshold, and update the selected current data to the self node, because when the data nodes 12 have the same current data and the sum of the weight values corresponding to the data nodes 12 is greater than the preset threshold, it is indicated that the data nodes 12 have already been written, so that the current data of the data nodes 12 are selected, and update the selected current data to the self node, so that the failed data node after the failure is removed can be guaranteed to complete the writing process of the data to be written.

Referring to fig. 4, fig. 4 is a schematic diagram illustrating a working principle of a third application scenario of the distributed cluster system provided by the present invention. In this application scenario, the distributed cluster system includes a management node 40 and a data node 420, a data node 422, a data node 424, a data node 426, and a data node 428, where each data node is pre-set with a corresponding weight value according to its importance degree, where the data node 420 has a corresponding weight value of 4.2, the data node 422 has a corresponding weight value of 3.5, the data node 424 has a corresponding weight value of 4.5, the data node 426 has a corresponding weight value of 3.7, the data node 428 has a corresponding weight value of 4.1, and the preset threshold may be set to be half of the sum of all the data nodes, i.e. the preset threshold is (4.2+3.5+4.5+3.7+4.1)/2=10; when data writing is needed, a first writing instruction containing data to be written is sent to the data node 420, the data node 422, the data node 424, the data node 426 and the data node 428 through the management node 40; since the links between the data node 422, the data node 424 and the data node 426 between the management nodes 40 are normal and have no self-failure, the data node 422, the data node 424 and the data node 426 can each receive and respond to the first writing instruction and write the data to be written, and at this time, the data node 422, the data node 424 and the data node 426 are confirmed to be normal data nodes; because network faults exist between the data node 420 and the data node 428 and the management node 40 respectively, and the data node 420 and the data node 428 are down, the data node 420 and the data node 428 cannot receive the first writing instruction, and cannot write data to be written, and at the moment, the data node 420 and the data node 428 are confirmed to be fault data nodes; the management node 40 may then send the node information of the data nodes 420, 428 identified as the failed data node to the data nodes 422, 424, 426 identified as the normal data node, and send the second write instruction containing the data to be written to the data nodes 428 via the data nodes 422, 424, 426, respectively, to attempt to write the data to the data nodes 420, 428; however, the data node 420 and the data node 428 are down, so that the data to be written cannot be written. Although the data nodes 420 and 428 cannot write the data to be written, i.e. the distributed cluster system has a faulty data node, the data nodes 422, 424 and 426 are all normal data nodes, so that the sum of the weight values of all normal data nodes is 3.5+4.5+3.7=11.7, which is greater than the preset threshold 10, so that it can be determined that the data writing of the distributed cluster system is successful. After the downtime of the data node 420 is removed, the data node 420 may acquire the current data in the data node 422, the data node 424, the data node 426 and the data node 428 respectively, and since the data node 428 is still in the downtime state, the data node 420 actually acquires the current data in the data node 422, the data node 424 and the data node 426, and since the data node 422, the data node 424 and the data node 426 are all normal data nodes, the data node 422, the data node 424 and the data node 426 all have the same current data, and the sum of the weight values 11.7 of the data node 422, the data node 424 and the data node 426 is greater than the preset threshold 10, the data node 420 may update the same current data in the data node 422, the data node 424 and the data node 426 to its own node, and the data node 420 may be guaranteed to be written with the data to be written.

With continued reference to fig. 1, in another embodiment, the node information of the failed data node includes address information of the failed data node. It may be appreciated that the node information of the failed data node includes address information of the failed data node, so that after the normal data node receives the node information of the failed data node sent by the management node 10, it may be implemented to determine the failed data node based on the node information and send a second write instruction containing data to be written to the failed data node.

As an embodiment, when the management node 10 performs the above-described determination of the plurality of data nodes 12 as the normal data node to which data to be written is written and the failed data node to which data to be written is not written, it specifically includes: the management node 10 receives a first writing success message of the data to be written fed back by the data node 12, determines the data node 12 sending the first writing success message as a normal data node, and determines the rest of the data nodes 12 as fault data nodes. When executing the normal data node and the failure data node in the update data node 12, the management node 10 specifically includes: the management node 10 receives a second writing success message of the data to be written fed back by the fault data node, and updates the fault data node sending the second writing success message to the normal data node.

It will be appreciated that after the management node 10 transmits a first write instruction containing data to be written to all the data nodes 12, part of the data nodes 12 may receive the first write instruction and may respond to the first write instruction to write the data to be written, so that the data nodes 12 that successfully write the data to be written may transmit a first write success message to the management node 10 to feed back their successful write data to be written, so that the management node 10 may determine the data node 12 that transmitted the first write success message as a normal data node and the remaining data nodes 12 as faulty data nodes. After the management node 10 sends the node information of the fault data node to the normal data node, the normal data node receives the node information of the fault data node, determines the fault data node based on the node information and sends a second writing instruction containing the data to be written to the fault data node so as to indicate the fault data node which receives the second writing instruction to write the data to be written; at this time, a part of the data nodes 12 originally determined as the faulty data node can receive the second write instruction and write the data to be written, at this time, the part of the data nodes 12 originally determined as the faulty data node may send a second write success message to the normal data node, and the normal data node sends the second write success message to the management node 10 to feed back the successful write of the data to be written, so that the management node 10 may update the data nodes 12 originally determined as the faulty data node, which have sent the second write success message, to the normal data node.

Referring to fig. 5, fig. 5 is a flowchart illustrating a first embodiment of a data writing method according to the present application. The data writing method provided by the application is applied to a distributed cluster system consisting of a management node and a plurality of data nodes, and the data writing method in the embodiment comprises the following steps:

s501: and the management node sends a first writing instruction containing data to be written to all the data nodes.

S502: and determining the plurality of data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written based on the execution result of the first writing instruction.

S503: and sending the node information of the fault data node to the normal data node so that the normal data node indicates the fault data node to rewrite the data to be written.

In this embodiment, a first write instruction including data to be written is sent to all data nodes by a management node to write data to all data nodes, where a part of the data nodes may receive the first write instruction and may respond to the first write instruction to write the data to be written, but other data nodes may not receive or respond to the first write instruction due to a failure of themselves or a network failure between the management node and the management node, and based on a result of such a response to the first write instruction and execution, the management node may determine all data nodes as normal data nodes to which the data to be written and failed data nodes to which the data to be written are not written; at this time, the management node can send the node information of the fault data node to the normal data node, and then try to write the data to the fault data node again through the normal data node which has written the data to be written, so that the problem that the data writing process of the distributed cluster system cannot be completed due to the fact that part of fault data nodes cannot write the data due to the existence of a network gate or network faults and the like between the management node and the management node is avoided, the problem that the data writing process of the distributed cluster system excessively depends on the network condition is solved, and high availability of the distributed cluster system is ensured.

Referring to fig. 6, fig. 6 is a flowchart illustrating a second embodiment of a data writing method according to the present invention. The data writing method in this embodiment includes the steps of:

s601: and the management node sends a first writing instruction containing data to be written to all the data nodes.

S602: and determining the plurality of data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written based on the execution result of the first writing instruction.

S603: and sending the node information of the fault data node to the normal data node so that the normal data node indicates the fault data node to rewrite the data to be written.

The difference from the first embodiment of the data writing method of the present application is that the data writing method in the present embodiment further includes the steps of:

s604: and updating normal data nodes and fault data nodes in the data nodes, and judging whether the updated normal data nodes meet preset conditions or not. If not, the data writing of the distributed cluster system is not successful.

S605: if yes, determining that the data of the distributed cluster system is successfully written.

It may be appreciated that after the normal data node receives the node information of the fault data node sent by the management node, and further determines the fault data node based on the node information and sends a second write instruction containing the data to be written to the fault data node, there may be a portion of the fault data nodes capable of receiving the second write instruction and writing the data to be written, and these fault data nodes should actually be regarded as the normal data node; therefore, in this embodiment, the management node further updates and confirms the normal data node and the failure data node in the data nodes, and then determines whether the updated normal data node meets the preset condition; when the updated normal data node meets the preset condition, the data writing of the distributed cluster system is determined to be successful, and when the updated normal data node does not meet the preset condition, the data writing of the distributed cluster system is determined to be failed. Therefore, when the updated normal data nodes meet the preset conditions, the data writing process of the distributed cluster system can be determined to be successful even if part of the data nodes still exist as the fault data nodes, and the high availability of the distributed cluster system is ensured.

As an implementation manner, the preset condition may be that all preset important data nodes in the plurality of data nodes are normal data nodes. It can be understood that when the preset conditions met by the updated normal data nodes are all normal data nodes, since the normal data nodes have written the data to be written, it is indicated that all the preset important data nodes have written the data to be written, and whether other data nodes which are not the preset important data nodes are fault data nodes or not can be determined to be successful in writing the data of the distributed cluster system.

As another embodiment, each data node is preset with a corresponding weight value, and the preset condition may be that the sum of the weight values of all the normal data nodes is greater than a preset threshold. It can be appreciated that, in all the data nodes of the distributed cluster system, the importance of each data node may be different, so each data node may be preset with a corresponding weight value according to the importance degree thereof; when the data of the distributed cluster system is written, the data of the distributed cluster system can be written successfully only when the sum of the weight values of all data nodes which are successfully written, namely all normal data nodes, is larger than a preset threshold value, and the data nodes which are not written with other data at the moment, namely the fault data nodes, do not influence the data writing success of the distributed cluster system.

In an embodiment, the node information of the failed data node comprises address information of the failed data node. It may be appreciated that the node information of the failed data node includes address information of the failed data node, so that after the normal data node receives the node information of the failed data node sent by the management node, it may be implemented to determine the failed data node based on the node information and send a second write instruction containing data to be written to the failed data node.

Referring to fig. 7, fig. 7 is a flowchart illustrating a third embodiment of a data writing method according to the present invention. The data writing method in this embodiment includes the steps of:

s701: and the management node sends a first writing instruction containing data to be written to all the data nodes.

S702: and receiving a first writing success message of data to be written fed back by the data node, determining the data node sending the first writing success message as a normal data node, and determining the rest data nodes as fault data nodes.

S703: and sending the node information of the fault data node to the normal data node so that the normal data node indicates the fault data node to rewrite the data to be written.

S704: and receiving a second writing success message of the data to be written fed back by the fault data node, updating the fault data node sending the second writing success message into a normal data node, and judging whether the updated normal data node meets a preset condition.

S705: if yes, determining that the data of the distributed cluster system is successfully written.

In this embodiment, the management node determines the data node that has sent the first writing success message as the normal data node, and determines the remaining data nodes as the failure data node, and updates the data node that has sent the second writing success message and is originally determined as the failure data node as the normal data node, so that the normal data node and the failure data node in all the data nodes can be accurately confirmed, and when it is determined that the updated normal data node meets the preset condition, it can be achieved that the data writing process of the distributed cluster system is successful even if some data nodes still exist as the failure data node, and high availability of the distributed cluster system is ensured.

The distributed cluster system comprising the management node and the plurality of data nodes, to which the data writing method provided by the application is applied, may be the distributed cluster system provided by the application, and for specific relevant content of the data writing method embodiment of the application, please refer to the detailed description in the distributed cluster system embodiment.

Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application. The electronic device 80 of the present application includes a communication circuit 800, a memory 802, and a processor 804 coupled to each other; the communication circuit 800 is used for communicating with each data node; the memory 802 is used for storing program data; the processor 804 executes the program data for implementing any of the data writing methods described above.

In another embodiment, the electronic device 80 of the present application is the management node 10 in the distributed cluster system described above.

For a specific description of an embodiment of the electronic device 80 of the present application, please refer to the above-mentioned embodiment of the distributed cluster system and the data writing method.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a memory device according to an embodiment of the application. The storage device 90 in the present application stores therein program data 900, and the program data 900 can be executed to implement the data writing method as described above. The storage device 90 may be a storage chip in a server, a tool for storing data in a readable and writable manner such as an SD card, a server, or the like.

In the embodiments of the present application, it should be understood that the disclosed distributed cluster system, the data writing method, the electronic device, and the storage device may be implemented in other manners. For example, the device architecture embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing description is only of embodiments of the present invention, and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims

1. A distributed cluster system, wherein the distributed cluster system comprises a management node and a plurality of data nodes, the management node is connected with each data node, and each data node is connected with each other;

The management node is used for: transmitting a first writing instruction containing data to be written to all the data nodes, and determining the data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written; and if the failed data node cannot receive the first write command due to a network between the failed data node and the management node, transmitting node information of the failed data node to the normal data node; after executing the sending the node information of the failed data node to the normal data node, so that the normal data node instructs the failed data node to rewrite the data to be written, the management node is further configured to: updating normal data nodes and fault data nodes in the data nodes, and judging whether the updated normal data nodes meet preset conditions or not; if yes, determining that the data of the distributed cluster system is successfully written; when the data of the distributed cluster system is successfully written, the updated data nodes comprise partial fault data nodes or all the data nodes are normal data nodes;

The normal data node is configured to: receiving the first writing instruction sent by the management node, and responding to the first writing instruction to write the data to be written; and receiving node information of the fault data node sent by the management node, determining the fault data node based on the node information, and sending a second writing instruction containing data to be written to the fault data node so as to indicate the fault data node receiving the second writing instruction to write the data to be written.

2. The distributed cluster system of claim 1, wherein,

The preset condition is that preset important data nodes in the plurality of data nodes are all normal data nodes.

3. The distributed cluster system of claim 2, wherein,

The fault data node is configured to: after the self fault is eliminated, current data in the preset important data node is acquired, and the current data is updated to the self node.

4. The distributed cluster system of claim 1, wherein,

Each data node is preset with a corresponding weight value; the preset condition is that the sum of the weight values of all normal data nodes is larger than a preset threshold value.

5. The distributed cluster system of claim 4, wherein,

The fault data node is configured to: after the fault of the self is eliminated, current data in each data node is obtained, the current data of the data nodes with the same current data and the weight value sum larger than the preset threshold value are selected, and the selected current data are updated to the self nodes.

6. The distributed cluster system of claim 1, wherein the node information of the failed data node comprises address information of the failed data node; and/or

The management node, when executing the normal data node which determines the plurality of data nodes as having written the data to be written and the fault data node which has not written the data to be written, includes:

receiving a first writing success message of the data to be written fed back by the data node, determining the data node sending the first writing success message as the normal data node, and determining the rest data nodes as the fault data nodes;

the management node, when executing the updating of the normal data node and the failure data node in the data nodes, includes:

And receiving a second writing success message of the data to be written fed back by the fault data node, and updating the fault data node sending the second writing success message into the normal data node.

7. A data writing method, wherein the method is applied to a distributed cluster system composed of a management node and a plurality of data nodes, the management node and each data node are connected with each other, and each data node is connected with each other, the method comprises:

the management node sends a first writing instruction containing data to be written to all the data nodes;

Based on the execution result of the first writing instruction, determining the plurality of data nodes as normal data nodes written with the data to be written and fault data nodes not written with the data to be written;

if the fault data node cannot receive the first writing instruction due to the network between the fault data node and the management node, node information of the fault data node is sent to the normal data node, so that the normal data node indicates the fault data node to rewrite the data to be written;

Updating normal data nodes and fault data nodes in the data nodes, and judging whether the updated normal data nodes meet preset conditions or not; if yes, determining that the data of the distributed cluster system is successfully written; and when the data of the distributed cluster system is successfully written, the updated data nodes comprise partial fault data nodes or all normal data nodes.

8. The data writing method of claim 7, wherein the preset condition is that all preset important data nodes in the plurality of data nodes are normal data nodes.

9. The data writing method according to claim 7, wherein each of the data nodes is preset with a corresponding weight value; the preset condition is that the sum of the weight values of all normal data nodes is larger than a preset threshold value.

10. The data writing method of claim 7, wherein the node information of the failed data node includes address information of the failed data node; and/or

The determining, based on the execution result of the first writing instruction, the plurality of data nodes as a normal data node to which the data to be written is written and a fault data node to which the data to be written is not written specifically includes:

the updating of the normal data node and the fault data node in the data nodes specifically comprises the following steps:

11. An electronic device comprising a communication circuit, a memory, and a processor coupled to each other; the communication circuit is used for communicating with each data node; the memory is used for storing program data; the processor executing the program data for implementing the method of any one of claims 7-10;

Or the electronic device is a management node in the distributed cluster system of any of claims 1 to 6.

12. A storage device, characterized in that program data are stored, which program data are executable to implement the method according to any of claims 7-10.