CN110955382A - Method and device for writing data in distributed system

Info

Publication number: CN110955382A
Application number: CN201811124586.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 叶小杰, 陈家强, 江舟
Original and current assignee: Huawei Technologies Co Ltd
Legal status: Pending

Classifications

    • G06F3/061 Interfaces specially adapted for storage systems: improving I/O performance
    • G06F3/0631 Configuration or reconfiguration of storage systems: allocating resources to storage systems
    • G06F3/067 Interfaces adopting a particular infrastructure: distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

When a write to the distributed system is requested, the master node detects whether the performance of each standby node meets a requirement; for standby nodes with lower performance it temporarily ignores their feedback, and it waits only for the write-success feedback of the standby nodes whose performance meets the requirement. That is, as long as the standby nodes whose performance meets the requirement return response messages, the master node can feed back the write success to the terminal. This avoids the long latency and low efficiency caused by slow feedback from low-performance standby nodes, thereby improving the data writing efficiency of the distributed system.

Description

Method and device for writing data in distributed system
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for writing data in a distributed system.
Background
A distributed system is a high-performance system that can provide multi-copy fault tolerance. Typically, strong consistency is maintained among the multiple copies of data in a distributed system: when a write data request is received, the distributed system feeds back write success only if the write succeeds on all copy nodes (i.e., all standby nodes). However, because of factors such as high central processing unit (CPU) occupancy caused by disk-flush operations and network jitter, the performance of some nodes may degrade for a short time, so that they cannot write data promptly and their write latency grows. And because writing on these nodes takes too long, the latency before the distributed system can feed back write success for the write data request also grows, making the data writing efficiency of the distributed system low.
Disclosure of Invention
The embodiments of the present application provide a method and an apparatus for writing data in a distributed system, so as to solve the problem of excessive data-write latency in a distributed system caused by the performance jitter of individual nodes, thereby improving the data writing efficiency of the distributed system.
In a first aspect, a method for writing data in a distributed system is provided, which specifically includes: a master node receives a write data request, where the write data request includes data to be written; the master node writes the data to be written into a first standby node and a second standby node respectively; when the master node determines that the performance of the first standby node is lower than a set performance threshold, it ignores whether the first standby node sends it a first response message indicating that the data to be written was written successfully; and when the master node receives a second response message, sent by the second standby node, indicating that the data to be written was written successfully, it feeds back the write success to the terminal. The standby nodes whose feedback is ignored and those whose feedback is awaited are thus determined through performance detection, so the master node effectively avoids the long latency and low efficiency that slow feedback from low-performance standby nodes would otherwise impose on the write-success feedback of the whole distributed system, thereby improving the data writing efficiency of the distributed system.
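For illustration only (this sketch is not part of the original disclosure, and all class, function, and variable names are assumptions), the following Python code condenses the first aspect: the data is sent to every standby node, but only standby nodes not currently flagged as temporarily failed gate the reply to the terminal. It is deliberately synchronous and in-memory; a real system would issue the replica writes asynchronously.

```python
from enum import Enum

class NodeState(Enum):
    NORMAL = 1          # feedback is awaited
    TEMP_FAILURE = 2    # performance below threshold: feedback is ignored
    INCOMPLETE = 3      # recovered but possibly missing data: feedback awaited

class Standby:
    def __init__(self, name: str):
        self.name = name
        self.state = NodeState.NORMAL
        self.store: list[bytes] = []

    def write(self, data: bytes) -> bool:
        self.store.append(data)    # a real replica writes asynchronously
        return True                # the "response message" of the text

class Master:
    def __init__(self, standbys: list[Standby]):
        self.standbys = standbys
        self.store: list[bytes] = []

    def handle_write(self, data: bytes) -> bool:
        self.store.append(data)
        acked = True
        for s in self.standbys:
            ok = s.write(data)     # the data is sent to every standby
            # Only standbys not in the temporary failure state gate the reply.
            if s.state != NodeState.TEMP_FAILURE:
                acked = acked and ok
        return acked               # True -> feed back write success to terminal
```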
In a possible implementation manner of the first aspect, the write data request further includes a logical address of the data to be written; the method then further includes: the master node determines the partition corresponding to the data to be written according to the logical address; and the master node determines the master node, the first standby node, and the second standby node according to the partition and to the correspondence between partitions and master nodes stored in the metadata server. In this way, the master node and the standby nodes into which the data is to be written can be determined through the partition-to-node correspondence pre-stored on the metadata server, improving the efficiency and accuracy of the data writing operation.
In some possible implementations, the method further includes: the master node records the state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether the write on the first standby node succeeded when executing a write data request. Indicating through a state mark whether the master node should ignore a standby node's feedback means that the performance of every standby node need not be re-detected on every write, which effectively improves data writing efficiency.
In a specific implementation, after the master node determines that the performance of the first standby node is lower than the set performance threshold, the method further includes: when detecting that the performance of the first standby node has reached the set performance threshold again, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that the data stored by the first standby node may be inconsistent with the data stored by the master node and that the master node needs to determine whether the write on the first standby node succeeded when executing a write data request. The state mark thus tells the master node both whether it may ignore the standby node's feedback and whether the standby node may have lost data, which effectively improves data writing efficiency while improving the reliability of data storage in the distributed system.
In some possible implementations, the method further includes: when the first standby node is in the incomplete normal state, the master node sends the data that the first standby node lost to the first standby node; after the first standby node receives the lost data, the master node judges whether the data stored in the first standby node is consistent with the data stored in the master node; when they are consistent, the master node modifies the state of the first standby node to a normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node and that the master node needs to determine whether the write on the first standby node succeeded when executing a write data request. A standby node that experienced a performance anomaly and then recovered can thus be effectively resynchronized with the data stored on the master node, improving data writing efficiency as well as the reliability of data storage in the distributed system.
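Continuing the illustrative sketch above (again with hypothetical names, not the patent's implementation), the transitions among the three states just described might be wired together as follows:

```python
def on_performance_sample(standby: "Standby", below_threshold: bool) -> None:
    # Hypothetical transition logic for the states described above.
    if below_threshold:
        # Performance dropped: the master ignores this node's feedback.
        standby.state = NodeState.TEMP_FAILURE
    elif standby.state == NodeState.TEMP_FAILURE:
        # Performance recovered, but writes may have been missed meanwhile,
        # so the replica cannot yet be trusted to be consistent.
        standby.state = NodeState.INCOMPLETE
    # The INCOMPLETE -> NORMAL transition happens only after the data
    # synchronization described above (detailed later with fig. 5).
```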
In a second aspect, an apparatus for writing data in a distributed system is also provided, including: a receiving unit, configured to receive, by the master node, a write data request, where the write data request includes data to be written; a writing unit, configured to write, by the master node, the data to be written into a first standby node and a second standby node respectively; a first determining unit, configured to, when the master node determines that the performance of the first standby node is lower than a set performance threshold, ignore whether the first standby node sends the master node a first response message indicating that the data to be written was written successfully; and a feedback unit, configured to feed back the write success to the terminal when the master node receives a second response message, sent by the second standby node, indicating that the data to be written was written successfully.
In a possible implementation manner of the second aspect, the write data request further includes a logical address of the data to be written; the apparatus then further includes: a second determining unit, configured to determine, by the master node, the partition corresponding to the data to be written according to the logical address; and a third determining unit, configured to determine, by the master node, the master node, the first standby node, and the second standby node according to the partition and to the correspondence between partitions and master nodes stored in the metadata server.
In some possible implementations, the apparatus further includes: a first recording unit, configured to record, by the master node, the state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether the write on the first standby node succeeded when executing the write data request.
In a specific implementation, the apparatus further includes: a second recording unit, configured to, after the master node determines that the performance of the first standby node is lower than the set performance threshold, record the state of the first standby node as an incomplete normal state when the master node detects that the performance of the first standby node has reached the set performance threshold again, where the incomplete normal state is used to indicate that the data stored by the first standby node may be inconsistent with the data stored by the master node and that the master node needs to determine whether the write on the first standby node succeeded when executing a write data request.
In some possible implementations, the apparatus further includes: a sending unit, configured to send, by the master node, the data lost by the first standby node to the first standby node when the first standby node is in the incomplete normal state; a determining unit, configured to judge, by the master node, after the first standby node receives the lost data, whether the data stored in the first standby node is consistent with the data stored in the master node; and a state modifying unit, configured to, when the data stored in the first standby node is consistent with the data stored in the master node, modify, by the master node, the state of the first standby node to a normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node and that the master node needs to determine whether the write on the first standby node succeeded when executing a write data request.
In a third aspect, an apparatus for writing data in a distributed system is further provided. The apparatus includes at least one processor and a memory that are connected, where the memory is used to store program code and the processor is used to call the program code in the memory to execute the method for writing data in a distributed system provided in the first aspect.
In a fourth aspect, a system for writing data in a distributed system is further provided, including a master node, a first standby node, and a second standby node, where the master node is configured to execute the method for writing data in a distributed system provided in the first aspect.
In a fifth aspect, embodiments of the present application further provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method for writing data in a distributed system provided in the first aspect.
In a sixth aspect, embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method for writing data in a distributed system provided in the first aspect.
In the embodiments of the present application, when data needs to be written in a distributed system comprising a master node and multiple standby nodes, the master node first receives a write data request carrying the data to be written, and then writes the data to be written into the corresponding standby nodes respectively. By judging whether the performance of each standby node is lower than a set performance threshold, the master node determines the standby nodes whose performance is lower than the threshold as first standby nodes whose feedback can be ignored, and the standby nodes whose performance is not lower than the threshold as second standby nodes whose feedback must be awaited. The master node therefore only needs to wait for the write-success response messages sent by the second standby nodes, and can feed back the write success to the terminal regardless of whether the response messages of the first standby nodes have arrived. It can be seen that, when a write to the distributed system is requested, the master node detects whether the performance of each standby node meets the requirement, temporarily ignores the feedback of the standby nodes with lower performance (for example, those whose data writing is too slow), and waits only for the write-success feedback of the standby nodes whose performance meets the requirement; as long as those nodes return response messages, the master node can feed back the write success to the terminal. Compared with a data writing manner in which the master node feeds back the write success to the terminal only after all standby nodes, including any poorly performing ones, have returned write-success response messages, this effectively avoids the long latency and low efficiency that slow feedback from low-performance standby nodes imposes on the write-success feedback of the whole distributed system, thereby improving the data writing efficiency of the distributed system.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments described in the present application, and those skilled in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of a network system framework involved in an application scenario in an embodiment of the present application;
Fig. 2 is a flowchart of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 3 is a flowchart of a method for determining a master node and standby nodes in an embodiment of the present application;
Fig. 4 is a schematic diagram of a network system framework involved in another application scenario in an embodiment of the present application;
Fig. 5 is a flowchart of a data synchronization method in an embodiment of the present application;
Fig. 6 is a flowchart of an example of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 7 is a block diagram of an example of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 8 is a block diagram of another example of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 9 is a block diagram of a further example of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 10 is a block diagram of yet another example of a method for writing data in a distributed system in an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an apparatus for writing data in a distributed system in an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a system for writing data in a distributed system in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an apparatus for writing data in a distributed system in an embodiment of the present application.
Detailed Description
In order to ensure the reliability of data that a terminal requests to write, a distributed system generally needs to write the data to be written into a master node and also into multiple standby nodes of the master node. At present, data writing in a distributed system maintains strong consistency; that is, the data must be written successfully on every node before the write data request is considered complete and the write success is fed back to the corresponding terminal.
Fig. 1 is a schematic diagram of the process of writing data in a specific distributed system. As shown in fig. 1, the scenario may specifically include a distributed system 10 and a terminal 20, where the distributed system 10 comprises a master node 100, a standby node 101, and a standby node 102. To maintain strong consistency, the specific process of writing data may be as follows: first, the terminal 20 sends a write data request to the master node 100 in the distributed system 10, where the write data request carries data x to be written; the master node 100 writes data x on itself and writes data x into the standby node 101 and the standby node 102 respectively; then, the standby node 101 feeds back a response message a to the master node 100 after data x is written successfully, and the standby node 102 feeds back a response message b to the master node 100 after data x is written successfully. Only when the master node 100 has received both response message a and response message b is data x regarded as successfully written in the distributed system 10, which triggers the feedback of the write success to the terminal 20.
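For contrast with the method of this application, the strong-consistency flow of fig. 1 can be condensed to a few lines (an illustrative sketch only, reusing the hypothetical in-memory classes from the sketch in the summary above): the master replies only after every standby acknowledges, so a single slow standby stalls the whole request.

```python
def strong_consistency_write(master: "Master", data: bytes) -> bool:
    master.store.append(data)
    # Response messages a, b, ... from *all* standbys must arrive before
    # the write success can be fed back to the terminal.
    return all(standby.write(data) for standby in master.standbys)
```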
However, operations such as disk flushing often occupy a large share of CPU resources and may even cause network jitter, so some nodes in the distributed system may suffer performance degradation for a short time. Once data needs to be written on such a degraded node, for example if the performance of the standby node 101 in fig. 1 is degraded, the time the master node 100 needs to write data x on the standby node 101 far exceeds the time needed to write data x on the normally performing standby node 102; the master node 100 must then wait a long time after receiving the feedback of the standby node 102 before it receives the feedback of the standby node 101 and can inform the terminal 20 that data x was written successfully. That is, because the write efficiency of a standby node with degraded performance is low, it is difficult to complete data writing quickly and efficiently.
Based on this, an embodiment of the present application provides a method for writing data in a distributed system. When a master node writes data to be written into its multiple corresponding standby nodes, the master node judges whether the performance of each standby node is lower than a set performance threshold, determines the standby nodes whose feedback response messages must be awaited (i.e., the standby nodes whose performance is not lower than the set performance threshold), and lets the response messages fed back by those nodes trigger the master node's feedback of the write success to the terminal, while ignoring whether a standby node that does not meet the set performance threshold (i.e., a standby node whose performance is lower than the set performance threshold) feeds back a response message. Thus, even if the performance of some standby nodes degrades because of network jitter or similar problems, the master node can identify the degraded standby nodes, temporarily ignore their feedback, and consider only the feedback of the well-performing standby nodes, effectively avoiding the long latency and low efficiency that slow feedback from low-performance standby nodes would impose on the write-success feedback of the whole distributed system, and thereby improving the data writing efficiency of the distributed system.
For example, still taking the scenario shown in fig. 1, and assuming that the terminal 20 is to write data x to the distributed system 10 comprising the master node 100, the standby node 101, and the standby node 102, the method provided in the embodiment of the present application may specifically proceed as follows: first, the terminal 20 sends a write data request to the master node 100 in the distributed system 10, where the write data request carries data x to be written; the master node 100 writes data x on itself and writes data x into the standby node 101 and the standby node 102 respectively; then, the master node 100 compares the performance of the standby node 101 and of the standby node 102 with the set performance threshold and finds that the performance of the standby node 101 is lower than the set performance threshold while the performance of the standby node 102 is not. It therefore ignores whether the standby node 101 sends a response message a indicating that data x was written successfully, and as soon as the response message b fed back by the standby node 102 is received, the master node 100 can feed back the write success to the terminal 20.
It can be understood that, in the process of writing data x, suppose that because of network jitter the standby node 101 needs 10 milliseconds to write data x successfully, while the normally performing standby node 102 needs only 1 millisecond. With the method that maintains strong consistency, the master node 100 can feed back the write success to the terminal 20 only after at least 10 milliseconds; with the method provided in this embodiment of the present application, however, regardless of whether the standby node 101 has fed back, the master node 100 can feed back the write success to the terminal 20 after only 1 millisecond, which greatly improves the efficiency of writing data.
It should be noted that the master node 100, the standby node 101, and the standby node 102 in the distributed system 10 may be deployed on three mutually independent devices; any two of the nodes may be deployed on the same device while the remaining node is deployed on an independent device; or all three nodes may be deployed on the same device. This is not specifically limited in the embodiments of the present application. The terminal 20 may be any terminal that can communicate with the distributed system 10; in particular, the terminal 20 may write data into the distributed system 10. In some examples, the terminal 20 may specifically be a smartphone, a laptop personal computer, a desktop personal computer, a tablet computer, or the like.
It is to be understood that the above scenario is only one example of a scenario provided in the embodiment of the present application, and the embodiment of the present application is not limited to this scenario.
A specific implementation manner of a method and an apparatus for writing data in a distributed system in the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a method for writing data in a distributed system in an embodiment of the present application is shown. The data writing method provided in this embodiment may be applied, for example, to the scenario corresponding to fig. 1. The method for writing data in a distributed system may specifically include:
step 201, a master node receives a data writing request, where the data writing request includes data to be written.
It can be understood that the write data request may be a message sent by the terminal to the distributed system to request that the data to be written be written into the distributed system. The data to be written may be carried in the write data request and is obtained by parsing the write data request.
As an example, the data writing request may be directly sent by the terminal to the master node of the distributed system, and then, when receiving the data writing request, the master node may parse the data writing request, obtain the data to be written, and then store the data to be written into the master node.
As another example, the data writing request may also be sent by a terminal to any node of the distributed system, and then the node that receives the data writing request may acquire the master node corresponding to the data writing request and send the data writing request to the master node, and then the master node may obtain data to be written by parsing the data writing request and store the data to be written in the master node.
In some possible implementation manners, the write data request may further include a logical address of the data to be written, and the logical address is used to determine the master node and the standby nodes into which the data to be written should correspondingly be written. For the distributed system, as shown in fig. 3, the manner of determining the master node and the standby nodes into which the data to be written in the write data request should be written may specifically include:
step 301, determining, by a node in the distributed system, a partition corresponding to the data to be written according to the logical address.
It should be understood that the node in the distributed system here refers to the node in the distributed system that receives the write data request sent by the terminal; for example, it may be the master node, or it may be another node in the distributed system.
In specific implementation, the node in the distributed system may first obtain the logical address and the data to be written by parsing the write data request, and then determine the partition corresponding to the data to be written according to the logical address. As an example, the manner of determining the partition corresponding to the data to be written according to the logical address may specifically be: the node in the distributed system performs a hash calculation on the logical address and takes the calculation result as the partition corresponding to the data to be written.
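As an illustration of this step (the patent does not fix a concrete hash function, so the choice of SHA-1 and the modulo reduction below are assumptions), the hash-based mapping from logical address to partition might look like this:

```python
import hashlib

def partition_for(logical_address: bytes, partition_count: int) -> int:
    # One plausible realization of "hash the logical address and take the
    # result as the partition"; any stable hash would serve the same role.
    digest = hashlib.sha1(logical_address).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# e.g. partition_for(b"/volume1/block/4096", 1024) -> a stable partition id
```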
Step 302, a node in the distributed system determines the master node, the first standby node, and the second standby node according to the partition through the correspondence between the partition and the master node stored on the metadata server.
It can be understood that, when storing data to be written, the data is generally stored on the master node and on multiple corresponding standby nodes for the sake of reliability of the stored data. It should be noted that, for a given piece of data to be written, once the corresponding partition has been determined according to the logical address, that partition corresponds to a fixed master node and fixed standby nodes used to store the data to be written.
In this embodiment of the present application, the corresponding relationship between the partition and the master node may be stored on the metadata server, and then, after the node in the distributed system obtains the partition according to step 301, the master node corresponding to the partition and the corresponding first standby node and second standby node may be determined according to the corresponding relationship between the partition and the master node.
The metadata server refers to a server having a function of storing a correspondence between a partition and a master node, and may specifically be a dedicated server dedicated for storing a correspondence between a partition and a master node, or may be another server having a function of storing a correspondence between a partition and a master node, which is not specifically limited in the embodiment of the present application.
In a specific implementation, in one case, when the node in the distributed system is the master node determined in step 302, step 202 may be directly performed, that is, the master node writes data to be written into the determined first standby node and second standby node, respectively; in another case, when the node in the distributed system is not a master node, the embodiment of the present application may further include: the nodes in the distributed system send the write data request received from the terminal to the corresponding master node, which then executes the embodiment shown in fig. 2.
For example, the embodiment of the present application may be applied to the scenario shown in fig. 4, which includes a metadata server 410, a terminal 420, and a distributed system 430. The metadata server 410 stores, for each piece of data, the partition and a Partition View describing the correspondence among the partition, the master node, the first standby node, and the second standby node; for example, one Partition View entry may be: partition - master node - first standby node - second standby node.
As one implementation manner, if the node in the distributed system is the master node, the process of determining the nodes that store the data is specifically as follows: when the terminal 420 needs to write data, the metadata server 410 may send the Partition View corresponding to the data to the terminal 420; on receiving the Partition View, the terminal 420 can determine the corresponding partition h(y) according to the logical address y, look up in the Partition View the master node corresponding to h(y), and determine it to be the master node 431; the terminal 420 may then send a write data request carrying the data to the determined master node 431. The metadata server 410 may also send the Partition View corresponding to the data to the master node 431; the master node 431 calculates the corresponding partition h(y) from the logical address y in the received write data request and determines, from h(y) and the received Partition View, that it is itself the master node storing data x and that the corresponding standby nodes are the first standby node 432 and the second standby node 433.
As another implementation manner, if the node in the distributed system is not the master node, the process of determining the nodes that store the data is specifically as follows: when the terminal 420 needs to write data, it may send the write data request corresponding to the data to any node 43N in the distributed system 430, and the metadata server 410 may send the Partition View corresponding to the data to the node 43N; on receiving the Partition View, the node 43N can determine the corresponding partition h(y) according to the logical address y, look up in the Partition View the master node corresponding to h(y), and determine it to be the master node 431; the node 43N may then send the write data request carrying the data to the determined master node 431. The metadata server 410 may also send the Partition View corresponding to the data to the master node 431; the master node 431 calculates the corresponding partition h(y) from the logical address y in the received write data request and determines, from h(y) and the received Partition View, that it is itself the master node storing data x and that the corresponding standby nodes are the first standby node 432 and the second standby node 433.
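To make the Partition View lookup concrete (an illustrative sketch only; the entry format, node names, and partition count below are invented), the routing in both flows above reduces to a table lookup keyed by the computed partition:

```python
# A real Partition View covers every partition; one entry is shown here.
PARTITION_VIEW: dict[int, tuple[str, str, str]] = {
    17: ("master-431", "standby-432", "standby-433"),
}

def route_write(logical_address: bytes, partition_count: int = 1024):
    # partition_for() is the hash sketch shown earlier.
    p = partition_for(logical_address, partition_count)
    master, first_standby, second_standby = PARTITION_VIEW[p]
    return master, first_standby, second_standby
```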
It should be noted that the metadata server 410 may be a functional node located in the distributed system 430 and having a function of storing a correspondence between partitions and a master node, and specifically may be a server device separately deployed independently from the master node 431, the first standby node 432, and the second standby node 433, or may be deployed in the same device with any one of the nodes, or may be deployed in the same device with any two of the nodes, or may be deployed in the same device with all of the three nodes, which is not specifically limited in this embodiment of the present application.
Thus the master node is determined, and the corresponding standby nodes are determined after the master node receives the write data request, preparing for the data to be written into the distributed system.
Step 202, the master node writes the data to be written into the first standby node and the second standby node respectively.
In specific implementation, when the master node receives the write data request, parses the data to be written from it, and determines that the standby nodes that should store the data are the first standby node and the second standby node, it can directly send the data to be written to the first standby node and the second standby node respectively and instruct them to store the data to be written.
Step 203, when determining that the performance of the first standby node is lower than a set performance threshold, the master node ignores whether the first standby node sends it a first response message indicating that the data to be written was written successfully.
Step 204, when receiving a second response message, sent by the second standby node, indicating that the data to be written was written successfully, the master node feeds back the write success to the terminal.
It can be understood that the performance of each node in the distributed system when writing data can be detected; this performance represents the efficiency with which the node writes data. For example, the performance may be the node's latency when writing data, or the speed at which the node writes data.
It can be understood that the performance threshold may be a preset minimum value of standby-node performance below which a standby node's feedback is ignored. When the master node detects that a standby node's performance is lower than the set performance threshold, the standby node's current performance is poor and the master node needs to ignore whether that node sends a response message indicating that the data to be written was written successfully; when it detects that a standby node's performance is not lower than the set performance threshold, the node's current performance is good and the master node needs to wait for the node to send such a response message.
In a specific implementation, a standby node whose performance is lower than the set performance threshold is recorded as a first standby node. The first standby node is a standby node with poor performance; it may specifically be a standby node that writes data slowly for a short time because of network jitter or similar problems. It should be noted that the first standby node may refer to one node or to multiple nodes, which is not specifically limited in this embodiment. Conversely, a standby node whose performance is not lower than the set performance threshold is recorded as a second standby node; the second standby node's performance is good, for example, it writes data quickly. The second standby node may likewise refer to one node or to multiple nodes, which is not specifically limited in this embodiment.
After the master node writes the data to be written into the first standby node, the first standby node may, once the write on it succeeds, generate a first response message and send it to the master node to inform it that the data to be written was written successfully on the first standby node. Because the performance of the first standby node is lower than the set performance threshold, writing on it is likely to be slow, and generating the first response message is likely to take a long time. The master node may therefore ignore whether the first response message arrives from the first standby node; that is, the master node no longer needs to pay attention to whether the write on the first standby node succeeded, and that outcome no longer affects when the master node feeds back the write success to the terminal. Similarly, after the master node writes the data to be written into the second standby node, the second standby node may, once the write succeeds, generate a second response message and send it to the master node to inform it that the data to be written was written successfully on the second standby node. If there are multiple second standby nodes, the master node may feed back the write success to the terminal only when it has received the second response messages fed back by all of them.
It should be noted that the feedback of successful writing may be directly sent by the master node to the terminal, or may be fed back to the terminal by the master node through other nodes in the distributed system.
In some possible implementations, the master node may record a first standby node whose performance is lower than the set performance threshold as being in a temporary failure state, where the temporary failure state is used to indicate to the master node that the performance of the first standby node is lower than the set performance threshold and, further, that the master node should ignore whether the write on the first standby node succeeded when executing a write data request. Step 203 may then specifically be: when the master node detects that the state of the first standby node is the temporary failure state, it determines that the performance of the first standby node is lower than the set performance threshold and ignores whether the first standby node sends it a first response message indicating that the data to be written was written successfully.
It will be appreciated that, before step 203, the master node may also: detect the performance of each standby node and judge whether it is lower than the set performance threshold; and record any standby node whose performance is lower than the set performance threshold as a first standby node and mark it as being in the temporary failure state.
For example, if the set performance threshold is specifically a time threshold, then, when the data to be written must all be written into the same data entry, the master node may detect the write duration of each standby node that stores the data to be written.
As an example, in a certain write data request, the master node may obtain the write duration of each standby node over the last n writes, calculate the average write duration of each standby node, and, when the difference between average write durations is greater than the time threshold, determine the standby node with the larger average write duration to be a standby node whose performance is lower than the set performance threshold; when the difference between average write durations is not greater than the time threshold, all the standby nodes involved are determined to be standby nodes whose performance is not lower than the set performance threshold. For example: assume the time threshold is 300 microseconds, n is 4, and the standby nodes are standby node 1, standby node 2, and standby node 3. The master node may obtain the last 4 write durations of standby node 1: T11, T12, T13, and T14; the last 4 write durations of standby node 2: T21, T22, T23, and T24; and the last 4 write durations of standby node 3: T31, T32, T33, and T34. It then calculates the average write duration of standby node 1 as T1 = (T11 + T12 + T13 + T14)/4, the average write duration of standby node 2 as T2 = (T21 + T22 + T23 + T24)/4, and the average write duration of standby node 3 as T3 = (T31 + T32 + T33 + T34)/4. If T1 - T2 > 300 microseconds, standby node 1 (corresponding to T1) is determined to be a standby node whose performance is lower than the set performance threshold; if T2 - T3 ≤ 300 microseconds, standby node 2 and standby node 3 (corresponding to T2 and T3) are determined to be standby nodes whose performance is not lower than the set performance threshold.
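One plausible reading of this detection scheme is a sliding window of recent write durations per standby (an illustrative sketch only: the patent compares averages pairwise, while the helper below flags any node whose average exceeds the fastest average by more than the threshold; all names are assumptions):

```python
from collections import deque

TIME_THRESHOLD_US = 300        # the 300-microsecond threshold of the example
WINDOW = 4                     # n, the number of recent writes considered

class WriteStats:
    """Tracks the last n write durations of one standby (hypothetical)."""
    def __init__(self):
        self.durations_us: deque[float] = deque(maxlen=WINDOW)

    def record(self, duration_us: float) -> None:
        self.durations_us.append(duration_us)

    def average(self) -> float:
        return sum(self.durations_us) / max(len(self.durations_us), 1)

def slow_standbys(stats: dict[str, WriteStats]) -> list[str]:
    # Flag standbys whose average write duration lags the fastest node
    # by more than the time threshold.
    averages = {name: s.average() for name, s in stats.items()}
    fastest = min(averages.values())
    return [n for n, avg in averages.items()
            if avg - fastest > TIME_THRESHOLD_US]
```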
It can be understood that, in order not to affect the overall data writing performance (especially the data writing efficiency) of the distributed system, a first standby node whose performance is lower than the set performance threshold may be marked by the master node, on account of its poor performance, as being in a temporary failure state. For example, in the above example including standby node 1, standby node 2, and standby node 3, the master node may mark standby node 1, whose performance is below the set performance threshold, as being in the temporary failure state.
The temporary failure state is used to indicate to the master node that the performance of the first standby node is lower than the set performance threshold. In a specific implementation, when the master node receives a write data request, as soon as it finds a first standby node marked as being in the temporary failure state it can determine that the node's performance is lower than the set performance threshold, that is, that the node's performance is poor. While the first standby node is marked as being in the temporary failure state, after the master node executes the write data request and writes the data to be written to the first standby node, it may ignore whether the write on the first standby node succeeded; that is, the master node need not consider whether the first standby node sends the first response message. For example, in the above example including standby node 1, standby node 2, and standby node 3, the master node may ignore whether standby node 1, which is in the temporary failure state, feeds back a response message; it can feed back the write success to the terminal as soon as the response messages fed back by standby node 2 and standby node 3 have arrived.
In some possible implementation manners, after the performance of the first standby node is determined to be lower than the threshold and its first response message is ignored, it should be considered that the degradation was probably caused by short-lived network jitter or similar problems; once the jitter disappears, the performance of the first standby node is likely to recover. To ensure the reliability of data writing as far as possible, the embodiment of the present application may therefore further include: when detecting that the performance of the first standby node has reached the set performance threshold again, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that the data stored by the first standby node may be inconsistent with the data stored by the master node and that the master node needs to determine whether the write on the first standby node succeeded when executing a write data request.
It can be understood that, after step 203, the master node may also continuously or periodically detect the performance of the first standby node and determine whether its performance has reached the set performance threshold.
For example, if the set performance threshold is specifically a time threshold, then, when the data to be written must all be written into the same data entry, the master node may detect the write duration for storing each piece of data to be written after the first standby node has been marked as being in the temporary failure state.
As an example, in a certain write data request, the master node may obtain the write duration of the first standby node over its last m writes, calculate its average write duration, and determine that the performance of the first standby node has reached the set performance threshold when the difference between its average write duration and the average write durations of the other standby nodes is smaller than the time threshold. For example: assume the time threshold is 50 microseconds, m is 3, and the nodes involved are standby node 2 in the normal state and standby node 1 in the temporary failure state. The master node may obtain the last 3 write durations of standby node 1: T11', T12', and T13', and the last 3 write durations of standby node 2: T21', T22', and T23'. It then calculates the average write duration of standby node 1 as T1' = (T11' + T12' + T13')/3 and the average write duration of standby node 2 as T2' = (T21' + T22' + T23')/3. If T1' - T2' < 50 microseconds, standby node 1 (corresponding to T1') is determined to be a first standby node whose performance has reached the set performance threshold.
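The recovery check is the mirror image of the detection above; a minimal helper, assuming the same hypothetical WriteStats windows as in the earlier sketch (with m recent samples per node):

```python
RECOVERY_THRESHOLD_US = 50   # the 50-microsecond threshold of the example

def has_recovered(temp_failed: "WriteStats", normal: "WriteStats") -> bool:
    # The temporarily failed standby counts as recovered once its average
    # write duration over the last m writes is again within the recovery
    # threshold of a normal-state standby's average.
    return temp_failed.average() - normal.average() < RECOVERY_THRESHOLD_US
```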
It can be understood that, when the performance of the first standby node reaches the set performance threshold, its performance has been restored; to improve the reliability of the data stored in the distributed system, the master node may then mark the first standby node as being in the incomplete normal state.
It should be noted that the set performance threshold used to judge whether the performance of the first standby node has recovered may be the same as or different from the set performance threshold used in step 203 to judge that its performance is poor. In general, however, the threshold here is less than or equal to the one used in step 203; both may be set according to actual conditions and requirements and are not limited here.
On one hand, the incomplete normal state indicates that the master node must determine whether the write on the first standby node succeeded when executing a subsequent write data request. In a specific implementation, when the master node receives a write data request, as soon as it finds a first standby node in the incomplete normal state it can determine that the node's performance has reached the set performance threshold; after the master node executes the write data request and writes the data to the first standby node, it waits for the write on the first standby node to succeed. That is, only after receiving the first response message, sent by the first standby node, indicating that the write on it succeeded can the master node feed back the write success to the terminal.
On the other hand, the incomplete normal state is also used to indicate that there may be inconsistency between the data stored by the first standby node and the data stored by the master node.
It can be understood that, while the performance of the first standby node was lower than the set performance threshold, the master node ignored the first response message fed back by the first standby node; the master node may or may not have received it. If, while the first standby node was in the temporary failure state, the master node never received the first response message, there is a high possibility that the write on the first standby node did not succeed. If, during the temporary failure state, some data to be written was indeed not written successfully on the first standby node, then some data held by the master node is not stored on the first standby node, and after the node's performance recovers, the data stored on it is inconsistent with the data stored on the master node. Conversely, if no write on the first standby node actually failed and the node was merely writing data slowly, then after its performance recovers, the data stored on it is consistent with the data stored on the master node.
It can be understood that, when the first standby node recovers from the temporary failure state into the incomplete normal state, there is a high possibility that some write on it failed during the period in which the master node was ignoring whether its writes succeeded, leaving the data stored on the first standby node inconsistent with the data stored on the master node. To improve the reliability of the data stored in the distributed system, the embodiment of the present application therefore further includes an operation of synchronizing data to the first standby node.
Fig. 5 shows how a first standby node that recovers its performance after step 203 is handled in this embodiment of the application; that is, this embodiment may specifically further include:
step 501, when the state of the first standby node is in an incomplete normal state, the master node sends the data lost by the first standby node to the first standby node.
It can be understood that the first standby node being in the incomplete normal state indicates that it has recovered from the poorly performing temporary failure state into a well-performing standby node. At this point the master node needs to compare whether the data stored on the first standby node is consistent with the data stored on the master node. If it is, the first standby node merely went through a period of poor performance in which it stored data slowly but lost nothing; the master node may then directly modify the state of the first standby node to the normal state without performing steps 501 to 503.
If, through the comparison, the master node finds that the data stored on the first standby node is inconsistent with the data stored on the master node, the first standby node has lost part of the data it should have stored. The master node then needs to determine the difference between the data stored on the first standby node and the data stored on itself, and send that difference, as the data lost by the first standby node, to the first standby node.
Step 502, after the first standby node receives the lost data, the master node determines whether the data stored in the first standby node is consistent with the data stored in the master node.
It can be understood that, when the first standby node receives the lost data, it stores the lost data accordingly. To ensure that, after the first standby node has synchronized the lost data, the data stored on it is fully consistent with the data stored on the master node and constitutes a reliable data source, the master node needs to compare the two again: if they are consistent, step 503 is executed; otherwise, steps 501 and 502 are repeated until the data on the first standby node is synchronized with that on the master node.
Step 503, when the data stored in the first standby node is consistent with the data stored in the master node, the master node modifies the state of the first standby node to a normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing the data writing request.
It can be understood that the normal state of the first standby node not only means that the performance of the first standby node has recovered above the set performance threshold, but also requires that the data stored in the first standby node is consistent with the data stored in the master node; that is, the first standby node and the master node have achieved complete data synchronization.
In a specific implementation, when the first standby node is in the normal state, the master node once again needs to consider whether the first standby node is successfully written when executing a data writing request; moreover, even if the master node and/or the second standby node goes down, the data stored in the first standby node can be reliably used.
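For illustration only, the following is a minimal Python sketch of steps 501 to 503, assuming each node's stored entries can be viewed as a set; the class, attribute, and method names are hypothetical, since the embodiment does not prescribe any API.

```python
class StandbyState:
    """Hypothetical in-memory stand-in for the first standby node."""
    def __init__(self, entries):
        self.entries = set(entries)       # data currently stored by the standby
        self.state = "incomplete_normal"  # performance recovered, data unverified

def resync_standby(master_entries, standby):
    """Steps 501-503: push lost data until the standby matches the master."""
    while standby.entries != master_entries:     # step 502: consistency check
        lost = master_entries - standby.entries  # the difference is the lost data
        standby.entries |= lost                  # step 501: standby stores it
    standby.state = "normal"                     # step 503: ACKs required again

# Example: the standby missed logs 6 and 7 while temporarily failed.
b = StandbyState({"log 1", "log 2", "log 3", "log 4", "log 5"})
resync_standby({"log 1", "log 2", "log 3", "log 4", "log 5", "log 6", "log 7"}, b)
assert b.state == "normal" and "log 7" in b.entries
```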
In addition, when the distributed system encounters master node replacement (for example, the master node goes down), a new master node needs to be selected from the standby nodes corresponding to the master node. In one case, if the new master node determined by the metadata server is the first standby node in the temporary failure state, the metadata server may be informed that the first standby node is a temporarily failed node and that the master node promotion has failed, prompting a new round of master node selection. In another case, if the new master node determined by the metadata server is the first standby node in the incomplete normal state, then, as one example, in consideration of data integrity the metadata server may be notified that the first standby node is a node whose data may be incomplete and that the master node promotion has failed, prompting a new round of master node selection; or, as another example, the data difference between the first standby node and the master node may be compared, and if the difference is within a certain threshold range, data synchronization between the first standby node and the master node may be performed in the manner shown in fig. 5, after which the metadata server is notified that the master node promotion has succeeded. In yet another case, if the new master node determined by the metadata server is the first standby node or the second standby node in the normal state, the metadata server may be directly informed that the master node promotion has succeeded.
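The three replacement cases can be summarized in a short sketch; the metadata-server callbacks and the difference threshold below are assumed names introduced only for illustration and are not specified by the embodiment.

```python
def on_master_promotion(metadata_server, candidate, master_entries, diff_limit):
    """Handle promotion of a candidate standby after the old master goes down."""
    if candidate.state == "temporary_failure":
        metadata_server.report_promotion_failed(candidate)       # reselect a master
    elif candidate.state == "incomplete_normal":
        lost = master_entries - candidate.entries
        if len(lost) <= diff_limit:           # small difference: sync as in fig. 5
            candidate.entries |= lost
            candidate.state = "normal"
            metadata_server.report_promotion_succeeded(candidate)
        else:                                 # data may be incomplete: reselect
            metadata_server.report_promotion_failed(candidate)
    else:                                     # normal state: promote directly
        metadata_server.report_promotion_succeeded(candidate)
```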
It can be seen that, when data is requested to be written in the distributed system, the master node may, by detecting whether the performance of each standby node meets the requirement, temporarily ignore the feedback of standby nodes with lower performance (for example, a data writing speed that is too slow) and wait only for the successful-write feedback of the standby nodes whose performance meets the requirement. That is, as long as the standby nodes whose performance meets the requirement return response messages, the master node may feed back a successful write to the terminal. Compared with a data writing manner in which the master node waits for all standby nodes to feed back successful-write response messages regardless of whether poorly performing standby nodes exist, this effectively avoids the problem that the slow feedback of a poorly performing standby node makes the write feedback of the whole distributed system slow and inefficient, thereby improving the data writing efficiency of the distributed system.
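The write flow described above can be sketched in Python as follows. Every name here (Standby, performance(), wait_for_ack(), the threshold value) is an assumption introduced for illustration; a real distributed system would replicate and wait asynchronously.

```python
import random

class Standby:
    """Hypothetical handle to a standby node; real nodes sit across a network."""
    def __init__(self, name):
        self.name = name
        self.state = "normal"
        self.log = []

    def write(self, data):
        self.log.append(data)        # replicate the data to be written

    def performance(self):
        return random.random()       # stand-in for a measured performance score

    def wait_for_ack(self):
        return True                  # stand-in for awaiting the response message

class Master:
    SET_PERF_THRESHOLD = 0.2         # hypothetical set performance threshold

    def __init__(self, standbys):
        self.standbys = standbys
        self.await_ack = {s.name for s in standbys}  # nodes whose ACK is awaited

    def handle_write_request(self, data):
        for s in self.standbys:
            s.write(data)                            # write to every standby
            if s.performance() < self.SET_PERF_THRESHOLD:
                s.state = "temporary_failure"        # ignore its response for now
                self.await_ack.discard(s.name)
        for s in self.standbys:
            if s.name in self.await_ack:
                s.wait_for_ack()                     # wait only for healthy nodes
        return "write_success"                       # fed back to the terminal

master = Master([Standby("B"), Standby("C")])
print(master.handle_write_request("log 5"))          # -> write_success
```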
In order to make the method for writing data in a distributed system provided by the embodiment of the present application clearer and easier to understand, the embodiment shown in fig. 2 is described below with reference to fig. 6, taking as an example the writing of multiple logs of one data entry.
For example, after the master node A, the first standby node B, and the second standby node C of the data to be written are determined, in order to provide a basis for the write flow of the logs of the data entry to be written subsequently, a fast synchronization table (English: FAST SYNC Table) may further be generated in this embodiment, where the fast synchronization table contains the master node and the nodes whose performance is not lower than the set performance threshold. It can be understood that all the nodes whose response messages the master node needs to wait for are recorded in the fast synchronization table, and the state of the nodes in the fast synchronization table can be considered the normal state; nodes marked as temporarily failed are deleted from the fast synchronization table; nodes marked as incompletely normal may be added back to the fast synchronization table in a marked form, for example, adding the deleted node B back to the fast synchronization table as B'; and after data synchronization is performed on a node in the incomplete normal state, its mark may be changed to the normal state, for example, B' is changed back to B. It should be noted that the master node A writes data in the distributed system strictly according to the fast synchronization table.
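One possible in-memory representation of the fast synchronization table is sketched below. Representing the incomplete normal state by a primed entry (B') follows the example in the text; a real implementation would more likely carry an explicit per-node state field.

```python
class FastSyncTable:
    """Nodes whose successful-write response messages the master waits for."""
    def __init__(self, members):
        self.members = set(members)       # e.g. {"A", "B", "C"}

    def mark_temporary_failure(self, node):
        self.members.discard(node)        # stop waiting for this node's ACK

    def mark_incomplete_normal(self, node):
        self.members.add(node + "'")      # add back in marked form, e.g. B -> B'

    def mark_normal(self, node):
        self.members.discard(node + "'")  # data synchronized with the master:
        self.members.add(node)            # restore the plain entry

    def must_wait_for(self, node):
        return node in self.members or (node + "'") in self.members
```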
Fig. 6 is a schematic flow chart of the present embodiment. The embodiment may specifically include:
Step 601, the master node A generates a fast synchronization table including A, B, and C, and writes log 4.
As shown in fig. 7, which is a schematic diagram of the distributed system when no abnormality is detected, the master node A, the first standby node B, and the second standby node C each store log 1, log 2, log 3, and log 4 at this time. Further, since the fast synchronization table includes A, B, and C, the master node A regards the write as complete only after receiving both a first response message indicating that the first standby node B has written successfully and a second response message indicating that the second standby node C has written successfully.
Step 602, when the master node A writes log 5 into the first standby node B and the second standby node C, it detects that the performance of the first standby node B is lower than the set performance threshold, and deletes B from the fast synchronization table.
In a specific implementation, when the master node A receives the write data request including log 5 and detects that the performance of the first standby node B is lower than the set performance threshold, then, as shown in fig. 8, which is a schematic diagram of the distributed system when an abnormality is detected, the first standby node B is marked as being in the temporary failure state and B is deleted from the fast synchronization table. At this time the fast synchronization table includes only A and C, so the master node A may regard the write as complete upon receiving only the second response message indicating that the second standby node C has written successfully.
Step 603, after the master node A writes logs 6 and 7 into the first standby node B and the second standby node C, and while writing log 8 into the first standby node B and the second standby node C, the master node A detects that the performance of the first standby node B has reached the set performance threshold, marks B as the incomplete normal state B', and adds B' back to the fast synchronization table.
In a specific implementation, when the master node A receives the write data request including log 8 after the writes of logs 6 and 7 are completed, and detects that the performance of the first standby node B has reached the set performance threshold, then, as shown in fig. 9, which is a schematic diagram of the distributed system recovering its performance after the abnormality, the first standby node is marked as being in the incomplete normal state and B' is added back to the fast synchronization table. The fast synchronization table now includes A, C, and B', so the master node A regards the write as complete only after receiving both a first response message indicating that the first standby node B has written successfully and a second response message indicating that the second standby node C has written successfully.
Step 604, after completing data synchronization of the first standby node B, the master node A marks the first standby node B as being in the normal state.
It should be noted that "B '" is added to the fast synchronization table so that the first standby node B is in an incomplete normal state at this time, that is, there may be data difference with the master node a, at this time, the master node a may send the data (for example, log 6 and log 7) lost by the first standby node B to the first standby node B by comparing the difference between the data stored by the first standby node B and the data stored by itself, complete the data synchronization between the first standby node B and the master node a, at this time, the first standby node B may be marked as a normal state, and "B'" in the fast synchronization table may be replaced by "B", then, as shown in fig. 10, a schematic diagram of the distributed system is presented, the fast synchronization table includes "A, C and" B ", so that the first response message indicating that the first standby node B successfully writes and the second response message indicating that the second standby node C successfully writes, can it be considered a write completion.
Therefore, with the method for writing data in a distributed system provided by the embodiment of the present application, by detecting whether the performance of each standby node is lower than the set performance threshold, the master node can temporarily ignore the feedback of standby nodes with lower performance (for example, a data writing speed that is too slow) and wait only for the feedback of the standby nodes meeting the requirement. Moreover, the performance of a temporarily ignored standby node continues to be monitored, and the feedback of a standby node whose performance has recovered is considered again, which improves the reliability of data storage to the maximum extent. In addition, by establishing the fast synchronization table, a list of the standby nodes whose feedback needs to be considered is provided to the master node in a simple and convenient way, indicating the condition under which the master node feeds back a successful write to the terminal. The method can therefore effectively solve the problem that the slow feedback of a poorly performing standby node makes the write feedback of the whole distributed system slow and inefficient, in particular the problem of node performance degradation caused by short-term network jitter, thereby improving the data writing efficiency of the distributed system.
Correspondingly, the embodiment of the application also provides an apparatus for writing data in a distributed system. Fig. 11 is a schematic structural diagram illustrating an apparatus for writing data in a distributed system in an embodiment of the present application, where the apparatus 1100 may specifically include:
a receiving unit 1101, configured to receive a write data request by a master node, where the write data request includes data to be written.
A writing unit 1102, configured to write the data to be written into the first standby node and the second standby node, respectively, by the master node.
A first determining unit 1103, configured to, when the master node determines that the performance of the first standby node is lower than a set performance threshold, ignore whether the first standby node sends, to the master node, a first response message indicating that the data to be written is successfully written.
A feedback unit 1104, configured to feed back a successful write to the terminal when the master node receives a second response message, sent by the second standby node, indicating that the data to be written is successfully written.
In a specific implementation, the receiving unit 1101 may be specifically configured to execute the method in step 201, specifically refer to the description of step 201 in the method embodiment shown in fig. 2; the writing unit 1102 may be specifically configured to execute the method in step 202, specifically refer to the description of step 202 in the embodiment of the method shown in fig. 2; the first determining unit 1103 may be specifically configured to execute the method in step 203, specifically refer to the description of step 203 in the method embodiment shown in fig. 2; the feedback unit 1104 may be specifically configured to execute the method in step 204, and please refer to the description of step 204 in the embodiment of the method shown in fig. 2, which is not described herein again.
Optionally, the write data request further includes a logical address of the data to be written; then, the apparatus 1100 further comprises:
a second determining unit, configured for the master node to determine, according to the logical address, a partition corresponding to the data to be written; and a third determining unit, configured for the master node to determine the master node, the first standby node, and the second standby node according to the partition and through the correspondence between the partition and the master node, where the correspondence is stored in the metadata server.
In a specific implementation, the second determining unit and the third determining unit may be specifically configured to execute the methods in steps 301 to 302, which please refer to the related description of the method embodiment shown in fig. 3, and are not described herein again.
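As a rough illustration of the second and third determining units (steps 301 to 302), the sketch below maps a logical address to a partition and then looks up the node group. The modulo hashing and the nodes_for_partition call are assumptions; the embodiment only states that the partition is determined from the logical address and that the correspondence is stored in the metadata server.

```python
NUM_PARTITIONS = 1024                     # hypothetical partition count

def route_write(metadata_server, logical_address):
    # A real system would use a stable hash; Python's hash() is per-process.
    partition = hash(logical_address) % NUM_PARTITIONS
    # Correspondence between partition and node group, kept by the metadata server.
    master, first_standby, second_standby = metadata_server.nodes_for_partition(partition)
    return master, first_standby, second_standby
```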
Optionally, the apparatus 1100 further comprises:
a first recording unit, configured to record, by the master node, a state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether writing of the first standby node is successful when executing the write data request.
In a specific implementation, the first recording unit may specifically refer to the description related to the state flag part in step 203 in the method embodiment shown in fig. 2, and is not described herein again.
Optionally, the apparatus 1100 further comprises:
a second recording unit, configured to record, after the master node determines that the performance of the first standby node is lower than a threshold, the state of the first standby node as an incomplete normal state when the master node detects that the performance of the first standby node reaches the set performance threshold, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a write data request.
In a specific implementation, the second recording unit may specifically refer to the description of the performance recovery part of the first standby node after step 204 in the method embodiment shown in fig. 2, and is not described herein again.
Optionally, the apparatus 1100 further comprises:
a sending unit, configured to send, by the master node, the data lost by the first standby node to the first standby node when the state of the first standby node is the incomplete normal state;
a determining unit, configured for the master node to determine, after the first standby node receives the lost data, whether the data stored in the first standby node is consistent with the data stored in the master node;
a state modifying unit, configured for the master node to modify the state of the first standby node into the normal state when the data stored in the first standby node is consistent with the data stored in the master node, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing a write data request.
In a specific implementation, the sending unit, the determining unit, and the state modifying unit may specifically refer to the description related to the data synchronization part of the first standby node and the master node in the method embodiment shown in fig. 5, and are not described herein again.
Thus, with the apparatus for writing data in a distributed system provided in this embodiment of the present application, when data is requested to be written in the distributed system, the master node may, by detecting whether the performance of each standby node meets the requirement, temporarily ignore the feedback of standby nodes with lower performance (for example, a data writing speed that is too slow) and wait only for the successful-write feedback of the standby nodes whose performance meets the requirement. That is, as long as the standby nodes whose performance meets the requirement return response messages, the master node may feed back a successful write to the terminal. This effectively avoids the problem that the slow feedback of a poorly performing standby node makes the write feedback of the whole distributed system slow and inefficient, thereby improving the data writing efficiency of the distributed system.
In addition, the embodiment of the application also provides an apparatus for writing data in a distributed system. Fig. 12 is a schematic structural diagram illustrating an apparatus for writing data in a distributed system according to an embodiment of the present application, where the apparatus 1200 includes at least one processor 1201 and a memory 1202 that are coupled to each other, the memory 1202 is configured to store program code, and the processor 1201 is configured to call the program code in the memory to execute the method for writing data in the distributed system provided in fig. 2.
In a specific implementation, the processor 1201 is configured to implement the following operations:
a master node receives a write data request, where the write data request includes data to be written;
the master node writes the data to be written into a first standby node and a second standby node respectively;
when the master node determines that the performance of the first standby node is lower than a set performance threshold, the master node ignores whether the first standby node sends, to the master node, a first response message indicating that the data to be written is successfully written;
and the master node feeds back a successful write to the terminal when receiving a second response message, sent by the second standby node, indicating that the data to be written is successfully written.
Optionally, the write data request further includes a logical address of the data to be written;
the processor 1201 is further configured to implement the following operations:
the master node determines, according to the logical address, a partition corresponding to the data to be written;
and the master node determines the master node, the first standby node, and the second standby node according to the partition and the correspondence, stored in the metadata server, between the partition and the master node.
Optionally, the processor 1201 is further configured to implement the following operations:
and the master node records the state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether the first standby node is successfully written when executing the data writing request.
Optionally, after the master node determines that the performance of the first standby node is lower than a threshold, the processor 1201 is further configured to:
when detecting that the performance of the first standby node reaches the set performance threshold, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a data writing request.
Optionally, the processor 1201 is further configured to implement the following operations:
when the state of the first standby node is the incomplete normal state, the master node sends the data lost by the first standby node to the first standby node;
after the first standby node receives the lost data, the master node judges whether the data stored in the first standby node is consistent with the data stored in the master node;
and when the data stored in the first standby node is consistent with the data stored in the master node, the master node modifies the state of the first standby node into the normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing a data writing request.
In addition, the embodiment of the application also provides a system for writing data in a distributed system. Fig. 13 is a schematic structural diagram illustrating a system for writing data in a distributed system according to an embodiment of the present application, where the system 1300 includes a master node 1301, a first standby node 1302, and a second standby node 1303; the master node 1301 is configured to execute the method for writing data in the distributed system provided in fig. 2, and details are not described herein again.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform one or more steps of any one of the methods described above. The respective constituent modules of the above-described apparatus for writing data in a distributed system may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
Based on such understanding, the embodiment of the present application also provides a computer program product containing instructions. The part of the technical solution that substantially contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, and the computer program product contains instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor therein to execute all or part of the steps of the method described in the embodiments of the present application.
The "first" in names such as "first standby node" and "first response message" mentioned in the embodiments of the present application is used only for identification and does not denote being first in any sequence. The same applies to "second" and the like.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device, and system embodiments, which are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some descriptions of the method embodiments for related points. The above-described embodiments of the apparatus, device and system are merely illustrative, wherein modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only an exemplary embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (12)

1. A method of writing data in a distributed system, comprising:
a master node receives a write data request, wherein the write data request comprises data to be written;
the master node writes the data to be written into a first standby node and a second standby node respectively;
when the master node determines that the performance of the first standby node is lower than a set performance threshold, the master node ignores whether the first standby node sends, to the master node, a first response message indicating that the data to be written is successfully written;
and the master node feeds back a successful write to the terminal when receiving a second response message, sent by the second standby node, indicating that the data to be written is successfully written.
2. The method of claim 1, wherein the write data request further comprises a logical address of the data to be written;
the method further comprises the following steps:
the master node determines, according to the logical address, a partition corresponding to the data to be written;
and the master node determines the master node, the first standby node and the second standby node according to the partition and the correspondence, stored in the metadata server, between the partition and the master node.
3. The method of claim 1, further comprising:
and the master node records the state of the first standby node as a temporary failure state, wherein the temporary failure state is used for indicating that the master node ignores whether the first standby node is successfully written when executing the data writing request.
4. The method of claim 3, wherein after the primary node determines that the performance of the first standby node is below a threshold, the method further comprises:
when detecting that the performance of the first standby node reaches the set performance threshold, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a data writing request.
5. The method of claim 4, further comprising:
when the state of the first standby node is the incomplete normal state, the master node sends the data lost by the first standby node to the first standby node;
after the first standby node receives the lost data, the master node judges whether the data stored in the first standby node is consistent with the data stored in the master node;
and when the data stored in the first standby node is consistent with the data stored in the master node, the master node modifies the state of the first standby node into a normal state, wherein the normal state is used for indicating that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing a data writing request.
6. An apparatus for writing data in a distributed system, comprising:
a receiving unit, configured to receive, by a master node, a write data request, wherein the write data request comprises data to be written;
a writing unit, configured to write, by the master node, the data to be written into a first standby node and a second standby node respectively;
a first determining unit, configured to, when the master node determines that the performance of the first standby node is lower than a set performance threshold, ignore whether the first standby node sends, to the master node, a first response message indicating that the data to be written is successfully written;
and the feedback unit is used for feeding back the successful writing to the terminal when the master node receives a second response message which is sent by the second standby node and in which the data to be written is successfully written.
7. The apparatus of claim 6, wherein the write data request further comprises a logical address of the data to be written;
the device further comprises:
a second determining unit, configured for the master node to determine, according to the logical address, a partition corresponding to the data to be written;
a third determining unit, configured for the master node to determine the master node, the first standby node and the second standby node according to the partition and through the correspondence between the partition and the master node, wherein the correspondence is stored in the metadata server.
8. The apparatus of claim 6, further comprising:
a first recording unit, configured to record, by the master node, a state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether writing of the first standby node is successful when executing the write data request.
9. The apparatus of claim 8, further comprising:
a second recording unit, configured to record, after the master node determines that the performance of the first standby node is lower than a threshold, the state of the first standby node as an incomplete normal state when the master node detects that the performance of the first standby node reaches the set performance threshold, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a write data request.
10. The apparatus of claim 9, further comprising:
a sending unit, configured to send, by the master node, the data lost by the first standby node to the first standby node when the state of the first standby node is the incomplete normal state;
a determining unit, configured for the master node to determine, after the first standby node receives the lost data, whether the data stored in the first standby node is consistent with the data stored in the master node;
a state modifying unit, configured for the master node to modify the state of the first standby node into the normal state when the data stored in the first standby node is consistent with the data stored in the master node, wherein the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing a write data request.
11. An apparatus for writing data in a distributed system, comprising at least one processor and a memory coupled to each other, wherein the memory is configured to store program code, and the processor is configured to call the program code in the memory to perform the method for writing data in a distributed system according to any one of claims 1 to 5.
12. A system for writing data in a distributed system, characterized by comprising a master node, a first standby node, and a second standby node;
the master node for performing the method of writing data in a distributed system of any one of claims 1 to 5.