WO2019072085A1 - Log entry replication method, apparatus, computer device, and storage medium - Google Patents

Log entry replication method, apparatus, computer device, and storage medium

Info

Publication number
WO2019072085A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
log
index
entry
log entry
Prior art date
Application number
PCT/CN2018/107512
Other languages
English (en)
French (fr)
Inventor
郭锐
李茂材
张建俊
王宗友
梁军
屠海涛
赵琦
刘斌华
朱大卫
秦青
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP18866263.9A (patent EP3623963B1)
Publication of WO2019072085A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/30 Decision processes by autonomous network management units using voting and bidding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the present invention relates to the field of blockchain technology, and in particular, to a log entry copying method, device, computer device and storage medium.
  • the working state of the nodes in the node cluster can be divided into Follower, Candidate, and Leader; that is, the nodes in the node cluster can be divided into following nodes, candidate nodes, and the leader node.
  • the client sends a commit instruction to node A;
  • node A adds the commit instruction to its own log to form a log entry, and broadcasts the log entry to node B, node C, and node D;
  • when node A determines, based on the consensus algorithm, that the nodes in the node cluster have reached a consensus, it sends a successful-commit response to the client.
  • the node cluster processes requests serially: when multiple clients simultaneously send log entries to the node cluster, the nodes in the node cluster must wait for the data recording process of the current client to complete before handling the log entries of the next client, which results in inefficient log entry replication.
  • the embodiment of the invention provides a method, a device, a computer device and a storage medium for copying log entries, which can solve the problem of low efficiency.
  • the technical solution is as follows:
  • a log entry replication method is provided, the method being applied to a first node running in a leader state in a node cluster, the node cluster further comprising a plurality of second nodes running in a following state, the method comprising:
  • each acknowledgement response carrying a log index of the second node
  • each second log entry being used to instruct the second node to copy the second log entry.
  • a log entry replication method is provided, the method being applied to a second node running in a following state in a node cluster, the node cluster further comprising a first node running in a leader state, the method comprising:
  • the acknowledgment response carrying a log index of the second node.
  • a log entry replication apparatus is provided, the apparatus being applied to a first node running in a leader state in a node cluster, the node cluster further comprising a plurality of second nodes running in a following state,
  • the device includes:
  • a sending module configured to send a plurality of first log entries to the plurality of second nodes in parallel, where each first log entry is used to instruct the second node to copy the first log entry;
  • a receiving module configured to receive at least one acknowledgement response of any one of the second nodes, where each acknowledgement response carries a log index of the second node;
  • an obtaining module configured to acquire, according to the log index of the second node, at least one second log entry to be re-issued to the second node;
  • the sending module is further configured to send at least one second log entry to the second node in parallel, where each second log entry is used to instruct the second node to copy the second log entry.
  • a log entry replication apparatus is provided, the apparatus being applied to a second node running in a following state in a node cluster, the node cluster further comprising a first node running in a leader state, the apparatus comprising:
  • a receiving module configured to receive, in parallel, a plurality of log entries sent by the first node, where each log entry is used to instruct the second node to record the log entries;
  • a copying module configured to copy a log entry to a log of the second node according to an entry index of each log entry, and update a log index of the second node;
  • a sending module configured to send an acknowledgment response to the first node, where the acknowledgment response carries a log index of the second node.
  • a computer apparatus is provided, comprising a memory, a processor, and computer executable instructions stored on the memory and executable on the processor, the processor implementing the following log entry replication method when the computer executable instructions are executed: sending a plurality of first log entries to the plurality of second nodes in parallel, each first log entry being used to instruct the second node to copy the first log entry; receiving at least one acknowledgement response from any second node, each acknowledgement response carrying a log index of the second node; acquiring, according to the log index of the second node, at least one second log entry to be re-issued to the second node; and sending the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to copy the second log entry.
  • a computer readable storage medium is provided, having stored thereon instructions that are executed by a processor to perform the following log entry replication method: sending a plurality of first log entries to the plurality of second nodes in parallel, each first log entry being used to instruct the second node to copy the first log entry; receiving at least one acknowledgement response from any second node, each acknowledgement response carrying a log index of the second node; acquiring, according to the log index of the second node, at least one second log entry to be re-issued to the second node; and sending the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to copy the second log entry.
  • a computer apparatus is provided, comprising a memory, a processor, and computer executable instructions stored on the memory and executable on the processor, the processor implementing the following log entry replication method when the computer executable instructions are executed: receiving, in parallel, a plurality of log entries sent by the first node, each log entry being used to instruct the second node to record the log entry; copying the log entries to the log of the second node according to the entry index of each log entry, and updating the log index of the second node; and sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
  • a computer readable storage medium is provided, having stored thereon instructions that are executed by a processor to perform the following log entry replication method: receiving, in parallel, a plurality of log entries sent by the first node, each log entry being used to instruct the second node to record the log entry; copying the log entries to the log of the second node according to the entry index of each log entry, and updating the log index of the second node; and sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
  • the first node sends a plurality of first log entries in parallel to the plurality of second nodes and receives at least one acknowledgement response carrying the log index of a second node, so that, according to the log index of that second node, it can send at least one second log entry to the second node in parallel.
  • because the plurality of first log entries are sent to the plurality of second nodes in parallel, a second node may receive them incorrectly owing to network problems, for example by missing some log entries or receiving some more than once;
  • to guard against this, the first node also provides a re-issuing mechanism for the log entries.
  • FIG. 1 is a schematic diagram of a conventional log entry copying method
  • FIG. 2 is a schematic diagram of an implementation environment of a log entry replication method according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for replicating a log entry according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a method for replicating a log entry according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a method for replicating a log entry according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a method for replicating a log entry according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a method for replicating a log entry according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • FIG. 15 is a schematic structural diagram of a computer device 1500 according to an embodiment of the present invention.
  • the implementation environment is a system composed of a plurality of nodes, which is equivalent to a node cluster. Each node is a basic unit capable of performing operations such as consensus, data storage, data forwarding, and data verification.
  • a node may be composed of one or more computer devices, and the nodes may be divided into a leader node, following nodes, and candidate nodes according to the working state of each node in the node cluster.
  • node A is the leader node in the node cluster
  • node B, node C, and node D are the follower nodes in the node cluster.
  • when node A is working normally, it can periodically broadcast heartbeat information to node B, node C, and node D.
  • on receiving the heartbeat information, node B, node C, and node D can determine that node A is working normally, reset their heartbeat timers, and wait to receive the next heartbeat information. The node cluster may contain a fraudulent node that tries to intrude and tamper with the data on the nodes. To prevent such a node from impersonating the leader node and sending forged heartbeat information to the following nodes, the heartbeat information that the leader node sends to the following nodes usually carries the voting signatures that each following node provided when electing the leader node, to prove that the sender is the real leader node.
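The heartbeat timer reset described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `HeartbeatTimer`, the `timeout` parameter, and the use of caller-supplied timestamps are all assumptions, and the check of the voting signatures is omitted.

```python
class HeartbeatTimer:
    """Hypothetical follower-side timer that is reset by each heartbeat."""

    def __init__(self, timeout, now):
        self.timeout = timeout          # assumed timeout length, in arbitrary time units
        self.deadline = now + timeout   # moment at which the leader is presumed silent

    def on_heartbeat(self, now):
        # A heartbeat shows the leader is working normally: restart the countdown.
        self.deadline = now + self.timeout

    def expired(self, now):
        # True once no heartbeat has arrived within the timeout.
        return now >= self.deadline


timer = HeartbeatTimer(timeout=10, now=0)
timer.on_heartbeat(now=9)   # heartbeat from the leader pushes the deadline to 19
```

What a following node does once the timer expires is outside this sketch.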
  • the client sends multiple commit instructions to the leader node in the node cluster; the leader node adds the multiple commit instructions to its own log to form multiple log entries, and sends the multiple log entries in parallel to each of the following nodes in the node cluster.
  • when sending the log entries to the following nodes, the leader node may sign and encrypt the log entries based on a signature algorithm, for example the RSA (Rivest-Shamir-Adleman) asymmetric encryption algorithm.
  • the node cluster may be a data sharing system based on blockchain technology, in which each node stores a blockchain formed by linking multiple blocks. A log entry may correspond to a block, and the log stored by each node in the node cluster may correspond to a blockchain.
  • the process of adding record data to the log to form a log entry is in effect the process of storing the record data, in the form of block data, in the block following the current block.
  • the record data may be transaction data or the like.
  • an entry index (block index or block height), used to indicate a log entry, that is, to indicate a block;
  • a log index used to indicate the current replication of a log entry by a node during the current running period (term);
  • the Last append index which is used to indicate the log entries a node has submitted.
  • FIG. 3 is a flowchart of a method for copying a log entry according to an embodiment of the present invention.
  • the first node refers to a node running in a leader state in a node cluster, that is, a leader node;
  • the second node refers to a node running in a following state in the node cluster, that is, a follower node.
  • the method includes the following steps.
  • the first node receives multiple commit commands sent by the client, and adds the multiple commit commands to the log of the first node to form multiple first log entries.
  • the commit instruction is used to submit record data to the node cluster, so that the node cluster can record the record data on each node.
  • a client served by the cluster may send multiple commit commands to the first node.
  • the multiple submission commands may be sent by the multiple clients to the first node, which is not limited by the embodiment of the present invention.
  • the first node acquires a sending window and uses multiple threads to send the multiple first log entries in the sending window in parallel to the multiple second nodes, where each first log entry is used to instruct the second node to copy the entry according to the entry index of the first log entry.
  • a sending window may be set, and multiple first log entries are sent based on the sending window.
  • the size of the send window is determined by the network bandwidth, and the size can be adjusted.
  • the window width of the sending window indicates the number of log entries sent in parallel. Generally, the window width of the sending window can be obtained based on Equation 1 below.
  • Equation 1: network bandwidth = data size of the log entries × window width × number of nodes
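As a rough illustration, Equation 1 can be solved for the window width. The sketch below assumes the equation reads "network bandwidth = entry data size × window width × number of nodes", that all entries have the same size, and that the result is rounded down; the function name and the units are hypothetical.

```python
def window_width(bandwidth_bytes: int, entry_size_bytes: int, num_nodes: int) -> int:
    """Solve the assumed form of Equation 1 for the window width."""
    if entry_size_bytes <= 0 or num_nodes <= 0:
        raise ValueError("entry size and node count must be positive")
    # Round down to a whole number of entries, but always allow at least one.
    return max(1, bandwidth_bytes // (entry_size_bytes * num_nodes))


# Example: 1,000,000 bytes of bandwidth, 5,000-byte entries, 4 receiving nodes.
width = window_width(1_000_000, 5_000, 4)
```

With these made-up numbers the window holds 50 entries; a smaller bandwidth or a larger cluster shrinks the window accordingly.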
  • when the first node sends multiple first log entries based on the sending window, it may use multi-threading: one first log entry is added to each thread in the sending queue, and the threads send the first log entries out of order and in parallel.
  • the sending queue of the sending window is a Queue
  • the Queue includes four threads, namely Thread1, Thread2, Thread3, and Thread4, and each of the four threads includes a first log entry.
  • the first log entry includes a Sign and a Send identifier.
  • This parallel transmission method can be called parallel pipeline.
  • because out-of-order transmission is adopted, the transmission time required for serial transmission of log entries is avoided. Taking 10 ms to send one log entry as an example, serial sending can deliver only 100 entries per second, whereas the parallel transmission method above can make maximum use of the bandwidth. If the sending window is 50, sending 50 entries in parallel takes only about 10 ms, and sending 100 entries takes about 20 ms, so the overall transmission time is reduced by a factor of several tens. This transmission method greatly improves data throughput.
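The arithmetic in this example can be checked with a short calculation. The model below is a simplification that assumes every entry takes the same 10 ms and that a full window is sent in one round; the function names are illustrative only.

```python
import math


def serial_time_ms(n_entries: int, per_entry_ms: int = 10) -> int:
    # Serial sending: each entry waits for the previous one to finish.
    return n_entries * per_entry_ms


def parallel_time_ms(n_entries: int, window: int, per_entry_ms: int = 10) -> int:
    # Parallel sending: each round sends up to `window` entries at once.
    return math.ceil(n_entries / window) * per_entry_ms


serial = serial_time_ms(100)          # 100 entries, one after another
parallel = parallel_time_ms(100, 50)  # 100 entries, window of 50
speedup = serial / parallel
```

Under these assumptions the serial case takes 1000 ms and the parallel case 20 ms, a 50-fold reduction, which matches the "several tens of times" stated in the text.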
  • because the multiple first log entries received by the second node may be out of order, and the second node must still copy the log entries in sequence,
  • each first log entry also carries an entry index indicating the copy order of the log entry, so that the second node can still copy the log entries in order when it receives them out of order.
  • when the first node sends the plurality of first log entries to the plurality of second nodes in parallel, it need not use the sending window and multiple threads; it only needs to send the multiple first log entries to the second nodes in parallel at the same time.
  • the manner in which the first node sends a plurality of first log entries in parallel is not specifically limited in the embodiment of the present invention.
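One possible way to send a window of entries in parallel is sketched below with a thread pool. It is an assumption-laden illustration: `LogEntry`, `FakeFollower`, and `send_window` are hypothetical names, the network is replaced by an in-memory call, and signing is omitted. Each entry carries its entry index so that arrival order does not matter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    entry_index: int  # copy order of the entry (block height)
    payload: str


class FakeFollower:
    """Stand-in for a second node; records entries in arrival order."""

    def __init__(self):
        self.received = []
        self._lock = threading.Lock()

    def receive(self, entry):
        with self._lock:
            self.received.append(entry)


def send_window(entries, followers, window_width):
    # One worker per window slot; sends may complete in any order.
    with ThreadPoolExecutor(max_workers=window_width) as pool:
        for entry in entries:
            for follower in followers:
                pool.submit(follower.receive, entry)
    # Leaving the with-block waits until every submitted send has finished.


followers = [FakeFollower() for _ in range(3)]
entries = [LogEntry(i, f"cmd{i}") for i in range(1, 6)]
send_window(entries, followers, window_width=4)
```

The arrival order at each follower is not guaranteed, which is why the follower has to reorder by `entry_index` before copying.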
  • the sending window is obtained in real time after the first node receives the multiple commit commands sent by the client.
  • the transmission window may also be pre-configured based on the maximum bandwidth of the network for direct use in subsequent parallel transmissions. The embodiment of the present invention does not specifically limit the timing at which the first node acquires the sending window.
  • the second node receives, in parallel, a plurality of first log entries sent by the first node.
  • the second node may set a receiving window and receive a plurality of first log entries based on the receiving window.
  • the size of the receive window can be determined by the network bandwidth, the size can be adjusted, and the window width of the receive window is used to indicate the number of log entries received in parallel.
  • the method for obtaining the window width of the receiving window is the same as the method for obtaining the window width of the sending window in the above step 301, and details are not described herein.
  • the second node may form a receive queue of the receive window based on the multi-thread when receiving the plurality of first log entries based on the receive window.
  • the receiving queue may be in the form of a Concurrent Skip List Map.
  • the embodiment of the present invention does not specifically limit the form of the receiving queue.
  • the second node may acquire the signature in the first log entry and verify it using the first node's public key stored on the second node, to avoid accepting fraudulent data sent by a fraudulent node. For example, referring to the receiving window shown in FIG.
  • the receiving queue of the receiving window is a Concurrent Skip List Map
  • the Concurrent Skip List Map includes four threads, namely Thread5, Thread6, Thread7, and Thread8, and each of the four threads includes a first log entry.
  • this embodiment of the present invention is described by taking one second node in the node cluster as an example.
  • in practice, multiple second nodes exist in the node cluster, and each of them performs the operations shown in this embodiment.
  • it should be noted that, owing to network delay and possible data loss, the multiple first log entries sent by the first node in parallel may not reach the second node at the same time; that is, the second node may not receive all the first log entries sent by the first node at the same time.
  • further, if data loss occurs, the second node may not receive all the first log entries sent by the first node. Therefore, the number of first log entries received by the second node in step 302 may differ from the number of first log entries sent by the first node in step 301.
  • the second node copies the log entry into the log according to the entry index of each log entry.
  • if the received log entries do not include the target log entry, that is, the entry whose entry index is consecutive with that of the log entry that has just been copied, the second node stores the uncopied log entries among the received log entries in the cache queue.
  • the inventors have recognized that since the first node uses a parallel transmission mode when transmitting a plurality of first log entries to the second node, the plurality of first log entries received by the second node are likely to be out of order.
  • when the second node copies the log entries among the multiple first log entries, it usually needs to copy them serially and in sequence. Therefore, each log entry corresponds to an entry index (that is, a block height), so that the second node can still copy the first log entries correctly according to their entry indexes when they arrive out of order, and the blockchain of the second node stays consistent with the blockchain of the first node.
  • while the second node copies the log entries in sequence, the next log entry to be copied, referred to as the target log entry, may not have been received yet.
  • in that case the second node cannot continue copying and must wait until the target log entry is received. Meanwhile, the second node keeps receiving other first log entries that arrive in parallel. To prevent the reception of first log entries and the wait for the target log entry from interfering with each other, a cache queue is added to the second node.
  • the cache queue stores log entries that the second node has received but not copied because their entry indexes are not consecutive with the entry index of the log entry that has already been copied. When the second node later receives the target log entry, whose entry index is consecutive with that of the copied log entry, it can extract the uncopied log entries from the cache queue and continue copying. This guarantees that log entries are copied in order without affecting the reception of first log entries, and so improves the efficiency of copying log entries.
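The cache queue behaviour can be sketched as follows. This is a minimal model under stated assumptions: entry indexes start at 1, the cache is a dictionary keyed by entry index rather than a real queue, and `FollowerLog` is a hypothetical name, not one used by the patent.

```python
class FollowerLog:
    """Hypothetical second-node log that copies entries strictly in index order."""

    def __init__(self):
        self.log = []        # (entry_index, payload) pairs already copied, in order
        self.cache = {}      # received but not yet copyable: entry_index -> payload
        self.next_index = 1  # entry index of the target log entry

    def receive(self, entry_index, payload):
        if entry_index < self.next_index:
            return  # already copied; ignore the duplicate
        self.cache[entry_index] = payload
        # Copy the target entry and every cached entry that now follows it.
        while self.next_index in self.cache:
            self.log.append((self.next_index, self.cache.pop(self.next_index)))
            self.next_index += 1


follower = FollowerLog()
for i in (1, 2, 3, 5, 6, 7):   # entry 4 is delayed or lost
    follower.receive(i, f"entry{i}")
copied_before = [i for i, _ in follower.log]   # only 1..3 can be copied so far
follower.receive(4, "entry4")                  # the target log entry arrives
copied_after = [i for i, _ in follower.log]
```

Entries 5, 6, and 7 wait in the cache until entry 4 arrives, after which all four are copied in order and the cache is empty again.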
  • each time the second node copies one of the log entries,
  • it updates its log index and sends an acknowledgment response to the first node, where the acknowledgment response carries the log index of the second node.
  • the log index of the second node is in fact the latest log index of the second node, and indicates the log entry most recently stored by the second node.
  • this latest log index characterizes the log integrity of a node. After any second node receives a log entry, it broadcasts the index of the received log entry to the other second nodes to reach a consensus with them. When it determines, based on the consensus algorithm, that the nodes in the node cluster have reached a consensus, the second node copies the log entry to its own log and updates its own log index.
  • the second node may add 1 to its current log index to obtain the latest log index, and then send an acknowledgment response carrying the latest log index to the first node.
  • the log index can be represented by Commit_index.
  • the second node may be delayed in receiving a first log entry, so that it cannot return an acknowledgment response to the first node in time.
  • the first node may then conclude that the first log entry was lost and that this is why the second node has not returned an acknowledgment response in time. In that case, the first node issues the first log entry to the second node again.
  • as a result, the second node may receive the same log entry multiple times.
  • to inform the first node promptly that the same log entry has been received multiple times, and to avoid the bandwidth consumed by repeated re-issuing, the second node counts receptions per entry index when receiving log entries. When a log entry has been received two or more times, the second node again sends the acknowledgment response carrying its log index to the first node, so that the first node learns that the second node has not lost packets and that no re-issue is needed.
  • the second node may count the receptions of each entry index using a receive counter. Referring to FIG.
  • the Receive Counter in the second node increases the reception count of each of the five first log entries by one as it is received. For example, if the second node receives entry 1, the Receive Counter corresponding to entry 1 is incremented by 1. If the second node receives entry 1 again, the Receive Counter corresponding to entry 1 is incremented by 1 again. Similarly, when the second node receives entry 2, the Receive Counter corresponding to entry 2 is incremented by 1.
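The per-entry receive counter and the repeated acknowledgment can be sketched like this. The class name `ReceiveTracker` and the convention of returning `None` when no repeated acknowledgment is needed are assumptions made for the illustration; the normal acknowledgment sent after copying is not modelled.

```python
from collections import Counter


class ReceiveTracker:
    """Hypothetical second-node helper that counts receptions per entry index."""

    def __init__(self, log_index=0):
        self.receive_counter = Counter()  # entry_index -> number of receptions
        self.log_index = log_index        # latest log index of this node

    def on_receive(self, entry_index):
        """Return the log index to acknowledge again, or None on first receipt."""
        self.receive_counter[entry_index] += 1
        if self.receive_counter[entry_index] >= 2:
            # Same entry seen twice or more: tell the leader nothing was lost.
            return self.log_index
        return None


tracker = ReceiveTracker(log_index=7)
first = tracker.on_receive(1)    # first copy of entry 1: no repeated acknowledgment
second = tracker.on_receive(1)   # re-issued copy of entry 1: acknowledge again
```

Returning the current log index on a duplicate lets the leader see that the follower is up to date and stop re-issuing.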
  • the first node receives at least one acknowledgment response from any one of the second nodes and maintains a confirmation list for the second node.
  • each time an acknowledgment response is received, if the log index in the newly received acknowledgment response is greater than the log index of the second node in the confirmation list,
  • the first node updates the log index in the confirmation list of the second node.
  • to determine whether a log entry needs to be re-issued to the second node, the first node may maintain the confirmation list, extract the log index of the second node from the acknowledgment response returned by the second node,
  • and update the confirmation list in real time based on that log index.
  • the first node may maintain one confirmation list for each second node, extract the log index from the acknowledgment response returned by a second node, and add the log index to that second node's own confirmation list; or
  • the first node may maintain only one confirmation list, store in it the log indexes from the acknowledgment responses returned by all the second nodes, and update the confirmation list in real time.
  • the confirmation list can be represented by Commit_table.
  • the log index carried in the acknowledgment responses that a second node returns to the first node increases over time; that is, for the same second node,
  • the log index carried in the acknowledgment responses received by the first node grows larger and larger. Each time the first node receives an acknowledgment response, it compares the log index carried in the response with the current log index of the second node in the confirmation list. If the newly received log index is greater than the current log index of the second node in the confirmation list, the first node sets the log index corresponding to the second node in the confirmation list to the newly received log index.
  • for example, if the log index corresponding to node A in the current confirmation list is index3 and the first node receives an acknowledgment response carrying index6 returned by node A, the first node updates the log index corresponding to node A in the confirmation list to index6.
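The update rule for the confirmation list can be sketched with a dictionary. The single-table layout and the function name `update_commit_table` are assumptions; the text equally allows one list per second node.

```python
def update_commit_table(commit_table, node_id, acked_log_index):
    """Keep the largest log index acknowledged by each second node."""
    current = commit_table.get(node_id)
    if current is None or acked_log_index > current:
        commit_table[node_id] = acked_log_index
    return commit_table


commit_table = {"A": 3}
update_commit_table(commit_table, "A", 6)   # newer acknowledgment: 3 becomes 6
update_commit_table(commit_table, "A", 4)   # stale acknowledgment: ignored
```

Ignoring smaller values means that acknowledgments arriving out of order cannot move a node's recorded progress backwards.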
  • alternatively, the first node may not maintain a confirmation list, and may instead record only the node identifier of each second node together with the log index most recently returned by that second node.
  • when the first node receives a new acknowledgment response from the second node, it deletes the log index recorded for the node identifier of the second node and associates the node identifier with the log index carried in the new acknowledgment response; that is,
  • the first node updates the recorded log index for the node identifier of the second node according to the log index returned by the second node.
  • the manner of recording the log index returned by the second node is not specifically limited in the embodiment of the present invention.
  • the first node acquires the log entries between the target threshold and the maximum committed index of the first node as at least one second log entry to be re-issued to the second node.
  • the target threshold is used to indicate the log index that the second node should reach at a minimum under normal circumstances.
  • a second node may fail to receive a first log entry. To detect packet loss at the second node promptly and re-issue the log entries that the second node may have lost,
  • a target threshold may be set in the first node. Every target period, the first node checks the log index in the acknowledgment response currently returned by each second node against the target threshold, to determine whether log entries need to be re-issued to that second node and which log entries to re-issue.
  • when the first node checks the log index in the acknowledgment response currently returned by each second node against the target threshold, it compares that log index with the target threshold. If the log index in the acknowledgment response currently returned by the second node is less than the target threshold, the first node obtains the maximum committed index among the entry indexes of all log entries that need to be sent to the second node, and takes the log entries whose entry indexes lie between the target threshold and the maximum committed index as the at least one second log entry to be re-issued to the second node.
  • for example, the first node takes the log entries corresponding to the entry indexes between index50 and index100 as the second log entries.
  • the target threshold may not be set in the first node, so that the first node may compare the log index corresponding to the second node in the confirmation list with the maximum submitted index of the first node every target period. If the log index of the second node is smaller than the maximum commit index of the first node, the first node acquires the log entry corresponding to the entry index between the log index of the second node and the largest committed index of the first node as the second node. The second log entry. For example, if the maximum submission index of the first node is index 100 and the log index corresponding to the second node in the confirmation list is index 89, the first node may use the log entry corresponding to the entry index between index 89 and index 100 as the second log entry.
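Both selection rules can be sketched in one helper. The text says "between" without fixing the bounds, so the inclusive and exclusive bounds below are assumptions, as is the function name `entries_to_reissue`.

```python
def entries_to_reissue(follower_log_index, leader_max_committed, target_threshold=None):
    """Return the entry indexes the first node would re-issue to one second node."""
    if target_threshold is not None:
        # Threshold variant: re-issue only if the follower is below the threshold.
        if follower_log_index >= target_threshold:
            return []
        # Assumed bounds: threshold and maximum committed index both included.
        return list(range(target_threshold, leader_max_committed + 1))
    # No-threshold variant: compare directly with the leader's maximum committed index.
    if follower_log_index >= leader_max_committed:
        return []
    # Assumed bounds: start just after what the follower already has.
    return list(range(follower_log_index + 1, leader_max_committed + 1))


no_threshold = entries_to_reissue(89, 100)                         # follower at 89, leader at 100
with_threshold = entries_to_reissue(40, 100, target_threshold=50)  # follower below threshold 50
```

In the no-threshold case the follower at index 89 is sent entries 90 through 100; in the threshold case a follower below index 50 is sent entries 50 through 100.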
  • because the foregoing check is performed only once every target period, each second node is given some time to copy entries. This avoids the inaccurate results that overly frequent checking would produce, and avoids the excessive use of computing resources that real-time checking would cause.
  • the first node sends at least one second log entry in parallel to the second node, where each second log entry is used to instruct the second node to copy the second log entry.
  • after determining the second log entries that need to be re-sent, the first node may send them to the second node in parallel.
  • the process by which the first node sends the at least one second log entry in parallel to the second node is the same as the process of sending the plurality of first log entries in parallel in step 301, and is not described again here.
  • the second node receives at least one second log entry sent by the first node, extracts an unreplicated log entry in the cache queue, and copies the log entry according to an entry index of each log entry.
  • because the second node copies log entries in the order of their entry indexes, the entry index of a second log entry re-sent by the first node is contiguous with that of the log entry whose copying has just been completed.
  • the entry index of the second log entry is also contiguous with the entry indexes of the uncopied log entries in the cache queue, so the second node can continue copying. For example, suppose the entry index of the log entry most recently copied by the second node is 3 and the cache queue holds uncopied log entries with entry indexes 5, 6, and 7. When the second node receives the second log entry sent by the first node and finds that its entry index is 4, the second node can extract the uncopied log entries with entry indexes 5, 6, and 7 from the cache queue and continue copying the log entries in the order of entry indexes 4, 5, 6, and 7.
  • in a possible implementation, because the leader node may crash, or the node cluster may split so that the current leader node is automatically demoted to a follower node, the working state of each node in the node cluster can be switched dynamically. Referring to FIG. 7, an embodiment of the present invention provides a schematic diagram of the switching of node working states.
  • in the node cluster, the leader node periodically sends heartbeat information to the follower nodes.
  • if the heartbeat timer of a follower node times out and the follower node still has not received heartbeat information from the leader node, the follower node can determine that the leader node has crashed, switches its working state to the candidate state, and becomes a candidate node. The candidate node then resets its heartbeat timer and broadcasts a voting request to the follower nodes in the node cluster. When it receives voting confirmation messages from more than half of the nodes in the node cluster, the candidate node switches its working state to the leader state and becomes the leader node. It should be noted that, because the leader node is responsible for sending log entries to the follower nodes, the follower node that becomes the candidate node may be the node that has finished copying the largest number of log entries in the node cluster.
  • when working states are switched in the node cluster, on the one hand, the first node may switch its own working state to the follower state and become a follower node; on the other hand, a second node may switch its own working state to the leader state and become the leader node. The two cases are described below.
  • Case 1: the first node switches its working state to the follower state.
  • when the first node switches to the follower state, that is, when the first node becomes a follower node, the first node clears its unsent log entries and stops the parallel sending step.
  • in a specific possible embodiment, because the first node uses multiple threads to send the plurality of first log entries in parallel based on the send window, the first node clears the log entries in the send window that have not yet been sent and shuts down those threads.
  • the first node is now a follower node and will subsequently receive log entries sent by the new leader node. After becoming a follower node, the first node can therefore initialize a receive window and start the threads of the receive window, so that it can later receive in parallel the log entries sent by the new leader node.
  • it should be noted that, because the first node holds log entries that it has already finished copying, the first node determines the maximum commit index of those log entries and sets the entry index of the next log entry to be received to its own maximum commit index plus 1. For example, if the maximum commit index of the log entries that the first node has finished copying is index 56, the first node sets the entry index of the next log entry to be received to index 57, so as to receive the log entry corresponding to index 57 from the new leader node.
  • Case 2: the second node switches its working state to the leader state.
  • when the second node switches to the leader state, that is, when the second node becomes the leader node, the second node clears its uncopied log entries and stops the parallel receiving step.
  • in a specific possible embodiment, because the second node uses multiple threads to receive the plurality of first log entries in parallel based on the receive window and adds log entries that have been received but not yet copied to the cache queue, the second node clears the uncopied log entries in the cache queue and shuts down the threads of the receive window.
  • the second node is now the leader node and will subsequently send log entries to the follower nodes in the node cluster. After becoming the leader node, the second node can therefore initialize a send window and start the threads of the send window, so that it can later send log entries to the follower nodes in parallel.
  • it should be noted that, after becoming the leader node, the second node does not know how far each follower node has progressed in copying log entries, and therefore does not know the log index of the log entries to be sent to each follower node. The second node therefore sets up a confirmation list for the nodes in the node cluster that are in the follower state, based on the maximum commit index of the log entries it has itself finished copying, and sets the log index of each node in the follower state to its own maximum commit index plus 1. This triggers each follower node to return the log index of the log entries it has finished copying, so that the second node learns the copying progress of each follower node and sends log entries to each follower node according to that progress. For example, if the maximum commit index of the log entries that the second node has finished copying is index 99, the second node sets the log index of each node in the follower state to index 100 and sends the log entry corresponding to index 100 to the follower nodes.
  • in this embodiment of the present invention, the first node sends a plurality of first log entries in parallel to a plurality of second nodes and receives, from any one of the second nodes, at least one acknowledgment response carrying the log index of that second node, so that it can send at least one second log entry in parallel to the second node according to that log index. Because the first log entries are sent to the second nodes in parallel, and because the first node also provides a re-sending mechanism that guards against a second node receiving log entries incorrectly due to network problems (for example, missing entries or receiving them more than once), the time needed to send the plurality of first log entries is reduced while the data of all nodes in the node cluster remains consistent, which improves the efficiency of log entry copying.
  • FIG. 8 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • referring to FIG. 8, the apparatus includes a sending module 401, a receiving module 402, and an obtaining module 403.
  • the sending module 401 is configured to send a plurality of first log entries to the plurality of second nodes in parallel, where each first log entry is used to instruct the second node to copy the first log entry.
  • the receiving module 402 is configured to receive at least one acknowledgement response of any one of the second nodes, where each acknowledgement response carries a log index of the second node;
  • the obtaining module 403 is configured to obtain, according to a log index of the second node, at least one second log entry to be re-issued of the second node;
  • the sending module 401 is further configured to send at least one second log entry to the second node in parallel, where each second log entry is used to instruct the second node to copy the second log entry.
  • in this embodiment of the present invention, the first node sends a plurality of first log entries in parallel to a plurality of second nodes and receives, from any one of the second nodes, at least one acknowledgment response carrying the log index of that second node, so that it can send at least one second log entry in parallel to the second node according to that log index. Because the first log entries are sent to the second nodes in parallel, and because the first node also provides a re-sending mechanism that guards against a second node receiving log entries incorrectly due to network problems (for example, missing entries or receiving them more than once), the time needed to send the plurality of first log entries is reduced while the data of all nodes in the node cluster remains consistent, which improves the efficiency of log entry copying.
  • the obtaining module 403 is configured to: if the log index of the second node is smaller than the target threshold, obtain the log entries between the target threshold and the maximum commit index of the first node as the at least one second log entry to be re-sent to the second node.
  • the obtaining module 403 is configured to: if the log index of the second node is smaller than the maximum commit index of the first node, obtain the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be re-sent to the second node.
  • the apparatus further includes an update module 404.
  • the update module 404 is configured to maintain a confirmation list for the second node and, each time an acknowledgment response is received, to update the log index of the second node in the confirmation list if the log index in the newly received acknowledgment response is greater than the log index recorded for the second node in the confirmation list.
  • the obtaining module 403 is further configured to perform, every target period, the step of obtaining, according to the log index of the second node, the at least one second log entry to be re-sent to the second node.
  • the sending module 401 includes an obtaining sub-module 4011 and a sending sub-module 4012.
  • the obtaining sub-module 4011 is configured to acquire a sending window, where a window width of the sending window is used to indicate the number of log entries sent in parallel;
  • the sending sub-module 4012 is configured to send, by using multiple threads, a plurality of first log entries in the sending window.
  • the apparatus further includes a cleaning module 405 and a setting module 406.
  • the cleaning module 405 is configured to: when the first node switches to the following state, clear the unsent log entry, and stop the parallel sending step;
  • the setting module 406 is configured to set, according to the maximum commit index of the first node itself, the next log index to be received to that maximum commit index plus 1.
  • FIG. 12 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention.
  • referring to FIG. 12, the apparatus includes a receiving module 501, a copying module 502, and a sending module 503.
  • the receiving module 501 is configured to receive, in parallel, a plurality of log entries sent by the first node, where each log entry is used to instruct the second node to record the log entries;
  • the copying module 502 is configured to copy the log entry to the log of the second node according to the entry index of each log entry, and update the log index of the second node;
  • the sending module 503 is configured to send an acknowledgment response to the first node, and the acknowledgment response carries the log index of the second node.
  • in this embodiment of the present invention, the first node sends a plurality of first log entries in parallel to a plurality of second nodes and receives, from any one of the second nodes, at least one acknowledgment response carrying the log index of that second node, so that it can send at least one second log entry in parallel to the second node according to that log index. Because the first log entries are sent to the second nodes in parallel, and because the first node also provides a re-sending mechanism that guards against a second node receiving log entries incorrectly due to network problems (for example, missing entries or receiving them more than once), the time needed to send the plurality of first log entries is reduced while the data of all nodes in the node cluster remains consistent, which improves the efficiency of log entry copying.
  • the copying module 502 is configured to: when any log entry has been copied and the received log entries do not include a target log entry whose entry index is contiguous with that of the copied log entry, store the uncopied log entries among the received log entries in the cache queue until the target log entry is received.
  • the apparatus further includes a counting module 504.
  • the counting module 504 is configured to count received log entries based on their entry indexes;
  • the sending module 503 is further configured to perform the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.
  • the apparatus further includes a cleaning module 505 and a setting module 506.
  • the cleaning module 505 is configured to: when the second node switches to the leader state, clean the unreplicated log entries, and stop the parallel receiving step;
  • the setting module 506 is configured to set a confirmation list for the nodes in the node cluster that are in the following state according to the maximum submission index of the second node itself, and the log index of each node is equal to the maximum submission index of the second node itself plus one.
  • it should be noted that, when the log entry copying apparatus provided in the foregoing embodiments copies log entries, the division into the foregoing functional modules is only used as an example. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
  • in addition, the log entry copying apparatus provided in the foregoing embodiments and the log entry copying method embodiments are based on the same concept. For the specific implementation process, refer to the method embodiments; details are not described herein again.
  • FIG. 15 is a schematic structural diagram of a computer device 1500 according to an embodiment of the present invention.
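  • the computer device 1500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1501 and one or more memories 1502. The memory 1502 stores at least one instruction. In one possible implementation, the at least one instruction is loaded and executed by the processor 1501 to implement the method steps of the log entry copying method provided by the foregoing method embodiments, as follows:
  • sending a plurality of first log entries in parallel to a plurality of second nodes, each first log entry being used to instruct a second node to copy the first log entry;
  • receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node;
  • obtaining, according to the log index of the second node, at least one second log entry to be re-sent to the second node;
  • sending the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.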
  • the processor 1501 is configured to execute:
  • if the log index of the second node is smaller than the target threshold, obtaining the log entries between the target threshold and the maximum commit index of the first node as the at least one second log entry to be re-sent to the second node.
  • the processor 1501 is configured to execute:
  • if the log index of the second node is smaller than the maximum commit index of the first node, obtaining the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be re-sent to the second node.
  • the processor 1501 is further configured to:
  • the step of acquiring at least one second log entry to be re-sent of the second node according to the log index of the second node is performed every target period.
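  • the processor 1501 is configured to execute:
  • acquiring a send window, where the window width of the send window indicates the number of log entries sent in parallel;
  • sending the plurality of first log entries in the send window using multiple threads.
  • the processor 1501 is further configured to:
  • when the first node switches to the follower state, clear the unsent log entries and stop the parallel sending step;
  • according to the maximum commit index of the first node itself, set the next log index to be received to that maximum commit index plus 1.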
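  • in another possible implementation, the at least one instruction is loaded and executed by the processor 1501 to implement the method steps of the log entry copying method provided by the foregoing method embodiments, the method being applied to a second node running in the follower state in a node cluster, where the node cluster further includes a first node running in the leader state. The method steps implemented by the processor 1501 when executing the instruction are as follows:
  • receiving in parallel a plurality of log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
  • copying the log entries into the log of the second node according to the entry index of each log entry, and updating the log index of the second node;
  • sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.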
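  • the processor 1501 is configured to execute:
  • when any log entry has been copied and the received log entries do not include a target log entry whose entry index is contiguous with that of the copied log entry, storing the uncopied log entries among the received log entries in the cache queue, and continuing to copy once the target log entry is received.
  • the processor 1501 is further configured to:
  • count received log entries based on their entry indexes, and perform the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.
  • the processor 1501 is further configured to:
  • when the second node switches to the leader state, clear the uncopied log entries and stop the parallel receiving step;
  • according to the maximum commit index of the second node itself, set up a confirmation list for the nodes in the node cluster that are in the follower state, where the log index of each node is equal to the maximum commit index of the second node itself plus 1.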
  • the computer device 1500 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output.
  • the computer device may also include other components for implementing the functions of the device, and details are not described herein.
  • in an exemplary embodiment, a computer readable storage medium is further provided, for example a memory including instructions that can be executed by a processor to perform the log entry copying method of the foregoing embodiments.
  • for example, the computer readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.


Abstract

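The embodiments of the present invention disclose a log entry copying method and apparatus, a computer device, and a storage medium, belonging to the field of blockchain technologies. A first node sends a plurality of first log entries in parallel to a plurality of second nodes and also provides a re-sending mechanism for log entries, which prevents the second nodes from receiving log entries incorrectly (for example, missing entries or receiving them more than once), keeps the data of all nodes consistent, saves the time needed to send the plurality of first log entries, and improves the efficiency of log entry copying.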

Description

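Log entry copying method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 2017109490264, filed on October 12, 2017 and entitled "Log entry copying method and apparatus, computer device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of blockchain technologies, and in particular to a log entry copying method and apparatus, a computer device, and a storage medium.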
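Background
With the continuous development of information technology, data recording services are currently often provided to clients by a node cluster in order to improve data security and keep data open and transparent. When a node cluster provides a data recording service to a client, the logs stored on the nodes of the cluster all correspond to the same blockchain. When a client needs to add data to be recorded to the log of each node in the cluster, the data is in effect copied into the block following the current block in the blockchain on the node. Because data already stored in a node's blockchain cannot be modified, the data to be recorded is effectively protected from tampering, which improves data security.
During data recording, the working state of a node in the cluster may be Follower, Candidate, or Leader; that is, the nodes in the cluster can be divided into follower nodes, candidate nodes, and a leader node. Referring to FIG. 1, suppose node A has been determined to be the leader node and nodes B, C, and D are follower nodes. When a client issues a commit instruction to node A, node A adds the commit instruction to its own log to form a log entry, and broadcasts the log entry to nodes B, C, and D. After receiving the log entry broadcast by node A, nodes B, C, and D broadcast the entry index of the log entry to the other nodes in the cluster. Each of nodes B, C, and D, on determining based on a consensus algorithm that the nodes in the cluster have reached consensus, copies the log entry into its log; when copying is complete, it updates its own log index and sends an acknowledgment response to node A. After node A determines based on the consensus algorithm that the nodes in the cluster have reached consensus, it sends a commit-success response to the client.
The node cluster performs this data recording process serially. That is, when multiple clients issue log entries to the node cluster at the same time, the nodes in the cluster must wait until the data recording process for the current client has finished before they can process the next client's log entries, which makes log entry copying inefficient.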
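Summary
The embodiments of the present invention provide a log entry copying method and apparatus, a computer device, and a storage medium, which can solve the problem of low efficiency. The technical solutions are as follows: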
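According to one aspect, a log entry copying method is provided. The method is applied to a first node running in the leader state in a node cluster, the node cluster further includes a plurality of second nodes running in the follower state, and the method includes:
sending a plurality of first log entries in parallel to the plurality of second nodes, each first log entry being used to instruct a second node to copy the first log entry;
receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node;
obtaining, according to the log index of the second node, at least one second log entry to be re-sent to the second node;
sending the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.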
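According to one aspect, a log entry copying method is provided. The method is applied to a second node running in the follower state in a node cluster, the node cluster further includes a first node running in the leader state, and the method includes:
receiving in parallel a plurality of log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
copying the log entries into the log of the second node according to the entry index of each log entry, and updating the log index of the second node;
sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.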
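According to one aspect, a log entry copying apparatus is provided. The apparatus is applied to a first node running in the leader state in a node cluster, the node cluster further includes a plurality of second nodes running in the follower state, and the apparatus includes:
a sending module, configured to send a plurality of first log entries in parallel to the plurality of second nodes, each first log entry being used to instruct a second node to copy the first log entry;
a receiving module, configured to receive at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node;
an obtaining module, configured to obtain, according to the log index of the second node, at least one second log entry to be re-sent to the second node;
the sending module being further configured to send the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.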
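According to one aspect, a log entry copying apparatus is provided. The apparatus is applied to a second node running in the follower state in a node cluster, the node cluster further includes a first node running in the leader state, and the apparatus includes:
a receiving module, configured to receive in parallel a plurality of log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
a copying module, configured to copy the log entries into the log of the second node according to the entry index of each log entry, and to update the log index of the second node;
a sending module, configured to send an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.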
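According to one aspect, a computer device is provided, including a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor. When executing the computer-executable instructions, the processor implements the following log entry copying method: sending a plurality of first log entries in parallel to a plurality of second nodes, each first log entry being used to instruct a second node to copy the first log entry; receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node; obtaining, according to the log index of the second node, at least one second log entry to be re-sent to the second node; and sending the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.
According to one aspect, a computer readable storage medium is provided. The computer readable storage medium stores instructions that are executed by a processor to perform the following log entry copying method: sending a plurality of first log entries in parallel to a plurality of second nodes, each first log entry being used to instruct a second node to copy the first log entry; receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node; obtaining, according to the log index of the second node, at least one second log entry to be re-sent to the second node; and sending the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.
According to one aspect, a computer device is provided, including a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor. When executing the computer-executable instructions, the processor implements the following log entry copying method: receiving in parallel a plurality of log entries sent by a first node, each log entry being used to instruct a second node to record the log entry; copying the log entries into the log of the second node according to the entry index of each log entry, and updating the log index of the second node; and sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
According to one aspect, a computer readable storage medium is provided. The computer readable storage medium stores instructions that are executed by a processor to perform the following log entry copying method: receiving in parallel a plurality of log entries sent by a first node, each log entry being used to instruct a second node to record the log entry; copying the log entries into the log of the second node according to the entry index of each log entry, and updating the log index of the second node; and sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.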
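The first node sends a plurality of first log entries in parallel to a plurality of second nodes and receives, from any one of the second nodes, at least one acknowledgment response carrying the log index of that second node, so that it can send at least one second log entry in parallel to the second node according to that log index. Because the first log entries are sent to the second nodes in parallel, and because the first node also provides a re-sending mechanism that guards against a second node receiving log entries incorrectly due to network problems (for example, missing entries or receiving them more than once), the time needed to send the plurality of first log entries is reduced while the data of all nodes in the node cluster remains consistent, which improves the efficiency of log entry copying.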
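Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an existing log entry copying method;
FIG. 2 is a schematic diagram of an implementation environment of a log entry copying method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a log entry copying method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a log entry copying method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a log entry copying method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a log entry copying method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a log entry copying method according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a log entry copying apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of a computer device 1500 according to an embodiment of the present invention.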
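Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the implementations of the present invention are described in further detail below with reference to the accompanying drawings.
Before the embodiments of the present invention are explained in detail, the implementation environment in which the nodes run is briefly introduced.
Referring to the implementation environment shown in FIG. 2, the environment is a system made up of multiple nodes, which is equivalent to a node cluster. Each node is a basic unit capable of performing consensus operations, storing data, forwarding data, verifying data, and similar actions, and may consist of one or more computer devices. According to their working states, the nodes in the cluster can be divided into a leader node, follower nodes, and candidate nodes. As shown in FIG. 2, node A is the leader node of the cluster, and nodes B, C, and D are follower nodes. While node A is working normally, it periodically broadcasts heartbeat information to nodes B, C, and D. On receiving the heartbeat information, nodes B, C, and D can determine that node A is working normally, reset their heartbeat timers, and wait for the next heartbeat. Because a fraudulent node may have intruded into the cluster in an attempt to tamper with the data on follower nodes, and to prevent such a node from masquerading as the leader and sending fake heartbeat information to the follower nodes, the leader node usually carries in its heartbeat information the voting signatures that the follower nodes produced when electing the leader, thereby proving that it is the genuine leader node.
When the node cluster provides a data recording service to a client, the client issues multiple commit instructions to the leader node. The leader node adds these commit instructions to its own log to form multiple log entries, and sends the log entries in parallel to each follower node in the cluster. It should be noted that, to prevent a fraudulent node from tampering with the log entries sent by the leader node, the leader node may encrypt the log entries based on an algorithm signature when sending them to the follower nodes; for example, it may encrypt the log entries based on the RSA (Ron Rivest, Adi Shamir, Leonard Adleman; asymmetric encryption) algorithm. After receiving the log entries in parallel, a follower node copies them according to the entry index of each log entry and, once copying is complete, returns an acknowledgment response carrying its log index to the leader node, so that the leader node knows that the follower node has finished copying. In one possible implementation, the node cluster may be a data sharing system built on blockchain technology. Each node stores a blockchain formed by linking multiple blocks. A log entry may be record data submitted by a client, and the log stored on each node may correspond to one blockchain. Forming a log entry by adding record data to the log is in effect storing that record data, in the form of a block, as the block following the current block. It should be noted that the record data may be transaction data or the like.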
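To make the subsequent embodiments easier to understand, some terms are explained here:
Entry index (block index or block height): indicates a log entry, that is, a block;
Log index (commit index): indicates how far a node has progressed in copying log entries within the current term;
Maximum commit index (last append index): indicates the log entries that a node has committed.
FIG. 3 is a flowchart of a log entry copying method according to an embodiment of the present invention. For ease of description, in the embodiments of the present invention the first node is the node running in the leader state in the node cluster, that is, the leader node, and a second node is a node running in the follower state in the node cluster, that is, a follower node. As shown in FIG. 3, the method includes the following steps.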
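300. The first node receives a plurality of commit instructions issued by a client, and adds them to the log of the first node to form a plurality of first log entries.
A commit instruction is used to submit record data to the node cluster so that the cluster can record the data on each node. Any client served by the node cluster can send multiple commit instructions. In step 300, the multiple commit instructions may therefore be sent to the first node by a single client served by the cluster, or by multiple clients; this is not limited in the embodiments of the present invention.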
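301. The first node acquires a send window and uses multiple threads to send the plurality of first log entries in the send window to the plurality of second nodes in parallel, each first log entry being used to instruct a second node to copy the first log entry according to its entry index.
In the embodiments of the present invention, to send multiple log entries to multiple second nodes in parallel, a send window may be set and the first log entries sent based on it. The size of the send window is determined by the network bandwidth and is adjustable. The window width of the send window indicates the number of log entries sent in parallel, and can usually be obtained from Formula 1 below.
Formula 1: network bandwidth = data size of the log entries × window width × number of nodes
When sending the first log entries based on the send window, the first node may use multiple threads, adding one first log entry to each thread in the send queue and letting the threads send the entries without any ordering, which achieves parallel sending. For example, in the send window shown in FIG. 4, the send queue is Queue, which contains four threads: Thread1, Thread2, Thread3, and Thread4. Each of the four threads holds one first log entry, and each first log entry includes a Sign (signature) and a Send identifier. This way of sending can be called a parallel pipeline. Because sending is unordered, the delay incurred by sending log entries serially is avoided. Suppose that sending one log entry takes 10 ms. With serial sending, only 100 entries can be sent per second. With the parallel sending described above, the bandwidth can be used to the fullest: if the send window holds 50 entries, sending 50 entries in parallel takes only about 10 ms, and sending 100 entries takes only about 20 ms, so the overall sending time can be reduced by a factor of several tens and data throughput is greatly increased.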
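The send window of step 301 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `window_width` and `send_window`, the `send` callback, and the reading of Formula 1 (bandwidth and entry size expressed in the same byte units, with the data-size term taken as the size of one entry) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def window_width(bandwidth, entry_size, node_count):
    """Rearrange Formula 1: width = bandwidth / (entry size * node count).

    Assumes bandwidth and entry_size use the same units; at least one
    entry is always allowed so that sending can make progress.
    """
    return max(1, bandwidth // (entry_size * node_count))


def send_window(entries, followers, send, width):
    """Send the first `width` entries to every follower, one task per pair.

    `send(node, entry)` is a hypothetical transport callback supplied by
    the caller. Tasks run on a thread pool, so delivery order is not
    guaranteed; each entry is expected to carry its own entry index.
    """
    window = entries[:width]
    with ThreadPoolExecutor(max_workers=max(1, width)) as pool:
        futures = [pool.submit(send, node, entry)
                   for entry in window for node in followers]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    return window
```

Under these assumptions, a bandwidth budget of 1,000,000 bytes, 1,000-byte entries, and 4 follower nodes give a window width of 250.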
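It should be noted that, because the first node sends the first log entries in parallel, the second node may receive them out of order. To ensure that the second node can still copy the log entries in order, each first log entry also carries an entry index indicating its position in the copying order, so that the second node can copy log entries in order even when they arrive out of order.
In one possible implementation, when sending the first log entries in parallel to the second nodes, the first node may also dispense with the send window and the multiple threads, and simply send the first log entries to the second nodes at the same time. The embodiments of the present invention do not specifically limit the manner in which the first node sends the first log entries in parallel.
In addition, in the embodiments of the present invention, the send window is acquired by the first node in real time after it receives the commit instructions issued by the client. To acquire the send window, the first node may first measure the currently available network bandwidth and then derive the send window from it. In one possible implementation, the send window may instead be configured in advance based on the maximum network bandwidth, so that it can be used directly in the subsequent parallel sending. The embodiments of the present invention do not specifically limit when the first node acquires the send window.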
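302. The second node receives in parallel the plurality of first log entries sent by the first node.
In the embodiments of the present invention, the second node may set a receive window and receive the first log entries based on it. The size of the receive window may be determined by the network bandwidth and is adjustable. The window width of the receive window indicates the number of log entries received in parallel. The window width of the receive window is obtained in the same way as that of the send window in step 301, and details are not repeated here.
When receiving the first log entries based on the receive window, the second node may use multiple threads to form the receive queue of the receive window. The receive queue may take the form of a Concurrent Skip List Map (a concurrent skip list); the embodiments of the present invention do not specifically limit the form of the receive queue. For each received first log entry, the second node may obtain the signature in the entry and verify it against the public key of the first node stored on the second node, so as to avoid accepting fraudulent data sent by a fraudulent node. For example, in the receive window shown in FIG. 5, the receive queue is a Concurrent Skip List Map containing four threads: Thread5, Thread6, Thread7, and Thread8. Each of the four threads holds one first log entry, and each first log entry includes a Sign and a Send identifier.
The embodiments of the present invention are described using one second node in the node cluster as an example. In one possible implementation, the node cluster contains multiple second nodes, and each of them performs the operations shown in the embodiments of the present invention. It should be noted that, because the network introduces delay and data may be lost, the first log entries sent in parallel by the first node may not all reach the second node at the same time; that is, the second node may not receive all of them simultaneously. Furthermore, if data is lost, the second node may not receive all of the first log entries sent by the first node. The number of first log entries received by the second node in step 302 may therefore differ from the number sent by the first node in step 301.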
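303. The second node copies log entries into its log according to the entry index of each log entry. When any log entry has been copied and the received log entries do not include the target log entry whose entry index is contiguous with that of the copied log entry, the second node stores the uncopied log entries among the received log entries in a cache queue.
The inventors recognized that, because the first node sends the first log entries to the second node in parallel, the second node is likely to receive them out of order. When copying these log entries, the second node usually has to copy them serially and in order. Each log entry therefore corresponds to an entry index (that is, a block height), so that the second node can still copy the first log entries correctly according to their entry indexes when they arrive out of order, keeping the blockchain of the second node consistent with the blockchain of the first node.
In addition, because the second node copies log entries in order, if the next log entry to be copied, that is, the target log entry, has not yet been received, the second node cannot continue copying and must wait until the target log entry arrives. Meanwhile, the second node continues to receive other first log entries arriving in parallel. To keep the reception of first log entries and the wait for the target log entry from interfering with each other, a cache queue is added to the second node. The cache queue stores log entries that the second node has received but has not yet copied because their entry indexes are not contiguous with the entry index of the log entry most recently copied. When the second node later receives the target log entry whose entry index is contiguous with that of the most recently copied log entry, it can extract the uncopied log entries from the cache queue and continue copying. This preserves in-order copying without affecting the reception of first log entries, and therefore improves the efficiency of log entry copying.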
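The cache-queue behaviour of step 303 can be illustrated with a small sketch. It is only one possible reading of the text: the class name `FollowerLog`, the use of a dictionary as the cache queue, and the convention that entry indexes start at 1 are assumptions introduced for the example.

```python
class FollowerLog:
    """In-order copying on a second node, with a cache queue for early arrivals."""

    def __init__(self):
        self.log = []        # copied payloads, in entry-index order
        self.log_index = 0   # entry index of the last copied entry (assumed to start at 0)
        self.cache = {}      # cache queue: entry index -> payload not yet copied

    def receive(self, entry_index, payload):
        """Accept one entry, copy everything that is now contiguous, return the log index."""
        if entry_index <= self.log_index or entry_index in self.cache:
            return self.log_index  # already copied or already cached: nothing to do
        self.cache[entry_index] = payload
        # Drain the cache while the next expected entry index is available.
        while self.log_index + 1 in self.cache:
            self.log_index += 1
            self.log.append(self.cache.pop(self.log_index))
        return self.log_index
```

With entries 1 to 3 copied and entries 5, 6, and 7 waiting in the cache, receiving entry 4 lets the node copy 4, 5, 6, and 7 in one pass, which matches the example given for step 308.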
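304. Each time the second node finishes copying a log entry, it updates its log index and sends an acknowledgment response carrying that log index to the first node.
In the embodiments of the present invention, the log index of the second node is in fact its latest log index, which indicates the log most recently stored by the second node and reflects how complete the log on a node is. After receiving a log entry, any second node broadcasts the entry index of the received log entry to the other second nodes in order to reach consensus with them. When it determines based on the consensus algorithm that the nodes in the cluster have reached consensus, the second node copies the log entry into its own log and updates its own log index. For example, after copying a log entry into its own log, the second node may add 1 to its current log index to obtain the latest log index, and later send an acknowledgment response carrying the latest log index to the first node. The log index may be denoted Commit_index.
It should be noted that, because of network fluctuations, the second node may receive a first log entry late and therefore be unable to return an acknowledgment response to the first node in time. The first node may then assume that the first log entry was lost and send it to the second node again, so the second node may receive the same log entry more than once. To inform the first node promptly that the same log entry has been received repeatedly, and to avoid the bandwidth consumed by repeated re-sending, the second node counts the entry indexes of the log entries it receives. When any log entry has been received two or more times, the second node again sends the first node an acknowledgment response carrying its log index, so that the first node knows that no packet loss has occurred at the second node and that no further re-sending is needed. In one possible implementation, the second node may count entry indexes using a receive counter. Referring to FIG. 6, the second node has a Receive Counter for counting the entry indexes of log entries, and the first log entries sent by the first node are entry 1, entry 2, entry 3, entry 4, and entry 5. When the second node receives these five first log entries, the Receive Counter increases the receive count of each of them by 1. For example, when the second node receives entry 1, the Receive Counter for entry 1 increases by 1; if it receives entry 1 again, the Receive Counter for entry 1 increases by 1 again; likewise, when it receives entry 2, the Receive Counter for entry 2 increases by 1.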
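The receive counter described above might look like the following sketch. The class name `ReceiveCounter` and the choice to return a boolean that tells the caller when to repeat the acknowledgment are assumptions for illustration; the text only specifies that receipts are counted per entry index and that a count of two or more triggers another acknowledgment.

```python
from collections import Counter


class ReceiveCounter:
    """Per-entry-index receipt counts kept by a second node."""

    def __init__(self):
        self.counts = Counter()

    def record(self, entry_index):
        """Count one receipt and report whether the acknowledgment should be repeated.

        Returns True once the same entry index has been received two or more
        times, which is the condition the text gives for re-sending the
        acknowledgment response to the first node.
        """
        self.counts[entry_index] += 1
        return self.counts[entry_index] >= 2
```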
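305. The first node receives at least one acknowledgment response from any one of the second nodes and maintains a confirmation list for the second node. Each time an acknowledgment response is received, if the log index in the newly received acknowledgment response is greater than the log index recorded for the second node in the confirmation list, the first node updates the log index of the second node in the confirmation list.
In the embodiments of the present invention, to determine whether log entries need to be re-sent to the second node, the first node may maintain a confirmation list, extract the log index of the second node from each acknowledgment response the second node returns, and update the confirmation list in real time based on that log index. The first node may maintain a separate confirmation list for each second node, extracting the log index from the acknowledgment response returned by a second node and adding it to the confirmation list corresponding to that second node; or the first node may maintain a single confirmation list, storing the log indexes from the acknowledgment responses of all second nodes in that list and updating it in real time. The confirmation list may be denoted Commit_table.
Because the second node copies log entries in the order of their entry indexes, the log index carried in the acknowledgment responses it returns to the first node increases over time; that is, for a given second node, the log indexes that the first node receives from it grow larger and larger. Each time the first node receives an acknowledgment response, it compares the log index carried in the response with the current log index of the second node in the confirmation list. If the newly received log index is greater, the first node writes it to the confirmation list as the log index of that second node. For example, if the log index currently recorded for node A in the confirmation list is index 3 and the first node receives an acknowledgment response from node A carrying index 6, the first node updates the log index of node A in the confirmation list to index 6.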
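The update rule of step 305 amounts to keeping the largest log index seen per second node. The sketch below assumes the single-list variant, with the confirmation list modelled as a dictionary keyed by a node identifier; the function name `update_confirmation_list` is hypothetical.

```python
def update_confirmation_list(commit_table, node_id, acked_log_index):
    """Record an acknowledgment, keeping only the largest log index per node.

    `commit_table` maps a node identifier to the latest log index that the
    first node has seen from that node. A late or reordered acknowledgment
    with a smaller index leaves the stored value unchanged.
    """
    current = commit_table.get(node_id)
    if current is None or acked_log_index > current:
        commit_table[node_id] = acked_log_index
    return commit_table[node_id]
```

For node A recorded at index 3, an acknowledgment carrying index 6 moves the record to 6, and a delayed acknowledgment carrying index 4 is then ignored.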
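It should be noted that, in one possible implementation, the first node may also dispense with the confirmation list and simply record the node identifier of the second node together with the log index most recently returned by that node. When a new acknowledgment response arrives from the second node, the first node deletes the log index recorded against that node identifier and records the node identifier together with the log index carried in the new acknowledgment response. In other words, the first node can update the recorded log index for a node identifier according to the log index returned by the second node. The embodiments of the present invention do not specifically limit the manner of recording the log index returned by the second node.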
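306. If the log index of the second node in the acknowledgment response is smaller than the target threshold, the first node obtains the log entries between the target threshold and the maximum commit index of the first node as the at least one second log entry to be re-sent to the second node.
The target threshold indicates the minimum log index that a second node should have reached under normal conditions. In the embodiments of the present invention, to predict packet loss at the second node promptly and to re-send the log entries that the second node may have lost, while allowing for the possibility that the second node simply received the first log entries late, the target threshold may be set in the first node. Every target period, the first node checks the log index in the acknowledgment response most recently returned by each second node against the target threshold, so as to determine whether log entries need to be re-sent to that second node and, if so, which ones.
When performing this check, the first node usually compares the log index in the acknowledgment response currently returned by the second node with the target threshold. If that log index is less than the target threshold, the first node obtains the maximum commit index among the entry indexes of all log entries that currently need to be sent to the second node, and takes the log entries whose entry indexes lie between the target threshold and the maximum commit index as the at least one second log entry to be re-sent to the second node. For example, suppose the maximum commit index of the first node is index 100 and the target threshold is 50. If the log index in the acknowledgment response currently returned by the second node is index 39, then, because 39 is less than 50, the first node takes the log entries whose entry indexes lie between index 50 and index 100 as the second log entries.
In one possible implementation, the target threshold may not be set in the first node. In that case, every target period the first node compares the log index recorded for the second node in the confirmation list with its own maximum commit index. If the log index of the second node is smaller than the maximum commit index of the first node, the first node takes the log entries whose entry indexes lie between the log index of the second node and the maximum commit index of the first node as the second log entries for the second node. For example, if the maximum commit index of the first node is index 100 and the log index recorded for the second node in the confirmation list is index 89, the first node may take the log entries whose entry indexes lie between index 89 and index 100 as the second log entries. This way of obtaining second log entries can improve the accuracy of re-sending. Furthermore, because the foregoing check is performed only once every target period, each second node is given some time to copy entries, which avoids the inaccurate results that overly frequent checking would produce and the excessive use of computing resources that real-time checking would cause.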
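The two selection rules of step 306 can be sketched in one function. Treating "between" as inclusive at both ends is an assumption, since the text does not say whether the endpoints are included; the function name `entries_to_resend` and the optional `target_threshold` argument are likewise introduced only for this example.

```python
def entries_to_resend(follower_log_index, leader_max_commit_index, target_threshold=None):
    """Return the range of entry indexes to re-send to one second node.

    With a target threshold: re-send [threshold, max commit index] when the
    follower's log index is below the threshold. Without one: re-send
    [follower log index, max commit index] when the follower lags the leader.
    Both ends are treated as inclusive, which is an assumption.
    """
    if target_threshold is not None:
        if follower_log_index >= target_threshold:
            return range(0)  # follower has reached the threshold: nothing to re-send
        start = target_threshold
    else:
        if follower_log_index >= leader_max_commit_index:
            return range(0)  # follower is fully caught up
        start = follower_log_index
    return range(start, leader_max_commit_index + 1)
```

For the examples in the text, a follower at index 39 with threshold 50 and leader maximum 100 yields indexes 50 to 100, and a follower at index 89 without a threshold yields indexes 89 to 100.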
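307. The first node sends the at least one second log entry in parallel to the second node, each second log entry being used to instruct the second node to copy the second log entry.
In the embodiments of the present invention, after determining the second log entries that need to be re-sent to the second node, the first node can send them to the second node in parallel. The process by which the first node sends the at least one second log entry in parallel to the second node is the same as the process of sending the plurality of first log entries in parallel shown in step 301, and details are not repeated here.
308. The second node receives the at least one second log entry sent by the first node, extracts the uncopied log entries from the cache queue, and copies the log entries according to the entry index of each log entry.
In the embodiments of the present invention, because the second node copies log entries in the order of their entry indexes, the entry index of a second log entry re-sent by the first node is contiguous with that of the log entry whose copying has just been completed, and is also contiguous with the entry indexes of the uncopied log entries in the cache queue, so the second node can continue copying. For example, suppose the entry index of the log entry most recently copied by the second node is 3 and the cache queue holds uncopied log entries with entry indexes 5, 6, and 7. When the second node receives the second log entry sent by the first node and finds that its entry index is 4, the second node can extract the uncopied log entries with entry indexes 5, 6, and 7 from the cache queue and continue copying the log entries in the order of entry indexes 4, 5, 6, and 7.
It should be noted that the process by which the second node receives the at least one second log entry sent by the first node is the same as the process of receiving the plurality of first log entries described in step 302, and details are not repeated here.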
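In one possible implementation, because the leader node may crash, or the node cluster may split so that the current leader node is automatically demoted to a follower node, the working state of each node in the node cluster can be switched dynamically. Referring to FIG. 7, an embodiment of the present invention provides a schematic diagram of the switching of node working states. In the node cluster, the leader node periodically sends heartbeat information to the follower nodes. If the heartbeat timer of a follower node times out and the follower node still has not received heartbeat information from the leader node, the follower node can determine that the leader node has crashed, switches its working state to the candidate state, and becomes a candidate node. The candidate node then resets its heartbeat timer and broadcasts a voting request to the follower nodes in the node cluster. When it receives voting confirmation messages from more than half of the nodes in the node cluster, the candidate node switches its working state to the leader state and becomes the leader node. It should be noted that, because the leader node is responsible for sending log entries to the follower nodes, the follower node that becomes the candidate node may be the node that has finished copying the largest number of log entries in the node cluster.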
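The state switching described above can be summarised as a small transition function. The sketch covers only the two transitions named in the text (heartbeat timeout and winning a vote); the names `State`, `on_heartbeat_timeout`, and `on_votes` are hypothetical, and details such as vote signatures and timer handling are left out.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


def on_heartbeat_timeout(state):
    """A follower whose heartbeat timer expires becomes a candidate."""
    return State.CANDIDATE if state is State.FOLLOWER else state


def on_votes(state, votes_received, cluster_size):
    """A candidate with votes from more than half of the cluster becomes leader."""
    if state is State.CANDIDATE and votes_received > cluster_size / 2:
        return State.LEADER
    return state
```

In a four-node cluster, a candidate therefore needs three voting confirmations; two is exactly half and is not enough.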
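In the embodiments of the present invention, when a working-state switch occurs in the node cluster, on the one hand the first node may switch its own working state to the follower state and become a follower node; on the other hand, the second node may switch its own working state to the leader state and become the leader node. These two cases are described below.
Case 1: The first node switches its working state to the follower state.
When the first node switches to the follower state, that is, when the first node becomes a follower node, the first node clears the unsent log entries and stops the parallel sending step. In a specific possible embodiment, because the first node sends multiple first log entries in parallel using multiple threads based on a sending window, the first node clears the log entries in the sending window that have not yet been sent and shuts down the threads. Since the first node is now a follower node and will subsequently receive multiple log entries sent by the new leader node, after becoming a follower node the first node may initialize a receiving window and start multiple threads for the receiving window, so as to subsequently receive in parallel the multiple log entries sent by the new leader node.
It should be noted that, because the first node holds log entries that it has already replicated, the first node determines the maximum commit index of the log entries it has already replicated and sets the entry index of the next log entry to be received to its own maximum commit index plus 1. For example, if the maximum commit index of the log entries that the first node has currently replicated is index56, the first node sets the entry index of the next log entry to be received to index57, so as to receive the log entry corresponding to index57 sent by the new leader node.
In this scenario, once the node running in the leader state crashes, clearing the unsent log entries and shutting down the threads used for parallel sending prevents that node from affecting the normal operation of the other nodes in the node cluster.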
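Case 1 can be sketched in Python as follows. The class name and fields are hypothetical, and a boolean flag stands in for shutting down the sending threads; the sketch shows only the bookkeeping, not real thread or network handling.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class FormerLeader:
    def __init__(self, max_commit_index: int, pending: Iterable[int]) -> None:
        self.max_commit_index = max_commit_index
        self.send_window: Deque[int] = deque(pending)  # entries not yet sent
        self.sending = True                             # stands in for the sender threads
        self.next_receive_index: Optional[int] = None

    def become_follower(self) -> int:
        """Clear unsent entries, stop parallel sending, and set the next
        entry index to receive to the node's own maximum commit index + 1."""
        self.send_window.clear()
        self.sending = False
        self.next_receive_index = self.max_commit_index + 1
        return self.next_receive_index
```

With a maximum commit index of 56, as in the example in the text, the next entry index to receive is 57.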
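Case 2: The second node switches its working state to the leader state.
When the second node switches to the leader state, that is, when the second node becomes the leader node, the second node clears the unreplicated log entries and stops the parallel receiving step. In a specific possible embodiment, because the second node receives multiple first log entries in parallel using multiple threads based on a receiving window and adds log entries that have been received but not yet replicated to the cache queue, the second node clears the unreplicated log entries in the cache queue and shuts down the threads of the receiving window. Since the second node is now the leader node and will subsequently send multiple log entries to the follower nodes in the node cluster, after becoming the leader node the second node may initialize a sending window and start multiple threads for the sending window, so as to subsequently send multiple log entries to the follower nodes in parallel.
It should be noted that, after becoming the leader node, the second node does not know how far each follower node has progressed in replicating log entries, and therefore does not know the log index of the log entries it is about to send to each follower node. The second node therefore sets up an acknowledgment list for the nodes in the follower state in the node cluster according to the maximum commit index of the log entries it has itself replicated, setting the log index of each node in the follower state to the second node's own maximum commit index plus 1. This triggers each follower node to return to the second node the log index of the log entries it has currently replicated, so that the second node learns the replication progress of each follower node in the node cluster and sends log entries to the follower nodes according to their replication progress. For example, if the maximum commit index of the log entries that the second node has currently replicated is index99, the second node sets the log index of each node in the follower state to index100 and sends the log entry corresponding to index100 to the multiple follower nodes.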
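The initialization performed in Case 2 can be sketched as a single Python function. The function name, the dictionary standing in for the cache queue, and the string node identifiers are assumptions made for illustration.

```python
from typing import Dict, Iterable


def become_leader(max_commit_index: int,
                  follower_ids: Iterable[str],
                  cache_queue: Dict[int, bytes]) -> Dict[str, int]:
    """Clear unreplicated entries and build the acknowledgment list, with
    every follower's log index set to the new leader's own max commit index + 1."""
    cache_queue.clear()
    return {node_id: max_commit_index + 1 for node_id in follower_ids}
```

With a maximum commit index of 99, every follower starts at index100 in the acknowledgment list; the acknowledgment responses that follow then replace this optimistic guess with each follower's actual progress.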
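In the embodiments of the present invention, the first node sends multiple first log entries to multiple second nodes in parallel and receives, from any second node, at least one acknowledgment response carrying the log index of that second node, so that it can send at least one second log entry to the second node in parallel according to the log index of the second node. Because the multiple first log entries are sent to the multiple second nodes in parallel, and because the first node additionally provides a resending mechanism for log entries to guard against network problems that cause the second node to receive log entries incorrectly (for example, missing entries or receiving extra entries), the time needed to send the multiple first log entries is reduced and the efficiency of log entry replication is improved, while the data on all nodes in the node cluster is kept consistent.
FIG. 8 is a schematic structural diagram of a log entry replication apparatus according to an embodiment of the present invention. Referring to FIG. 8, the apparatus includes a sending module 401, a receiving module 402, and an obtaining module 403.
The sending module 401 is configured to send multiple first log entries to multiple second nodes in parallel, each first log entry being used to instruct a second node to replicate the first log entry;
The receiving module 402 is configured to receive at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying the log index of the second node;
The obtaining module 403 is configured to obtain, according to the log index of the second node, at least one second log entry to be resent to the second node;
The sending module 401 is further configured to send the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to replicate the second log entry.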
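In another embodiment, the obtaining module 403 is configured to: if the log index of the second node is less than the target threshold, obtain the log entries between the target threshold and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
In another embodiment, the obtaining module 403 is configured to: if the log index of the second node is less than the maximum commit index of the first node, obtain the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
In another embodiment, referring to FIG. 9, the apparatus further includes an updating module 404.
The updating module 404 is configured to maintain an acknowledgment list for the second node and, each time an acknowledgment response is received, update the log index of the second node in the acknowledgment list if the log index in the newly received acknowledgment response is greater than the log index of the second node in the acknowledgment list;
The obtaining module 403 is further configured to perform, every target period, the step of obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node.
In another embodiment, referring to FIG. 10, the sending module 401 includes an obtaining submodule 4011 and a sending submodule 4012.
The obtaining submodule 4011 is configured to obtain a sending window, the window width of the sending window indicating the number of log entries sent in parallel;
The sending submodule 4012 is configured to send the multiple first log entries in the sending window using multiple threads.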
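The cooperation of the two submodules can be illustrated with a minimal Python sketch built on the standard `concurrent.futures.ThreadPoolExecutor`. The function name and the `send` callable are hypothetical placeholders for the actual network transmission, and using one thread per entry in the window is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def send_window_parallel(entries: Sequence[int],
                         window_width: int,
                         send: Callable[[int], None]) -> List[int]:
    """Take the first `window_width` pending entries as the sending window
    and hand each of them to a worker thread. Returns the entries sent."""
    window = list(entries[:window_width])
    # max_workers must be positive even when the window is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(window))) as pool:
        # list() forces completion and re-raises any exception from send().
        list(pool.map(send, window))
    return window
```

Because the threads run concurrently, the order in which the entries reach the second node is not guaranteed, which is why the receiving side replicates by entry index rather than by arrival order.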
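In another embodiment, referring to FIG. 11, the apparatus further includes a clearing module 405 and a setting module 406.
The clearing module 405 is configured to, when the first node switches to the follower state, clear the unsent log entries and stop the parallel sending step;
The setting module 406 is configured to set, according to the first node's own maximum commit index, the next log index to be received to the first node's own maximum commit index plus 1.
FIG. 12 is a schematic structural diagram of a log entry replication apparatus according to an embodiment of the present invention. Referring to FIG. 12, the apparatus includes a receiving module 501, a replication module 502, and a sending module 503.
The receiving module 501 is configured to receive in parallel multiple log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
The replication module 502 is configured to replicate the log entries into the log of the second node according to the entry index of each log entry, and to update the log index of the second node;
The sending module 503 is configured to send an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.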
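In another embodiment, the replication module 502 is configured to: when replication of any log entry is completed and the received log entries do not include a target log entry whose entry index is contiguous with that of the log entry, store the unreplicated log entries among the received log entries in the cache queue, and continue replication only when the target log entry is received.
In another embodiment, referring to FIG. 13, the apparatus further includes a counting module 504.
The counting module 504 is configured to count based on the entry indexes of the received log entries;
The sending module 503 is further configured to perform the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.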
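The counting rule can be sketched with Python's standard `collections.Counter`. The class and method names are hypothetical; the sketch only decides whether an acknowledgment response should be sent and leaves the sending itself out.

```python
from collections import Counter


class ReceiptCounter:
    """Counts how many times each entry index has been received."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_receive(self, entry_index: int) -> bool:
        """Record one receipt; return True when this entry has now been
        received two or more times, i.e. an acknowledgment should be sent."""
        self.counts[entry_index] += 1
        return self.counts[entry_index] >= 2
```

A usage sketch: the second node calls `on_receive` for each arriving entry and, when it returns `True`, sends an acknowledgment response carrying its current log index to the first node.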
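In another embodiment, referring to FIG. 14, the apparatus further includes a clearing module 505 and a setting module 506.
The clearing module 505 is configured to, when the second node switches to the leader state, clear the unreplicated log entries and stop the parallel receiving step;
The setting module 506 is configured to set up, according to the second node's own maximum commit index, an acknowledgment list for the nodes in the follower state in the node cluster, the log index of each node being equal to the second node's own maximum commit index plus 1.
It should be noted that when the log entry replication apparatus provided in the above embodiments replicates log entries, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the log entry replication apparatus provided in the above embodiments and the log entry replication method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
FIG. 15 is a schematic structural diagram of a computer device 1500 according to an embodiment of the present invention. The computer device 1500 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 1501 and one or more memories 1502, where at least one instruction is stored in the memory 1502. In a possible implementation, the at least one instruction is loaded and executed by the processor 1501 to implement the method steps of the log entry replication method provided in the above method embodiments as applied to a first node running in the leader state in a node cluster, the node cluster further including multiple second nodes running in the follower state. Specifically, the method steps implemented when the processor 1501 executes the instructions are as follows: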
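sending multiple first log entries to the multiple second nodes in parallel, each first log entry being used to instruct a second node to replicate the first log entry;
receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying the log index of the second node;
obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node;
sending the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to replicate the second log entry.
Optionally, the processor 1501 is configured to perform:
if the log index of the second node is less than the target threshold, obtaining the log entries between the target threshold and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
Optionally, the processor 1501 is configured to perform:
if the log index of the second node is less than the maximum commit index of the first node, obtaining the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
Optionally, the processor 1501 is further configured to perform:
maintaining an acknowledgment list for the second node and, each time an acknowledgment response is received, updating the log index of the second node in the acknowledgment list if the log index in the newly received acknowledgment response is greater than the log index of the second node in the acknowledgment list;
performing, every target period, the step of obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node.
Optionally, the processor 1501 is configured to perform:
obtaining a sending window, the window width of the sending window indicating the number of log entries sent in parallel;
sending the multiple first log entries in the sending window using multiple threads.
Optionally, the processor 1501 is further configured to perform:
when the first node switches to the follower state, clearing the unsent log entries and stopping the parallel sending step;
setting, according to the first node's own maximum commit index, the next log index to be received to the first node's own maximum commit index plus 1.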
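In another possible implementation, the at least one instruction is loaded and executed by the processor 1501 to implement the method steps of the log entry replication method provided in the above method embodiments, the method being applied to a second node running in the follower state in a node cluster, the node cluster further including a first node running in the leader state. The method steps implemented when the processor 1501 executes the instructions are as follows:
receiving in parallel multiple log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
replicating the log entries into the log of the second node according to the entry index of each log entry, and updating the log index of the second node;
sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
Optionally, the processor 1501 is configured to perform:
when replication of any log entry is completed and the received log entries do not include a target log entry whose entry index is contiguous with that of the log entry, storing the unreplicated log entries among the received log entries in the cache queue, and continuing replication only when the target log entry is received.
Optionally, the processor 1501 is further configured to perform:
counting based on the entry indexes of the received log entries;
performing the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.
Optionally, the processor 1501 is further configured to perform:
when the second node switches to the leader state, clearing the unreplicated log entries and stopping the parallel receiving step;
setting up, according to the second node's own maximum commit index, an acknowledgment list for the nodes in the follower state in the node cluster, the log index of each node being equal to the second node's own maximum commit index plus 1.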
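Of course, the computer device 1500 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor to perform the log entry replication method in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.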

Claims (22)

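  1. A log entry replication method, wherein the method is applied to a first node running in a leader state in a node cluster, the node cluster further comprising multiple second nodes running in a follower state, the method comprising:
    sending multiple first log entries to the multiple second nodes in parallel, each first log entry being used to instruct a second node to replicate the first log entry;
    receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node;
    obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node;
    sending the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to replicate the second log entry.
  2. The method according to claim 1, wherein the obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node comprises:
    if the log index of the second node is less than a target threshold, obtaining the log entries between the target threshold and a maximum commit index of the first node as the at least one second log entry to be resent to the second node.
  3. The method according to claim 1, wherein the obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node comprises:
    if the log index of the second node is less than a maximum commit index of the first node, obtaining the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
  4. The method according to any one of claims 1 to 3, wherein after the receiving at least one acknowledgment response from any one of the second nodes, the method further comprises:
    maintaining an acknowledgment list for the second node and, each time an acknowledgment response is received, updating the log index of the second node in the acknowledgment list if the log index in the newly received acknowledgment response is greater than the log index of the second node in the acknowledgment list;
    performing, every target period, the step of obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node.
  5. The method according to any one of claims 1 to 3, wherein the sending multiple first log entries to the multiple second nodes in parallel comprises:
    obtaining a sending window, the window width of the sending window indicating the number of log entries sent in parallel;
    sending the multiple first log entries in the sending window using multiple threads.
  6. The method according to claim 1, wherein the method further comprises:
    when the first node switches to the follower state, clearing the unsent log entries and stopping the parallel sending step;
    setting, according to the first node's own maximum commit index, the next log index to be received to the first node's own maximum commit index plus 1.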
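  7. A log entry replication method, wherein the method is applied to a second node running in a follower state in a node cluster, the node cluster further comprising a first node running in a leader state, the method comprising:
    receiving in parallel multiple log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
    replicating the log entries into a log of the second node according to the entry index of each log entry, and updating a log index of the second node;
    sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
  8. The method according to claim 7, wherein the replicating the log entries into a log of the second node according to the entry index of each log entry comprises:
    when replication of any log entry is completed and the received log entries do not include a target log entry whose entry index is contiguous with that of the log entry, storing the unreplicated log entries among the received log entries in a cache queue, and continuing replication only when the target log entry is received.
  9. The method according to claim 7, wherein the method further comprises:
    counting based on the entry indexes of the received log entries;
    performing the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.
  10. The method according to claim 7, wherein the method further comprises:
    when the second node switches to the leader state, clearing the unreplicated log entries and stopping the parallel receiving step;
    setting up, according to the second node's own maximum commit index, an acknowledgment list for the nodes in the follower state in the node cluster, the log index of each node being equal to the second node's own maximum commit index plus 1.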
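  11. A computer device, comprising a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-executable instructions, implements the method steps of a log entry replication method, the method being applied to a first node running in a leader state in a node cluster, the node cluster further comprising multiple second nodes running in a follower state, and the method steps implemented when the processor executes the instructions comprise:
    sending multiple first log entries to the multiple second nodes in parallel, each first log entry being used to instruct a second node to replicate the first log entry;
    receiving at least one acknowledgment response from any one of the second nodes, each acknowledgment response carrying a log index of the second node;
    obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node;
    sending the at least one second log entry to the second node in parallel, each second log entry being used to instruct the second node to replicate the second log entry.
  12. The computer device according to claim 11, wherein the processor is configured to perform:
    if the log index of the second node is less than a target threshold, obtaining the log entries between the target threshold and a maximum commit index of the first node as the at least one second log entry to be resent to the second node.
  13. The computer device according to claim 11, wherein the processor is configured to perform:
    if the log index of the second node is less than a maximum commit index of the first node, obtaining the log entries between the log index of the second node and the maximum commit index of the first node as the at least one second log entry to be resent to the second node.
  14. The computer device according to any one of claims 11 to 13, wherein the processor is further configured to perform:
    maintaining an acknowledgment list for the second node and, each time an acknowledgment response is received, updating the log index of the second node in the acknowledgment list if the log index in the newly received acknowledgment response is greater than the log index of the second node in the acknowledgment list;
    performing, every target period, the step of obtaining, according to the log index of the second node, at least one second log entry to be resent to the second node.
  15. The computer device according to any one of claims 11 to 13, wherein the processor is configured to perform:
    obtaining a sending window, the window width of the sending window indicating the number of log entries sent in parallel;
    sending the multiple first log entries in the sending window using multiple threads.
  16. The computer device according to claim 11, wherein the processor is further configured to perform:
    when the first node switches to the follower state, clearing the unsent log entries and stopping the parallel sending step;
    setting, according to the first node's own maximum commit index, the next log index to be received to the first node's own maximum commit index plus 1.
  17. A computer-readable storage medium, wherein instructions are stored on the computer-readable storage medium, and the instructions are executed by a processor to perform the log entry replication method according to any one of claims 1 to 6.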
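  18. A computer device, comprising a memory, a processor, and computer-executable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-executable instructions, implements the method steps of a log entry replication method, the method being applied to a second node running in a follower state in a node cluster, the node cluster further comprising a first node running in a leader state, and the method steps implemented when the processor executes the instructions comprise:
    receiving in parallel multiple log entries sent by the first node, each log entry being used to instruct the second node to record the log entry;
    replicating the log entries into a log of the second node according to the entry index of each log entry, and updating a log index of the second node;
    sending an acknowledgment response to the first node, the acknowledgment response carrying the log index of the second node.
  19. The computer device according to claim 18, wherein the processor is configured to perform:
    when replication of any log entry is completed and the received log entries do not include a target log entry whose entry index is contiguous with that of the log entry, storing the unreplicated log entries among the received log entries in a cache queue, and continuing replication only when the target log entry is received.
  20. The computer device according to claim 18, wherein the processor is further configured to perform:
    counting based on the entry indexes of the received log entries;
    performing the step of sending an acknowledgment response to the first node when any log entry has been received two or more times.
  21. The computer device according to claim 18, wherein the processor is further configured to perform:
    when the second node switches to the leader state, clearing the unreplicated log entries and stopping the parallel receiving step;
    setting up, according to the second node's own maximum commit index, an acknowledgment list for the nodes in the follower state in the node cluster, the log index of each node being equal to the second node's own maximum commit index plus 1.
  22. A computer-readable storage medium, wherein instructions are stored on the computer-readable storage medium, and the instructions are executed by a processor to perform the log entry replication method according to any one of claims 7 to 10.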
PCT/CN2018/107512 2017-10-12 2018-09-26 Log entry duplication method and device, computer equipment, and storage medium WO2019072085A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP18866263.9A EP3623963B1 (en) 2017-10-12 2018-09-26 Log entry duplication method and device, computer equipment, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710949026.4A CN107967291B (zh) 2017-10-12 2017-10-12 Log entry duplication method and device, computer equipment, and storage medium
CN201710949026.4 2017-10-12

Publications (1)

Publication Number Publication Date
WO2019072085A1 true WO2019072085A1 (zh) 2019-04-18

Family

ID=61997602

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107512 WO2019072085A1 (zh) 2017-10-12 2018-09-26 Log entry duplication method and device, computer equipment, and storage medium

Country Status (3)

Country Link
EP (1) EP3623963B1 (zh)
CN (2) CN110377570B (zh)
WO (1) WO2019072085A1 (zh)

Also Published As

Publication number Publication date
CN107967291B (zh) 2019-08-13
CN110377570B (zh) 2021-06-11
EP3623963A4 (en) 2020-08-05
EP3623963A1 (en) 2020-03-18
CN110377570A (zh) 2019-10-25
CN107967291A (zh) 2018-04-27
EP3623963B1 (en) 2023-11-01

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018866263

Country of ref document: EP

Effective date: 20191213

NENP Non-entry into the national phase

Ref country code: DE