WO2022033290A1 - Strongly consistent storage system, strongly consistent data storage method, server, and medium - Google Patents

Publication number
WO2022033290A1
WO2022033290A1 PCT/CN2021/108190 CN2021108190W WO2022033290A1 WO 2022033290 A1 WO2022033290 A1 WO 2022033290A1 CN 2021108190 W CN2021108190 W CN 2021108190W WO 2022033290 A1 WO2022033290 A1 WO 2022033290A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
storage
data object
target
target data
Prior art date
Application number
PCT/CN2021/108190
Other languages
English (en)
French (fr)
Inventor
简怀兵
Original Assignee
百果园技术(新加坡)有限公司
简怀兵
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 简怀兵
Publication of WO2022033290A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • the present application relates to the technical field of data storage, for example, to a strongly consistent storage system, a strongly consistent data storage method, a server, and a medium.
  • One class of related technology is the global strongly consistent storage system built on the classic strongly consistent storage algorithm Paxos or its variant, Raft.
  • This type of global strongly consistent storage system first deploys the required leading nodes across multiple sub-clusters; every write request generated by a client must then be routed to a deployed leading node before the subsequent storage commit can be performed.
  • The network delay between global sub-clusters is very large, usually 200-500 ms. If every request initiated by a client must first be routed to the leading node, the commit delay becomes very large, which degrades the performance of the global strongly consistent storage system. In addition, the classical algorithms used in such systems, such as Paxos and Raft, are difficult to implement and to verify for correctness, which also makes the global strongly consistent storage system less stable.
  • An improved system, represented by Google Spanner, statically partitions the data and treats each partition as a Paxos group, with each group containing multiple data objects. Because multiple Paxos groups can perform write operations concurrently, the write capability of the whole system improves to some extent, which mitigates the poor storage performance of the first type of technology described above.
  • However, for writes to the same data object o, no matter where the client issuing the write request is located, the write request must be routed to the leading node of the Paxos group to which o belongs, so the system adapts poorly.
  • A key feature of the improved system is that it statically partitions the data and assigns to each partition multiple data objects that require strongly consistent storage. The data objects in one partition are logically bound together, yet they may conflict in terms of data characteristics, system load and user access behavior.
  • A partition optimized for some of its data objects may therefore degrade the storage performance of the other data objects in the same partition, and a partitioning strategy alone cannot solve or avoid this problem.
  • the present application provides a strongly consistent storage system, a strongly consistent data storage method, a server and a medium, which improve the system storage performance and system adaptability when implementing the strongly consistent data storage.
  • The present application provides a strongly consistent storage system, including:
  • node clusters divided by region, where each node cluster includes at least one storage node;
  • each storage node is configured to store a leading node mapping table, where the leading node mapping table records data objects and the leading nodes mapped to those data objects, and a leading node is a storage node under a node cluster; and to analyze a write request received by the node, determine the target data object contained in the write request, and, according to the leading node mapping table stored by the node, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes.
  • The present application provides a strongly consistent data storage method, including:
  • each storage node that receives a write request analyzes the received write request and determines the target data object contained in the write request;
  • according to the leading node mapping table stored by the node, and through interaction with other storage nodes, the strongly consistent storage commit of the target data object is completed.
  • The present application provides a server, including:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the above strongly consistent data storage method.
  • The present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the above strongly consistent data storage method.
  • FIG. 1 is an architecture diagram of a strongly consistent storage system provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a method for strongly consistent data storage provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic diagram of a hardware structure of a server according to Embodiment 3 of the present application.
  • In the related art, a storage node is selected directly from the node cluster to act as the leading node, write requests containing a data object are routed directly to that leading node, and the leading node performs the subsequent strongly consistent storage commit for the data objects in the write request. Because the leading node and the client sending the write request may be located in different regions, network latency degrades the storage performance of such a system.
  • FIG. 1 is an architecture diagram of a strongly consistent storage system provided by the first embodiment of the present application.
  • The storage system includes node clusters 10 divided by region, and each node cluster 10 includes at least one storage node 101. Each storage node 101 stores a leading node mapping table 1011, which records data objects and the leading node mapped to each data object, where the leading node is any storage node 101 under any node cluster 10. The storage node 101 is configured to analyze a received write request, determine the target data object contained in the write request and, according to the stored leading node mapping table 1011 and through interaction with other storage nodes 101, complete the strongly consistent storage commit of the target data object.
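As a rough illustration of the structure described above, the following Python sketch models region-based node clusters and a per-node leading node mapping table. The names `StorageNode`, `leader_map`, `lookup_leader` and `group_by_region`, as well as the node and object identifiers, are hypothetical and are not taken from the application; the table is modelled as a plain dictionary from data object key to leading node identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageNode:
    """One storage node; all names here are illustrative assumptions."""
    node_id: str
    region: str
    # Leading node mapping table: data object key -> id of its leading node.
    leader_map: Dict[str, str] = field(default_factory=dict)

    def lookup_leader(self, object_key: str) -> Optional[str]:
        """Return the recorded leading node, or None if this node has no record."""
        return self.leader_map.get(object_key)


def group_by_region(nodes: List[StorageNode]) -> Dict[str, List[StorageNode]]:
    """Build node clusters divided by region (each cluster has at least one node)."""
    clusters: Dict[str, List[StorageNode]] = {}
    for node in nodes:
        clusters.setdefault(node.region, []).append(node)
    return clusters


nodes = [StorageNode("sg-1", "sg"), StorageNode("sg-2", "sg"), StorageNode("us-1", "us")]
clusters = group_by_region(nodes)
# The leading node of an object may live in any cluster, not only the local one.
nodes[0].leader_map["obj-42"] = "us-1"
```

Each node keeps its own table, so in this sketch `sg-2` has no record for `obj-42` until it processes that object itself.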
  • T2 mainly depends on the time spent in two rounds of interaction between the leading node and a majority of the nodes in the cluster, and the time spent in one round depends on the maximum round-trip time between the leading node and those majority nodes. Assuming the leading node of a data object remains stable, the time from the leading node initiating a strongly consistent storage request to the commit being achieved is equivalent to one round of interaction between the leading node and a majority of the nodes in the cluster.
  • The strongly consistent storage system provided in this embodiment needs to ensure that the strongly consistent storage time T in the above model reaches an optimal value.
  • T reaches an optimal value under the following conditions: the time T1 for a write request containing a data object to reach the leading node of that data object is minimal; and the leading nodes of data objects remain in a reasonably stable state, so that T2 also reaches its optimum.
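The timing model can be made concrete with a small calculation. The sketch below assumes that T = T1 + T2, which the text implies but this excerpt does not state explicitly, and that the leading node counts itself toward the majority. The function names and the sample round-trip times are hypothetical.

```python
from typing import Sequence


def majority_round_ms(peer_rtts_ms: Sequence[float]) -> float:
    """Time for one round between a leading node and a majority of the cluster.

    Assumption: the leading node counts itself, so with n = len(peers) + 1
    nodes it needs replies from n // 2 peers, and the round ends when the
    slowest of those replies arrives.
    """
    needed = (len(peer_rtts_ms) + 1) // 2
    if needed == 0:
        return 0.0
    return sorted(peer_rtts_ms)[needed - 1]


def commit_latency_ms(t1_ms: float, peer_rtts_ms: Sequence[float], leader_stable: bool) -> float:
    """T = T1 + T2 (assumed), with T2 being one round if the leader is stable, else two."""
    rounds = 1 if leader_stable else 2
    return t1_ms + rounds * majority_round_ms(peer_rtts_ms)


peer_rtts = [10, 200, 300, 20]                 # round-trip times to the four other nodes, in ms
near = commit_latency_ms(5, peer_rtts, True)   # client close to a stable leading node
far = commit_latency_ms(250, peer_rtts, True)  # client far from the leading node
```

With these sample numbers the nearby client sees 25 ms and the distant one 270 ms, which shows why minimizing T1 dominates when inter-region delay is in the hundreds of milliseconds.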
  • In this embodiment, the data object is treated as the smallest unit, and a separate Paxos mechanism is used for each data object to achieve its strongly consistent storage, so that each data object can be adaptively reassigned to a suitable leading node independently, based on its own data characteristics, system load and user access behavior.
  • The provided strongly consistent storage system is deployed on a global network architecture.
  • The strongly consistent storage system may include multiple node clusters 10 divided by region, and each node cluster 10 may consist of at least one storage node 101.
  • each storage node 101 is equivalent to a physical node, and undertakes the implementation of the strongly consistent storage of data objects in the strongly consistent storage system provided by this embodiment.
  • The operations performed by a storage node 101 start once it receives a write request. The write request may be generated and sent directly by a user terminal, or it may be forwarded by another storage node 101.
  • In this embodiment, receiving a write request sent by the user terminal is taken as the starting point of the strongly consistent storage process on the storage node 101.
  • Each storage node 101 stores its own leading node mapping table 1011. The table stored on a storage node 101 may include the data objects that the storage node 101 has processed and the leading node mapped to each of those data objects. The leading node mapped to a data object can be any storage node 101 under any node cluster 10 in the strongly consistent storage system, and is not necessarily the storage node 101 that stores the leading node mapping table 1011.
  • When a storage node 101 receives a write request, it must first analyze the write request to identify the data objects it contains before it can store them with strong consistency.
  • The data object contained in the write request is recorded as the target data object. The storage node then completes the strongly consistent storage of the target data object according to the leading node mapping table 1011 stored on the node and through interaction with other storage nodes 101.
  • The target data object may be a data object that the storage node 101 has not processed before. In that case, the leading node mapping table 1011 holds no record for the target data object, so the storage node 101 must first determine the leading node of the target data object through interaction with other storage nodes 101, and then complete the strongly consistent storage of the target data object through that leading node.
  • When the target data object has already been processed by the storage node 101, the leading node mapping table 1011 contains a leading node for it, and the storage node 101 must determine whether that leading node still satisfies the conditions for continuing to serve as the leading node. If it does, the leading node can directly complete the strongly consistent storage commit of the target data object. If it does not, a new leading node is determined through interaction with other storage nodes 101, and the new leading node completes the strongly consistent storage commit of the target data object.
  • The leading node of a data object can thus be adjusted dynamically according to the data characteristics of the data object, the system load, user access behavior and so on, so that the time taken for a write request for the data object to reach its leading node is as reasonable as possible.
  • The strongly consistent storage system provided by the first embodiment of the present application treats each data object as the smallest unit, and a storage node can use an independent Paxos instance to complete the strongly consistent storage of each data object. Data objects are partitioned independently, so that there is no dependency between data objects. When processing a data object as the smallest unit, the storage node uses the leading node mapping table stored on it, together with interaction with other storage nodes, to dynamically determine the target leading node of the target data object, and completes the strongly consistent storage commit of the target data object on that basis. The system can therefore optimize the performance of strongly consistent storage for each data object according to its data characteristics, the system load and user access behavior. This reduces the cost of strongly consistent storage, improves the applicability of the strongly consistent storage system, and improves the user experience of the business side in business scenarios.
  • The storage node 101 may include:
  • an object determination module, set to analyze the received write request and determine the target data object contained in the write request; a first determination module, set to determine whether a target leading node of the target data object exists in the stored leading node mapping table 1011; a first execution module, set to, when no target leading node exists in the leading node mapping table 1011, act as the initiating node and send a leading node election request for the target data object to other storage nodes 101, so as to determine the target leading node through interaction with the other storage nodes 101 and complete the strongly consistent storage commit of the target data object based on the determined target leading node; and a second execution module, set to, when a target leading node exists in the leading node mapping table 1011, determine whether the target leading node is the current node and, according to the result, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes 101.
  • In this embodiment, the storage node 101 is subdivided into several functional modules.
  • For example, the object determination module analyzes the received write request to determine the target data object to be stored with strong consistency. The first determination module checks whether the target data object appears in the leading node mapping table 1011 on the node, and thereby determines whether a target leading node of the target data object is recorded on the node. Depending on the result, either the first execution module or the second execution module in the storage node 101 is triggered to perform the subsequent operations.
  • When no target leading node is recorded, the first execution module takes this node as the initiating node and sends a leading node election request for the target data object to other storage nodes 101. The interaction triggered by this request determines the target leading node of the target data object, and the strongly consistent storage commit of the target data object can then be completed based on the determined target leading node.
  • Determining the target leading node by sending a leading node election request can be regarded as initiating one round of Paxos interaction between the storage node 101 and a majority of the other storage nodes 101 in the system; this round is used to determine the target leading node of the target data object.
  • After this node determines a new target leading node for the target data object by sending a leading node election request to other storage nodes 101, the target data object and its target leading node are stored in association in the database created on this node.
  • When a target leading node of the target data object exists in the leading node mapping table 1011, the second execution module is triggered.
  • The second execution module first determines whether the existing target leading node is the storage node 101 itself, and then completes the strongly consistent storage commit of the target data object according to the result, in combination with interaction with other storage nodes 101.
  • The second execution module compares the target leading node recorded for the target data object in the leading node mapping table 1011 with the node itself. If the target leading node is this node, the storage node 101 is the target leading node of the target data object; the node then evaluates whether it is still suitable as the target leading node and, if so, commits the data object directly. If the target leading node is not this node, the target leading node is another storage node 101 in the strongly consistent storage system.
  • In that case, the second execution module forwards the write request containing the target data object to the target leading node, the target leading node performs the strongly consistent storage commit of the target data object, and the local storage node 101 can end its processing of the target data object.
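The three branches taken by a storage node on receiving a write request (no record, record pointing at this node, record pointing at another node) can be sketched as a small dispatch function. This is only an illustration; the `Action` enum, `dispatch_write` and the identifiers used are hypothetical names, and the actual election, transfer check and forwarding are left out.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    ELECT_LEADER = "elect"      # no record: first execution module starts an election
    HANDLE_LOCALLY = "local"    # this node is the recorded leading node
    FORWARD = "forward"         # another node is the recorded leading node


def dispatch_write(self_id: str, leader_map: Dict[str, str],
                   object_key: str) -> Tuple[Action, Optional[str]]:
    """Decide what a node does with a write for object_key.

    Returns the action and, for FORWARD, the id of the node to forward to.
    """
    leader = leader_map.get(object_key)
    if leader is None:
        return Action.ELECT_LEADER, None
    if leader == self_id:
        return Action.HANDLE_LOCALLY, None
    return Action.FORWARD, leader


table = {"obj-a": "sg-1", "obj-b": "us-1"}
decision_a = dispatch_write("sg-1", table, "obj-a")
decision_b = dispatch_write("sg-1", table, "obj-b")
decision_c = dispatch_write("sg-1", table, "obj-c")
```

In the `HANDLE_LOCALLY` case the node would still go on to check the leading node transfer condition before committing.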
  • The second execution module is set to:
  • when the target leading node in the leading node mapping table 1011 is the current node, determine, in combination with the historical access data of the target data object, whether the target data object satisfies the set leading node transfer condition; if it does, determine a candidate leading node and transfer the target data object to the candidate leading node, so that the target leading node of the target data object is determined based on the candidate leading node and the strongly consistent storage commit of the target data object is completed based on the determined target leading node; if it does not, initiate a strongly consistent storage commit request to other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
  • This embodiment describes the case in which the second execution module finds that the target leading node is the current node. After determining that the target leading node in the leading node mapping table 1011 is the current node, the second execution module obtains the historical access data of the target data object stored on the node. The historical access data includes at least the region from which each write request associated with the target data object was initiated.
  • The second execution module determines from the historical access data whether the target data object satisfies the preset leading node transfer condition, and then proceeds differently depending on the result.
  • The leading node transfer condition may be: the write requests containing the target data object received a set number of consecutive times all originate from the same region, and that region differs from the region to which the node belongs.
  • Determining whether the target data object satisfies the set leading node transfer condition can be described as: look up the historical access data of the target data object and extract the write requests containing the target data object that the node received in the past; obtain the origination region of each of these write requests; if the consecutively received write requests were all initiated from the same region, and that region differs from the region to which the node belongs, the leading node transfer condition is met.
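A minimal sketch of this transfer check follows. It assumes the historical access data has already been reduced to a list of origination regions, oldest first, and that the "set number of consecutive times" is passed in as `threshold`; both the function name and this representation are assumptions for illustration.

```python
from typing import Sequence


def should_transfer_leader(origin_regions: Sequence[str], own_region: str, threshold: int) -> bool:
    """Check the leading node transfer condition.

    origin_regions: origination region of each past write for one data object,
    oldest first. The condition holds when the last `threshold` writes all came
    from one region and that region is not the region of the current node.
    """
    if threshold <= 0 or len(origin_regions) < threshold:
        return False
    recent = origin_regions[-threshold:]
    same_origin = all(region == recent[0] for region in recent)
    return same_origin and recent[0] != own_region


moved = should_transfer_leader(["sg", "us", "us", "us"], own_region="sg", threshold=3)
mixed = should_transfer_leader(["us", "sg", "us"], own_region="sg", threshold=3)
local = should_transfer_leader(["sg", "sg", "sg"], own_region="sg", threshold=3)
```

Only the first call reports that a transfer is warranted: the last three writes all came from `us` while the leading node sits in `sg`.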
  • The candidate leading node is a storage node 101 selected from the node cluster 10 corresponding to a specified region, and the specified region can be the origination region of the write requests associated with the target data object.
  • The second execution module determines the candidate leading node by selecting a storage node 101 from the node cluster 10 corresponding to the origination region of the write request.
  • For example, a storage node 101 that is not the leading node of any other data object may be selected from that node cluster 10 as the candidate leading node for the target data object.
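The selection rule can be sketched as below. The sketch only sees the leading nodes recorded in the local mapping table, and it falls back to the first node of the cluster when every node already leads some object; that fallback, the function name `pick_candidate` and the node identifiers are assumptions, since the text does not say what happens in that case.

```python
from typing import Dict, List, Optional


def pick_candidate(clusters: Dict[str, List[str]], origin_region: str,
                   leader_map: Dict[str, str]) -> Optional[str]:
    """Pick a candidate leading node from the cluster of the origination region.

    Prefers a node that is not recorded as leading node of any data object in
    the local mapping table; otherwise falls back to the first node (assumption).
    """
    members = clusters.get(origin_region, [])
    if not members:
        return None
    busy = set(leader_map.values())
    for node_id in members:
        if node_id not in busy:
            return node_id
    return members[0]


clusters = {"sg": ["sg-1", "sg-2"], "us": ["us-1", "us-2"]}
leader_map = {"obj-a": "us-1"}
candidate = pick_candidate(clusters, "us", leader_map)
```

Here `us-1` already leads `obj-a`, so the idle node `us-2` is chosen for the object being transferred.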
  • The candidate leading node can send a feedback message to this node to inform it which storage node 101 is the new target leading node of the target data object; the target data object and its new target leading node are then updated in the leading node mapping table 1011 stored on this node.
  • If the second execution module determines that the target data object does not satisfy the leading node transfer condition, the current node can continue to serve as the target leading node of the target data object. The node can then directly initiate a strongly consistent storage commit request to other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
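  • The storage node 101 serving as the candidate leading node is set to:
  • after receiving the target data object, send, as the initiating node, a leading node election request for the target data object to other storage nodes 101; receive the election response results fed back by the other storage nodes 101; and, when the election response results show that it is allowed to be the target leading node, initiate a strongly consistent storage commit request to other storage nodes 101 to complete the strongly consistent storage commit of the target data object.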
  • This embodiment also describes the operations performed when a storage node 101 acts as a candidate leading node.
  • The storage node 101 serving as the candidate leading node is selected by the storage node 101 that received the write request.
  • The storage node 101 serving as the candidate leading node may perform the following operations. First, after receiving the target data object, it sends a leading node election request for the target data object to other storage nodes 101. It then receives the election response results that the other storage nodes 101 feed back for that request, and determines from them whether it is allowed to be the target leading node. If so, it directly initiates a strongly consistent storage commit request to other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
  • The node performing these operations is also recorded as an initiating node once it has sent the leading node election request. In this embodiment, every storage node 101 that sends a leading node election request is uniformly recorded as an initiating node, which simplifies the more detailed description of the initiating node below.
  • The operations performed by the storage node 101 as the candidate leading node amount to the storage node 101 executing a complete Paxos algorithm for the target data object, that is, two rounds of interaction between the storage node 101 and a majority of the nodes in the system.
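The two rounds performed by a candidate leading node can be sketched as follows. The network is replaced by two callables that return each peer's answer, and the candidate counts its own vote toward the majority; both choices, and the names `has_majority` and `run_as_candidate`, are assumptions made for this illustration rather than details from the application.

```python
from typing import Callable, Sequence


def has_majority(accepts: int, cluster_size: int) -> bool:
    """True when strictly more than half of the cluster has accepted."""
    return accepts > cluster_size // 2


def run_as_candidate(peers: Sequence[str],
                     request_vote: Callable[[str], bool],
                     request_commit: Callable[[str], bool]) -> str:
    """Run the election round, then the commit round, against the given peers."""
    cluster_size = len(peers) + 1  # the peers plus this candidate
    # Round 1: leading node election; the candidate counts itself (assumption).
    votes = 1 + sum(1 for peer in peers if request_vote(peer))
    if not has_majority(votes, cluster_size):
        return "election_rejected"
    # Round 2: strongly consistent storage commit request.
    acks = 1 + sum(1 for peer in peers if request_commit(peer))
    if not has_majority(acks, cluster_size):
        return "commit_failed"
    return "committed"


peers = ["a", "b", "c", "d"]
outcome = run_as_candidate(peers, lambda p: p in {"a", "b"}, lambda p: True)
```

With two of four peers voting for it, the candidate holds three of five votes and proceeds to the commit round.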
  • The second execution module is also set to:
  • when the target leading node in the leading node mapping table 1011 is not the current node, forward the write request to the storage node 101 serving as the target leading node, so that that storage node 101 completes the strongly consistent storage commit of the target data object according to the received write request.
  • This embodiment also describes the case in which the second execution module finds that the target leading node is not the local node.
  • After determining that the target leading node in the leading node mapping table 1011 is not the local node, the second execution module forwards the write request containing the target data object to the storage node 101 that is its target leading node. That storage node 101 can then complete the strongly consistent storage commit of the target data object according to the received write request.
  • The storage node 101 serving as the target leading node is equivalent to a storage node 101 that has received a write request: it again analyzes the received write request to determine the target data object it contains and, according to its stored leading node mapping table 1011 and through interaction with other storage nodes 101, completes the strongly consistent storage commit of the target data object.
  • In other words, it can use its own object determination module, first determination module, first execution module and second execution module to implement the strongly consistent storage commit for the target data object contained in the received write request.
  • the storage node 101 that receives the leader node election request includes:
  • the information search module is set to receive the leading node election request sent by the storage node 101 as the initiating node, and searches for the original leading node corresponding to the target data object from the leading node mapping table 1011 stored locally; the election response module is set to The original dominant node is compared with the initiating node, and an election response result corresponding to the dominant node election request is sent to the initiating node according to the comparison result.
  • the storage node 101 in the provided strongly consistent storage system may further include: an information search module and an election response module, thus, the storage node 101 After the node election request, the information search module and the election response module are used to feed back the election response result to the initiating node.
  • the information search module in the storage node 101 can receive the lead node election request sent by the storage node 101 as the initiating node, and then can search the target data object from the lead node mapping table 1011 stored on the node The current corresponding dominant node, and record the dominant node as the original dominant node.
  • the leading node election request can be regarded as a request including the information that the initiating node wants to be the leading node of the target data object, and the information search module can analyze the received leading node election request to be included in the request to be The initiator node elected as the dominant node.
  • the initiating node that wants to be elected as the leading node of the target data object can be obtained, and the information search module can also obtain the leading node mapping table 1011 searched by the information search module.
  • the original dominant node; after that, the initiating node may be compared with the original dominant node, and then an election response result corresponding to the dominant node election request may be sent to the initiating node according to the comparison result.
  • The election response result that a storage node 101 returns to the initiating node after receiving the leader node election request falls into two cases: it either allows or rejects the initiating node as the target leader node of the target data object. For the result to allow the initiating node, one of the following two conditions must hold: 1) the original leader node that the storage node 101 has recorded locally for the target data object is the initiating node; 2) the data number of the target data object carried in the leader node election request is greater than the data number that the storage node 101 has recorded locally for the target data object.
  • The data number of the target data object can be understood as an identification number that lets the storage nodes 101 reach consensus on the election of the target leader node; each storage node 101 records the current maximum data number for each data object.
  • When the election response module determines that the locally recorded original leader node and the initiating node are the same storage node 101, it considers that the initiating node qualifies as the target leader node of the target data object, and can directly send the initiating node an election response result accepting it as the target leader node of the target data object.
  • This embodiment may also refine how the election response module determines the election response result, so that the election response module is set to:
  • when the original leader node and the initiating node are different storage nodes 101, obtain the historical data number stored locally in advance for the target data object, and obtain the current data number of the target data object from the leader node election request; if the current data number is greater than the historical data number, then once the response message sending condition is met, take the current data number as the new historical data number and the initiating node as the new original leader node, and send the initiating node an election response result accepting it as the target leader node of the target data object; if the current data number is not greater than the historical data number, send the initiating node an election response result rejecting it as the target leader node of the target data object.
  • When the election response module determines that the locally recorded original leader node and the initiating node are different storage nodes 101, it first obtains from this node the historical data number recorded in advance for the target data object, and compares it with the current data number of the target data object contained in the leader node election request. When the current data number is greater than the historical data number, then once the response message sending condition is met, the module takes the current data number as the new historical data number of the target data object and the initiating node as the new original leader node, and sends the initiating node an election response result accepting it as the target leader node of the target data object.
  • The response message sending condition can be summarized as follows: the node performs a self-check and determines that it is not currently in the process of electing a leader node for the same target data object. If the condition is not met, the node waits for a random period of time and then repeats the self-check, until it finds that it is no longer in the process of electing a leader node for the same target data object.
  • When accepting the initiating node, the election response module updates the historical data number recorded on this node, replacing it with the current data number as the new historical data number of the target data object. It also records the initiating node as the new original leader node of the target data object in this node's leader node mapping table 1011, thereby updating the leader node mapping table 1011.
  • When the current data number is not greater than the historical data number, the election response module considers that the initiating node does not qualify as the target leader node of the target data object, and can directly send the initiating node an election response result rejecting it as the target leader node of the target data object.
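  • The acceptance and rejection rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the names `Responder`, `ElectionResponse`, and `handle_election_request` are hypothetical, an object with no recorded data number is assumed to start at 0, and the response message sending condition (the self-check with a random wait) is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ElectionResponse:
    accepted: bool
    current_leader: Optional[str]  # leader this responder records for the object
    data_number: int               # data number this responder records


@dataclass
class Responder:
    node_id: str
    leader_map: Dict[str, str] = field(default_factory=dict)    # object -> leader
    data_numbers: Dict[str, int] = field(default_factory=dict)  # object -> number

    def handle_election_request(self, obj: str, initiator: str,
                                data_number: int) -> ElectionResponse:
        original_leader = self.leader_map.get(obj)
        history = self.data_numbers.get(obj, 0)  # assumption: unseen objects start at 0
        # Case 1: the initiator is already the recorded leader, so accept.
        if original_leader == initiator:
            return ElectionResponse(True, initiator, history)
        # Case 2: a different node; accept only a strictly larger data number,
        # then record the new number and the new leader.
        if data_number > history:
            self.data_numbers[obj] = data_number
            self.leader_map[obj] = initiator
            return ElectionResponse(True, initiator, data_number)
        # Otherwise reject and report the recorded leader and data number.
        return ElectionResponse(False, original_leader, history)
```

A rejection in this sketch carries the responder's recorded leader and data number, matching the description that the initiating node later reads the current leader node and data number from a rejecting response.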
  • In this embodiment, the initiating node that receives the election response result may be the storage node 101 that received the write request containing the target data object, or the storage node 101 that was determined as the candidate leader node by the storage node 101 that received that write request.
  • The storage node 101 acting as the initiating node is set to:
  • receive the election response results fed back by the other storage nodes 101, and count the number of nodes whose election response result allows this node to be the target leader node of the target data object; when the number of nodes is greater than or equal to the set threshold, determine that this node is the target leader node of the target data object, update the stored leader node mapping table 1011, and initiate a strongly consistent storage commit request to the other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
  • The above describes the operations performed by the storage node 101 acting as the initiating node after it receives the election response results fed back by the other storage nodes 101.
  • Such a storage node 101 receives the election response results that the other storage nodes 101 in the system feed back for the leader node election request it sent. It counts the number of nodes whose result allows it to be the target leader node of the target data object, and determines from that count whether it meets the condition for becoming the target leader node of the target data object. When the condition is met (for example, the number of nodes is greater than or equal to the set threshold, which is equivalent to the condition for leader election in the Paxos algorithm), it determines itself to be the target leader node of the target data object and initiates a strongly consistent storage commit request to the other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
  • After being determined as the target leader node of the target data object, the storage node 101 acting as the initiating node may record the association between the target data object and its target leader node in this node's leader node mapping table 1011.
  • The storage node 101 acting as the initiating node is also set to:
  • when the number of nodes is less than the set threshold, take the current leader node carried in the target election response result as the target leader node recorded for the target data object in the locally stored leader node mapping table 1011, where the target election response result is selected from the election response results that reject the initiating node as the target leader node of the target data object; and forward the write request to the storage node 101 serving as the target leader node, so that this storage node 101 completes the strongly consistent storage commit of the target data object according to the received write request.
  • That is, when the counted number of nodes is less than the set threshold, the storage node 101 acting as the initiating node first selects, from the received election response results that reject it as the target leader node of the target data object, the one carrying the largest current data number as the target election response result, and records the current leader node carried in that result as the target leader node of the target data object in the locally stored leader node mapping table 1011. It then forwards the write request containing the target data object to the storage node 101 serving as the target leader node, which completes the strongly consistent storage commit of the target data object according to the received write request.
  • Before forwarding the write request containing the target data object to the storage node 101 serving as the target leader node, the initiating node may also update the data number it has recorded locally for the target data object, taking the current data number carried in the target election response result as the new local historical data number of the target data object.
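  • The initiating node's handling of the collected election responses can be sketched as follows. This is a hedged illustration: `Vote` and `decide_after_election` are hypothetical names, whether the initiating node counts itself toward the threshold is not stated in the text and is left to the caller, and the `"retry"` outcome for the case with no rejection to consult is an assumption added so the sketch is total.

```python
from typing import List, NamedTuple, Optional, Tuple


class Vote(NamedTuple):
    accepted: bool
    current_leader: Optional[str]  # leader carried in the response
    data_number: int               # data number carried in the response


def decide_after_election(
    self_id: str, votes: List[Vote], threshold: int
) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (action, leader, data_number_to_record)."""
    accepts = sum(1 for v in votes if v.accepted)
    if accepts >= threshold:
        # Enough nodes allow this node: it becomes the target leader and commits.
        return ("commit", self_id, None)
    rejections = [v for v in votes if not v.accepted]
    if not rejections:
        # Assumption: nothing to learn from, so the caller retries later.
        return ("retry", None, None)
    # Pick the rejection carrying the largest data number; adopt its leader
    # and data number, then forward the write request to that leader.
    target = max(rejections, key=lambda v: v.data_number)
    return ("forward", target.current_leader, target.data_number)
```

On a `"forward"` result the caller would update its local leader node mapping table and historical data number before forwarding the write request, as described above.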
  • The strongly consistent storage system provided in Embodiment 1 of the present application aims to make $T_{\text{strong}}$, the time needed to store a data object with strong consistency, as small as possible. $T_{\text{strong}}$ consists of two parts: the time $T_1$ taken for the write request containing the data object to reach the leader node of the data object, and the time $T_2$ taken by the leader node from initiating the strongly consistent storage request to completing the strongly consistent storage commit through the Paxos algorithm.
  • For $T_2$, this embodiment adopts the same strongly consistent storage approach as the background art, without much improvement. This embodiment therefore mainly considers optimizing $T_1$.
  • Specifically, this embodiment treats the data object as the smallest unit, and a storage node can process each data object independently through the Paxos algorithm, which ensures that there are no dependencies among data objects. The processing is carried out dynamically, through interaction with other storage nodes, according to the leader node mapping table stored on the storage node. After the leader node mapping tables on the storage nodes have been iteratively updated for a period of time, the leader node of each data object is essentially stable, and it can essentially be ensured that the storage node receiving a write request containing a data object is the leader node of that data object. $T_1$ then reduces to the time taken to send the write request to that storage node, which achieves the optimization of $T_1$.
  • The strongly consistent storage system of this embodiment improves the performance of strongly consistent data storage and, while reducing the cost of strongly consistent storage, broadens the applicability of the system; it also improves the experience of business parties that use the system in business scenarios.
  • FIG. 2 is a schematic flowchart of a method for strongly consistent data storage provided in Embodiment 2 of the present application. The method is applicable to the strongly consistent storage of business data.
  • The strongly consistent storage system includes node clusters divided by region, and each node cluster includes at least one storage node.
  • Each storage node stores a leader node mapping table, which records the data objects for which the storage node has determined a leader node, together with their corresponding leader nodes.
  • A method for strongly consistent data storage includes the following operations:
  • The reception of the write request can be regarded as the initial step in which a storage node of the strongly consistent storage system provided in the above embodiment processes a data object. This step determines the target data object contained in the write request and thereby starts the strongly consistent storage processing for the target data object.
  • A leader node mapping table is stored locally on the storage node. Based on this table, the strongly consistent storage commit of the target data object can be completed through interaction with other storage nodes.
  • For the target data object, the leader node mapping table may or may not contain a corresponding target leader node, and the storage node handles the two cases differently. For example, when the leader node mapping table does not contain the target leader node, the storage node initiates a leader node election request to other storage nodes, so that a new target leader node is determined for the target data object with the participation of the other storage nodes, and the target leader node then completes the strongly consistent storage commit by sending a strongly consistent storage request to other storage nodes. As another example, when the leader node mapping table contains the leader node, that leader node is this storage node, and no leader node transfer is currently required, the storage node is itself the target leader node and can directly send a strongly consistent storage request to other storage nodes to complete the strongly consistent storage commit of the target data object; if a leader node transfer is currently required, the storage node determines a candidate leader node and transfers the target data object to it, and the strongly consistent storage commit of the target data object is then completed through interaction between the candidate leader node and other storage nodes.
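  • The branching described above can be summarized in a small Python sketch. It is only an illustration: `plan_write` and its string results are hypothetical, and the `needs_transfer` flag stands in for the leader node transfer condition, which the text defines separately.

```python
from typing import Dict, Optional


def plan_write(self_id: str, leader_map: Dict[str, str], obj: str,
               needs_transfer: bool) -> str:
    """Decide what a node does with a write request for `obj`."""
    leader: Optional[str] = leader_map.get(obj)
    if leader is None:
        return "elect"               # no recorded leader: start a leader election
    if leader != self_id:
        return f"forward:{leader}"   # another node leads: forward the write request
    if needs_transfer:
        return "transfer"            # hand the object to a candidate leader node
    return "commit"                  # this node leads: commit directly
```

For example, a node `n1` whose table maps the object to `n2` would get `"forward:n2"`, while an empty table yields `"elect"`.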
  • The method for strongly consistent data storage provided in this embodiment treats each data object as a minimum unit, so that a storage node independently uses the Paxos algorithm to store each data object with strong consistency, which ensures that there are no dependencies among data objects.
  • This embodiment dynamically determines the target leader node of the target data object by interacting with other storage nodes according to the leader node mapping table stored on the storage node, and completes the strongly consistent storage commit of the target data object on that basis.
  • This enables the strongly consistent storage system to optimize the performance of strongly consistent storage for each data object according to its data characteristics, system load, and user access behavior.
  • The method effectively improves the applicability of the strongly consistent storage system while reducing the cost of strongly consistent storage; it also improves the business party's experience of using the system in business scenarios.
  • Embodiment 2 uses the following example to describe how a data object is stored with strong consistency.
  • In this example, the strongly consistent storage system includes three storage nodes.
  • The three storage nodes may belong to node clusters in different regions.
  • Each storage node holds a leader node mapping table that is up to date.
  • The three storage nodes are denoted node 1, node 2 and node 3. Node 1 first receives a write request generated by a user terminal for a service; the write request contains the target data object of that service that is to be stored with strong consistency.
  • If the target leader node of the target data object does not exist in the leader node mapping table, node 1 acts as the initiating node and initiates a leader node election request to other storage nodes, so as to determine the target leader node through interaction with them, and then executes S7.
  • For node 2, assuming that node 2 is the candidate leader node and receives the target data object transferred by node 1, node 2 can perform the following operations:
  • For node 3, assuming that node 3 receives the write request forwarded by node 1, node 3 is in effect a new node 1 and can re-execute the operations described above for node 1.
  • For node 3, assuming that node 3 receives the leader node election request initiated by the initiating node (which may be node 1 or node 2), node 3 can perform the following operations:
  • S9: Parse the received leader node election request, determine that the initiating node (node 1 or node 2) is applying to become the target leader node of the target data object, and obtain the current data number corresponding to the target data object.
  • The initiating node can perform the following operations:
  • S15: Receive the election response results fed back by other storage nodes for the leader node election request that was sent.
  • In this example the other storage nodes refer to node 3; in practical applications, they may be all storage nodes that have received the leader node election request.
  • The target election response result is selected from the election response results that reject the initiating node as the target leader node of the target data object.
  • In this example, the target election response result is the election response result returned by node 3.
  • The storage node that receives the write request forwarded in S20 (assume node 2) is likewise in effect a new node 1 and can re-execute the operations described above for node 1.
  • The sequence numbers of the steps in the above example do not represent the execution order of the steps; they merely label the steps. The example shows that when the strongly consistent storage system stores a data object contained in a write request with strong consistency, the key is to determine the leader node associated with the data object; once it is determined, the leader node completes the strongly consistent storage commit of the data object by sending a strongly consistent storage commit request to other storage nodes.
  • The leader node associated with the data object can be determined according to the leader node mapping table stored on the storage node. The outcome may be one of the following: 1) the storage node that receives the write request is itself the leader node of the data object; 2) the storage node that receives the write request forwards the write request directly to the leader node recorded for the data object in the leader node mapping table; 3) the storage node that receives the write request initiates, to other nodes, a leader node election request to become the leader node of the data object, acts as the leader node of the data object if the request is allowed, and otherwise takes the storage node recommended by the other storage nodes as the leader node; 4) when the storage node that receives the write request meets the leader node transfer condition, it transfers the data object to a candidate leader node, which requests the other storage nodes to accept it as leader node, acts as the leader node of the data object if the request is allowed, and otherwise takes the storage node recommended by the other storage nodes as the leader node.
  • Each time a storage node performs strongly consistent storage processing, the leader node mapping table stored on it is updated accordingly, so that the system continuously iterates the leader node associated with each data object. A relatively stable mapping can form after a certain period of time, so that the time taken by strongly consistent storage processing approaches the optimum.
  • The strongly consistent storage system can also deploy storage nodes in advance based on the leader node mapping tables of the storage nodes, so that a write request generated by a client is quickly received by the storage node serving as the leader node of the data object.
  • FIG. 3 is a schematic diagram of the hardware structure of a server according to Embodiment 3 of the present application.
  • The server serves as a storage node in the strongly consistent storage system provided in Embodiment 1 and may include a processor and a storage device. At least one instruction is stored in the storage device and is executed by the processor, so that the server performs the operation steps corresponding to the storage node in the method for strongly consistent data storage provided in Embodiment 2.
  • The server may include a processor 30, a storage device 31, a display screen 32, an input device 33, an output device 34 and a communication device 35.
  • The number of processors 30 in the server may be one or more; one processor 30 is taken as an example in FIG. 3.
  • The number of storage devices 31 in the server may be one or more; one storage device 31 is taken as an example in FIG. 3.
  • The processor 30, the storage device 31, the display screen 32, the input device 33, the output device 34 and the communication device 35 of the server may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 3.
  • When the processor 30 executes one or more programs stored in the storage device 31, the following operations are performed: each storage node that receives a write request parses the received write request and determines the target data object contained in it; and, according to the locally stored leader node mapping table, completes the strongly consistent storage commit of the target data object through interaction with other storage nodes.
  • Embodiments of the present application further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method for strongly consistent data storage provided by the embodiments of the present application.
  • The method described in the foregoing embodiments includes: each storage node that receives a write request parsing the received write request and determining the target data object contained in the write request; and, according to the locally stored leader node mapping table, completing the strongly consistent storage commit of the target data object through interaction with other storage nodes.

Abstract

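Disclosed herein are a strongly consistent storage system, a method for strongly consistent data storage, a server, and a storage medium. The strongly consistent storage system includes a plurality of node clusters divided by region, each node cluster including at least one storage node. Each storage node is set to store a leader node mapping table, where the leader node mapping table records data objects and the leader nodes to which they are mapped, a leader node being a storage node in a node cluster; and to parse the write request received by this node, determine the target data object contained in the write request, and, according to the leader node mapping table stored on this node, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes.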

Description

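Strongly consistent storage system, method for strongly consistent data storage, server, and medium
This application claims priority to Chinese patent application No. 202010809245.4, filed with the China Patent Office on August 12, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data storage, for example to a strongly consistent storage system, a method for strongly consistent data storage, a server, and a medium.
Background
For some Internet companies, many of their business scenarios, such as business processes involving payment transaction records and monthly currency, rely on a globally strongly consistent storage system to store core data.
Techniques for storing core data in such a globally strongly consistent storage system fall into two categories:
1. Globally strongly consistent storage systems represented by the classic strongly consistent storage algorithm Paxos or its variant Raft. Such a system first deploys the required leader node across multiple sub-clusters; all write requests generated by clients must then be routed to the deployed leader node before the subsequent storage commit is performed. In this mode, because the network latency between global sub-clusters is large, typically 200-500 ms, requiring client requests to be routed to the leader node first directly leads to very large commit latency, which degrades the performance of the globally strongly consistent storage system. In addition, classic algorithms such as Paxos and Raft used in these systems are difficult to implement in engineering and to verify for correctness, which also makes the stability of such systems poor.
2. Google Spanner, an improved globally strongly consistent storage system. It statically partitions data and treats each partition as a Paxos group, each group containing multiple data objects. Because multiple Paxos groups can perform writes concurrently, the write capacity of the whole system improves to some extent, which largely solves the poor storage performance of the first category.
However, the second category of technical solutions has the following problems in practice:
a. Because the data objects are pre-assigned to different partitions, a write to a given data object o, wherever the client issuing the write request is located, must be routed to the leader node of the Paxos group to which data object o belongs before the request can be initiated; the system adapts poorly.
b. A key aspect of the improved system is static partitioning of data, with each partition being assigned multiple data objects for strongly consistent storage processing. The data objects in the same partition are logically bound together, yet they may conflict in data characteristics, system load, user access behavior, and other features. Consequently, when strongly consistent storage is optimized for some data objects in a partition, the optimized partition may degrade the storage performance of other data objects in that partition, and this kind of problem cannot be solved or avoided by the partitioning strategy alone.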
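Summary
The present application provides a strongly consistent storage system, a method for strongly consistent data storage, a server, and a medium, which improve system storage performance and system adaptability when strongly consistent data storage is implemented.
The present application provides a strongly consistent storage system, including:
a plurality of node clusters divided by region, each node cluster including at least one storage node;
each storage node, set to store a leader node mapping table, where the leader node mapping table records data objects and the leader nodes to which the data objects are mapped, a leader node being a storage node in a node cluster; and to parse the write request received by this node, determine the target data object contained in the write request, and, according to the leader node mapping table stored on this node, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes.
The present application provides a method for strongly consistent data storage, including:
each storage node that receives a write request parsing the received write request and determining the target data object contained in the write request;
completing, according to the locally stored leader node mapping table, the strongly consistent storage commit of the target data object through interaction with other storage nodes.
The present application provides a server, including:
one or more processors;
a storage device, set to store one or more programs;
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the above method for strongly consistent data storage.
The present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for strongly consistent data storage.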
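Brief Description of the Drawings
FIG. 1 is an architecture diagram of a strongly consistent storage system provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of a method for strongly consistent data storage provided in Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of the hardware structure of a server provided in Embodiment 3 of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the accompanying drawings.
Embodiment 1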
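As the Background section shows, both the globally strongly consistent storage systems represented by the classic Paxos algorithm and the improved systems that statically partition data so that each partition corresponds to a Paxos group rely on the Paxos algorithm to store data objects with strong consistency in business scenarios, and the key to the Paxos algorithm is the determination of the leader node. Here, a data object can be understood as the data in a write request that is to be stored with strong consistency.
As for determining the leader node, a classic globally strongly consistent storage system directly selects a storage node from the node clusters as the leader node; write requests containing data objects are routed directly to that leader node, which then performs the subsequent strongly consistent storage commit for the data objects contained in the write requests. Because the leader node and the client sending the write request may be in different regions, the system suffers from poor storage performance caused by network latency.
As for the improved globally strongly consistent storage system mentioned in the Background, although it assigns leader nodes to data objects by static partition, it still requires write requests containing a data object to be routed to the leader node of that data object, and data objects in the same partition may conflict in data characteristics, system load, user access behavior, and other features, so that an effective leader node cannot be selected for an individual data object.
On this basis, Embodiment 1 of the present application proposes a strongly consistent storage system that solves the above problems. FIG. 1 is an architecture diagram of a strongly consistent storage system provided in Embodiment 1 of the present application. As shown in FIG. 1, the strongly consistent storage system includes node clusters 10 divided by region, each node cluster 10 including at least one storage node 101. Each storage node 101 stores a leader node mapping table 1011, which records data objects and the leader nodes to which they are mapped, where a leader node is any storage node 101 in any node cluster 10. The storage node 101 is set to parse a received write request, determine the target data object contained in the write request, and, according to the stored leader node mapping table 1011, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes 101.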
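First, the core idea of the strongly consistent storage system provided in this embodiment is as follows. By analyzing the problems of strongly consistent storage systems, the process of storing a data object with strong consistency can be modeled as $T_{\text{strong}} = T_1 + T_2$, where $T_{\text{strong}}$ is the time needed to store the data object with strong consistency, $T_1$ is the time taken for the write request containing the data object to reach the leader node of the data object, and $T_2$ is the time taken by the leader node of the data object, through the Paxos algorithm, from initiating the strongly consistent storage request to completing the strongly consistent storage commit.
Second, analysis of the Paxos algorithm shows that $T_2$ mainly depends on the time taken by two rounds of interaction between the leader node and a majority of the nodes in the cluster, where the time of one round depends on the maximum round-trip time between the leader node and the majority of nodes. If the leader node of a data object remains stable, the time taken by the leader node from initiating the strongly consistent storage request to completing the strongly consistent storage commit is equivalent to one round of interaction between the leader node and the majority of nodes in the cluster.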
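The timing model above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than part of the application: the function names are hypothetical, the leader is assumed to count itself toward the majority, and one round is taken to last as long as the round trip to the slowest peer in the fastest majority.

```python
from typing import List


def majority_round_ms(peer_rtts_ms: List[float]) -> float:
    """Time for one round between the leader and a majority of nodes.

    Assumption: the leader counts itself, so it needs replies from
    (total_nodes // 2) peers; the round lasts as long as the slowest of
    the fastest such peers.
    """
    total_nodes = len(peer_rtts_ms) + 1
    peers_needed = total_nodes // 2
    if peers_needed == 0:
        return 0.0
    return sorted(peer_rtts_ms)[peers_needed - 1]


def strong_write_ms(t1_ms: float, peer_rtts_ms: List[float],
                    stable_leader: bool) -> float:
    """T_strong = T_1 + T_2, with T_2 being one or two majority rounds."""
    rounds = 1 if stable_leader else 2
    return t1_ms + rounds * majority_round_ms(peer_rtts_ms)
```

With peers at 30 ms and 200 ms, a write that reaches a stable, nearby leader in 5 ms costs 5 + 30 ms in this model, while routing the same write to a distant leader (say 250 ms away) that also needs two rounds costs 250 + 2 × 30 ms, which illustrates why the embodiment concentrates on reducing $T_1$ and keeping the leader stable.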
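Therefore, for the strongly consistent storage system provided in this embodiment to have good storage performance and to give business parties a better experience, the system needs to make $T_{\text{strong}}$ in the above model optimal. The condition for $T_{\text{strong}}$ to be optimal can be regarded as: the time $T_1$ for the write request containing the data object to reach the leader node of that data object is minimal; and, with the leader nodes of data objects in a reasonably stable state, $T_2$ is also optimal.
To better satisfy the condition for optimal $T_{\text{strong}}$, this embodiment treats the data object as the smallest unit and applies a separate Paxos mechanism to each data object to store it with strong consistency. This ensures that the leader node of each data object can be adaptively updated, independently of other data objects, according to its data characteristics, system load, and user access behavior.
In this embodiment, the strongly consistent storage system is deployed on a global network architecture. Under this architecture, the system may include multiple node clusters 10 divided by region, and each node cluster 10 may consist of at least one storage node 101. Each storage node 101 is equivalent to a physical node and carries out the strongly consistent storage of data objects in the system.
In this embodiment, the operations performed by a storage node 101 are those started after it receives a write request. The write request received by a storage node 101 may be generated and sent directly by a user terminal, or forwarded by another storage node 101. This embodiment takes the reception of a write request sent by a user terminal as the starting point of strongly consistent storage processing by the storage node 101.
To store a data object with strong consistency, the storage node 101 that receives the write request containing the data object first determines the leader node of the data object, and the storage node 101 serving as that leader node then completes the strongly consistent storage of the data object. To obtain the leader node of a data object efficiently, each storage node 101 in this embodiment stores a leader node mapping table 1011. The leader node mapping table 1011 stored on a storage node 101 may include the data objects that the storage node 101 has processed and the leader nodes to which they are mapped. The leader node to which a data object is mapped may be any storage node 101 in any node cluster 10 of the system, and is not necessarily the storage node 101 that stores the leader node mapping table 1011.
When a storage node 101 receives a write request, in order to store the data object included in the write request with strong consistency, it first parses out the data object included in the write request; in this embodiment this data object is denoted the target data object. It then completes the strongly consistent storage of the target data object according to the leader node mapping table 1011 stored on this node and through interaction with other storage nodes 101.
The target data object parsed out by the storage node 101 may be a data object that the storage node 101 has not processed before. In that case the leader node mapping table 1011 has no record for the target data object, so the storage node 101 first determines the leader node of the target data object through interaction with other storage nodes 101, and then completes the strongly consistent storage of the target data object through that leader node. When the target data object has already been processed by the storage node 101, the leader node mapping table 1011 contains the leader node of the target data object, and the storage node 101 needs to determine whether that leader node still meets the condition for continuing as leader node. If it does, that leader node directly completes the strongly consistent storage commit of the target data object; if it does not, a new leader node is determined through interaction with other storage nodes 101, and the new leader node completes the strongly consistent storage commit of the target data object.
In this strongly consistent storage system, the leader node of a data object can be adjusted dynamically, as far as possible, according to the data characteristics of the data object, system load, user access behavior, and so on, so that the time taken for a write request for the data object to reach its leader node is as reasonable as possible.
The strongly consistent storage system provided in Embodiment 1 of the present application treats each data object as a minimum unit. A storage node can use a separate Paxos mechanism to store each data object with strong consistency, in effect partitioning each data object independently, which ensures that there are no dependencies among data objects. When processing with the data object as the smallest unit, the storage node dynamically determines the target leader node of the target data object according to the leader node mapping table stored on the storage node and through interaction with other storage nodes, and completes the strongly consistent storage commit of the target data object on that basis. The system can thus optimize the performance of strongly consistent storage for each data object according to its data characteristics, system load, user access behavior, and so on, which effectively improves the applicability of the system while reducing the cost of strongly consistent storage; it also improves the business party's experience of using the system in business scenarios.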
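This embodiment further details the functions of the storage node 101. The storage node 101 may include:
an object determination module, set to parse the received write request and determine the target data object contained in the write request; a first determination module, set to determine whether the target leader node of the target data object exists in the stored leader node mapping table 1011; a first execution module, set to, when the target leader node does not exist in the leader node mapping table 1011, act as the initiating node and initiate to other storage nodes 101 a leader node election request for the target data object, so as to determine the target leader node through interaction with other storage nodes 101 and complete the strongly consistent storage commit of the target data object based on the determined target leader node; and a second execution module, set to, when the target leader node exists in the leader node mapping table 1011, determine whether the target leader node is this node and, according to the result, complete the strongly consistent storage commit of the target data object through interaction with other storage nodes 101.
In this embodiment, the storage node 101 can be divided into several functional modules. For example, the object determination module in the storage node 101 parses the received write request and determines the target data object to be stored with strong consistency. The first determination module checks whether the parsed target data object exists in the leader node mapping table 1011 on this node, and thereby determines whether this node has recorded a target leader node for the target data object. Depending on its result, the first determination module triggers either the first execution module or the second execution module of the storage node 101 to perform the subsequent operations.
When the result of the first determination module meets the execution condition of the first execution module, that is, the leader node mapping table 1011 contains no target leader node for the target data object, the first execution module first makes this node an initiating node and sends other storage nodes 101 a leader node election request for the target data object. The interaction with other storage nodes 101 triggered by this request determines the target leader node of the target data object, and the strongly consistent storage commit of the target data object is then completed based on the determined target leader node.
The determination of the target leader node through the leader node election request that the first execution module sends to other storage nodes 101 can be regarded as initiating one round of interaction, in the Paxos algorithm, between the storage node 101 and a majority of the other storage nodes 101 in the system; this round determines the target leader node of the target data object. After this node has determined a new target leader node for the target data object by sending the leader node election request to other storage nodes 101, the target data object and its target leader node are stored in association in the leader node mapping table 1011 created on this node.
In addition, when the result of the first determination module meets the execution condition of the second execution module, that is, the leader node mapping table 1011 contains a target leader node for the target data object, the second execution module is triggered. It first determines whether the recorded target leader node is this storage node 101, and then, according to the result and through interaction with other storage nodes 101, completes the strongly consistent storage commit of the target data object.
The second execution module compares the target leader node recorded for the target data object in the leader node mapping table 1011 with this node itself. If the target leader node is this node, the storage node 101 is the target leader node of the target data object; it then considers whether it is still reasonable for this node to serve as the target leader node of the target data object and, if this node is suitable, this node directly performs the strongly consistent storage commit of the data object. If the target leader node is not this node, the target leader node is another storage node 101 in the system; the second execution module then forwards the write request containing the target data object to the target leader node, which performs the strongly consistent storage commit of the target data object, and this storage node 101 ends its processing of the target data object.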
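The second execution module is set to:
when the target leader node recorded in the leader node mapping table 1011 is this node, determine, based on the historical access data of the target data object, whether the target data object meets the set leader node transfer condition; if the target data object meets the set leader node transfer condition, determine a candidate leader node and transfer the target data object to the candidate leader node, so that the target leader node of the target data object is determined based on the candidate leader node and the strongly consistent storage commit of the target data object is completed based on the determined target leader node; if the target data object does not meet the set leader node transfer condition, initiate a strongly consistent storage commit request to other storage nodes 101 to complete the strongly consistent storage commit of the target data object.
The above describes the second execution module for the case where it finds that the target leader node is this node. After determining that the target leader node recorded in the leader node mapping table 1011 is this node, the second execution module obtains the historical access data of the target data object stored in advance on this node. The historical access data includes at least the region from which each write request associated with the target data object was initiated, that is, the origin region information of the write requests associated with the target data object.
The second execution module determines from the historical access data whether the target data object meets the preset leader node transfer condition, and then proceeds differently depending on the result. The leader node transfer condition is: the write requests containing the target data object that were received a set number of consecutive times all originated from the same origin region, and the region to which this node belongs differs from that origin region.
The process of determining whether the target data object meets the set leader node transfer condition can be described as follows: look up the historical access data of the target data object and extract the write requests containing the target data object that this node received in the past; obtain the origin region information of these write requests; if the write requests received a set number of consecutive times all originated from the same region, and that origin region differs from the region to which this node belongs, the leader node transfer condition is currently met.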
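The transfer check described above can be sketched in Python as follows. The function name `transfer_target_region`, the list-of-regions representation of the historical access data, and the `required_streak` parameter (the "set number of consecutive times") are all assumptions made for illustration.

```python
from typing import List, Optional


def transfer_target_region(own_region: str, origin_regions: List[str],
                           required_streak: int) -> Optional[str]:
    """Return the region to transfer leadership to, or None if no transfer.

    `origin_regions` lists, oldest first, the origin region of each past
    write request for the target data object received by this node.
    """
    if required_streak <= 0 or len(origin_regions) < required_streak:
        return None
    recent = origin_regions[-required_streak:]
    same_origin = all(region == recent[0] for region in recent)
    # Transfer only if the recent writes share one origin region and that
    # region is not the one this node belongs to.
    if same_origin and recent[0] != own_region:
        return recent[0]
    return None
```

The returned region is the natural input for choosing a candidate leader node, since the text selects the candidate from the node cluster of the origin region.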
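After the second execution module determines that the target data object meets the leader node transfer condition, this node is considered no longer suitable to remain the target leader node of the target data object. The target data object is therefore transferred directly to the determined candidate leader node, so that the candidate leader node determines the target leader node of the target data object and the determined target leader node completes the strongly consistent storage commit of the target data object. The candidate leader node is any storage node 101 selected from the node cluster 10 of a designated region, and the designated region may be the origin region of the write requests associated with the target data object.
For example, the second execution module may determine the candidate leader node as follows: select a storage node 101 from the node cluster 10 of the origin region of the write requests as the candidate leader node. To keep strongly consistent data storage efficient, a storage node 101 that is not serving as the leader node of other data objects may be selected from that node cluster 10 as the candidate leader node of the target data object.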
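One possible way to pick the candidate leader node is sketched below. This is an assumption-laden illustration: `pick_candidate_leader` is a hypothetical name, "not serving as the leader node of other data objects" is judged only from the local leader node mapping table, and falling back to the first node of the cluster when every node already leads some object is a choice made here, not one stated in the text.

```python
from typing import Dict, List, Optional


def pick_candidate_leader(region: str, clusters: Dict[str, List[str]],
                          leader_map: Dict[str, str]) -> Optional[str]:
    """Choose a candidate leader from the cluster of the origin region."""
    nodes = clusters.get(region, [])
    if not nodes:
        return None  # no cluster known for that region
    busy = set(leader_map.values())          # nodes already leading some object
    idle = [node for node in nodes if node not in busy]
    # Prefer a node that leads no other data object; otherwise take any node.
    return (idle or nodes)[0]
```

Because each storage node only sees its own mapping table, a node that looks idle locally may still lead objects recorded elsewhere; a fuller design would need some shared view of load, which the text leaves open.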
通过候选主导节点确定出目标数据对象的目标主导节点后,该候选主导节点可以向本节点进行一个消息反馈,以告知本节点该目标数据对象的新目标主导节点为哪个存储节点101,并将该目标数据对象及对应的新目标主导节点更新至存在本节点中的主导节点映射表1011中。
此外,第二执行模块确定目标数据对象不满足主导节点转移条件时,可认为本节点仍可作为该目标数据对象的目标主导节点,此时,可直接由本节点向其他存储节点101发起强一致存储提交请求,以完成对所述目标数据对象的强一致存储提交。
作为所述候选主导节点的存储节点101,设置为:
向其他存储节点101发起对应所述目标数据对象的主导节点选举请求,并记作发起节点;根据其他存储节点101反馈的选举响应结果确定自身被允许作为目标主导节点后,向其他存储节点101发起强一致存储提交请求,以完成对所述目标数据对象的强一致存储提交。
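This embodiment also specifies the operations to be performed by a storage node 101 acting as the candidate leader node. After the storage node 101 that received the write request transfers the target data object to the determined candidate leader node, the storage node 101 acting as the candidate leader node may perform the following operations: first, upon receiving the target data object, it initiates a leader node election request for the target data object to the other storage nodes 101; next, it receives the election response results that the other storage nodes 101 feed back in response to that leader node election request, and may determine from these results that it is allowed to serve as the target leader node; it may then directly initiate a strongly consistent storage commit request to the other storage nodes 101 to complete the strongly consistent storage commit on the target data object.
The node performing these operations may also be recorded as an initiating node once it has initiated the leader node election request. In this embodiment, every storage node 101 that initiates a leader node election request is referred to as an initiating node, which facilitates the more detailed description of such storage nodes 101 below.
The operations performed by the storage node 101 acting as the candidate leader node in effect amount to one complete run of the Paxos algorithm for the target data object by that storage node 101, that is, two rounds of interaction between the storage node 101 and a majority of the nodes in the system.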
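The second execution module is further configured to:
when the target leader node found in the leader node mapping table 1011 is not the present node, forward the write request to the storage node 101 serving as the target leader node, so that the storage node 101 serving as the target leader node completes the strongly consistent storage commit on the target data object according to the received write request.
This embodiment also describes the second execution module when it learns that the target leader node is not the present node. After determining that the target leader node found in the leader node mapping table 1011 is not the present node, the second execution module may forward the write request containing the target data object to the storage node 101 serving as its target leader node. That storage node 101 can then complete the strongly consistent storage commit on the target data object according to the received write request.
In this embodiment, the storage node 101 serving as the target leader node is in turn equivalent to a storage node 101 that has received a write request. It may again analyze the received write request, determine the target data object contained in it, and complete the strongly consistent storage commit on the target data object through interaction with the other storage nodes 101 according to its stored leader node mapping table 1011; it may likewise do so by means of its own object determination module, first determination module, first execution module, and second execution module.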
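On the basis of the above embodiment, a storage node 101 that receives the leader node election request includes:
an information lookup module, configured to receive the leader node election request sent by the storage node 101 acting as the initiating node, and to look up the original leader node corresponding to the target data object in the locally stored leader node mapping table 1011; and an election response module, configured to compare the original leader node with the initiating node and, according to the comparison result, send to the initiating node an election response result corresponding to the leader node election request.
In this embodiment, a storage node 101 in the strongly consistent storage system may further include an information lookup module and an election response module. After receiving a leader node election request initiated by another storage node 101 acting as an initiating node, the storage node 101 can use these two modules to feed an election response result back to the initiating node.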
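The information lookup module in the storage node 101 may receive the leader node election request sent by the storage node 101 acting as the initiating node, then look up the leader node currently corresponding to the target data object in the leader node mapping table 1011 stored on the present node, and record that leader node as the original leader node. The leader node election request can be regarded as a request carrying the information that the initiating node wants to become the leader node of the target data object; the information lookup module can identify, from the received request, the initiating node that wants to be elected as leader node.
The election response module in the storage node 101 may obtain the initiating node that wants to be elected as the leader node of the target data object, as well as the original leader node found by the information lookup module in the leader node mapping table 1011 of the present node. It may then compare the initiating node with the original leader node and, according to the comparison result, send to the initiating node an election response result corresponding to the leader node election request.
The election response result that a storage node 101 receiving the leader node election request feeds back to the initiating node falls into one of two cases: the initiating node is allowed to serve as the target leader node of the target data object, or it is refused. For the result to allow the initiating node to serve as the target leader node, one of the following two conditions must be met: 1) the original leader node that the storage node 101 stores locally for the target data object is the initiating node; 2) the data number of the target data object included in the leader node election request is greater than the data number that the storage node 101 has recorded locally for the target data object. The data number of the target data object can be understood as an identifying number that lets the other storage nodes 101 reach consensus on the election of the target leader node; each storage node 101 may record the current largest data number separately for each data object.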
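In this embodiment, the election response module's procedure for deciding the election response result may be refined so that the election response module is configured to:
when the original leader node and the initiating node are the same storage node 101, send to the initiating node an election response result accepting the initiating node as the target leader node of the target data object.
When the election response module determines that the locally recorded original leader node and the initiating node are the same storage node 101, it may consider that the initiating node qualifies as the target leader node of the target data object, and may therefore directly send to the initiating node an election response result accepting the initiating node as the target leader node of the target data object.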
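The procedure may also be refined so that the election response module is configured to:
when the original leader node and the initiating node are different storage nodes 101, obtain the historical data number stored locally in advance for the target data object, and obtain from the leader node election request the current data number corresponding to the target data object; if the current data number is greater than the historical data number, then, once a response message sending condition is satisfied, take the current data number as the new historical data number and the initiating node as the new original leader node, and send to the initiating node an election response result accepting the initiating node as the target leader node of the target data object; if the current data number is not greater than the historical data number, send to the initiating node an election response result refusing the initiating node as the target leader node of the target data object.
When the election response module determines that the locally recorded original leader node and the initiating node are different storage nodes 101, it may first obtain from the present node the historical data number recorded in advance for the target data object and compare it with the current data number of the target data object contained in the leader node election request. When the current data number is greater than the historical data number, then, once the response message sending condition is satisfied, it may take the current data number as the new historical data number and the initiating node as the new original leader node, and send to the initiating node an election response result accepting the initiating node as the target leader node of the target data object.
The response message sending condition can be summarized as: the present node performs a self-check and determines that it is not currently in a leader node election process for the same target data object. When the condition is not currently satisfied, the node waits for a random period and then repeats the self-check, until it finds that it is not currently in a leader node election process for the same target data object.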
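In this embodiment, when the original leader node and the initiating node are different storage nodes 101 and the current data number is greater than the historical data number, the election response module needs to update the historical data number recorded on the present node, replacing it with the current data number as the new historical data number of the target data object on the present node. At the same time, the election response module may take the initiating node as the new original leader node corresponding to the target data object in the leader node mapping table 1011 of the present node, thereby updating the leader node mapping table 1011.
In this embodiment, when the original leader node and the initiating node are different storage nodes 101 but the current data number is not greater than the historical data number, the election response module considers that the initiating node does not qualify as the target leader node of the target data object, and may therefore directly send to the initiating node an election response result refusing the initiating node as the target leader node of the target data object.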
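The responder-side decision can be condensed into a short sketch. Several details are assumptions made only for illustration: the class and field names, the use of -1 as the historical data number for an object the node has never seen, and what an accepting response carries. The response message sending condition (waiting until the node is not itself in an election for the same object) is deliberately left out to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ElectionResponse:
    accepted: bool
    current_leader: Optional[str]  # leader this node records for the data object
    data_number: int               # data number this node records for the data object


@dataclass
class ElectionResponder:
    leader_map: Dict[str, str] = field(default_factory=dict)         # leader node mapping table
    historical_number: Dict[str, int] = field(default_factory=dict)  # per-object data numbers

    def respond(self, data_object: str, initiator: str, current_number: int) -> ElectionResponse:
        original_leader = self.leader_map.get(data_object)
        historical = self.historical_number.get(data_object, -1)  # assumption: -1 if unseen
        # Case 1: the initiator is already the recorded leader, so accept.
        if original_leader == initiator:
            return ElectionResponse(True, initiator, historical)
        # Case 2: a larger data number wins; record the new number and the new leader.
        if current_number > historical:
            self.historical_number[data_object] = current_number
            self.leader_map[data_object] = initiator
            return ElectionResponse(True, initiator, current_number)
        # Otherwise refuse, reporting the leader and data number held locally.
        return ElectionResponse(False, original_leader, historical)
```

Keeping the two accept paths separate mirrors the text: the first never touches local state, while the second updates both the historical data number and the mapping table before answering.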
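The initiating node that receives the election response results may be the storage node 101 that received the write request containing the target data object in this embodiment, or a storage node 101 that was determined as the candidate leader node by the storage node 101 that received that write request.
The storage node 101 acting as the initiating node is configured to:
receive the election response results fed back by the other storage nodes 101 in response to the leader node election request it sent; count the number of nodes whose election response result allows the present node to serve as the target leader node of the target data object; and, when that number of nodes is greater than or equal to a set threshold, determine that the present node is the target leader node of the target data object, update the stored leader node mapping table 1011, and initiate a strongly consistent storage commit request to the other storage nodes 101 to complete the strongly consistent storage commit on the target data object.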
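This embodiment specifies the operations performed by the storage node 101 acting as the initiating node after it receives the election response results fed back by the other storage nodes 101. Such a storage node 101 may receive the election response results that the other storage nodes 101 in the system feed back in response to its leader node election request; it may then analyze the results and count the number of nodes whose result allows the present node to serve as the target leader node of the target data object; finally, from that count it can determine whether it qualifies as the target leader node of the target data object. Once the condition is met (for example, the number of nodes is greater than or equal to the set threshold, which corresponds to the condition required for leader election in the Paxos algorithm), it determines itself to be the target leader node of the target data object and initiates a strongly consistent storage commit request to the other storage nodes 101 to complete the strongly consistent storage commit on the target data object.
After being determined as the target leader node of the target data object, the storage node 101 acting as the initiating node may record the target data object together with its target leader node in the leader node mapping table 1011 of the present node.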
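The storage node 101 acting as the initiating node is further configured to:
when the number of nodes is less than the set threshold, take the current leader node carried in a target election response result as the target leader node corresponding to the target data object in the locally stored leader node mapping table 1011, where the target election response result is selected from the election response results whose content refuses the initiating node as the target leader node of the target data object; and forward the write request to the storage node 101 serving as the target leader node, so that the storage node 101 serving as the target leader node completes the strongly consistent storage commit on the target data object according to the received write request.
In this embodiment, when the counted number of nodes is less than the set threshold, the storage node 101 acting as the initiating node may first select, from the election response results whose content refuses the initiating node as the target leader node of the target data object, the one carrying the largest current data number as the target election response result, and take the current leader node carried in it as the target leader node corresponding to the target data object in the locally stored leader node mapping table 1011. It may then forward the write request containing the target data object to the storage node 101 serving as that target leader node, so that the storage node 101 serving as the target leader node completes the strongly consistent storage commit on the target data object according to the received write request.
Before forwarding the write request containing the target data object to the storage node 101 serving as the target leader node, the initiating node may also update the data number recorded locally for the target data object, taking the current data number carried in the target election response result as the new historical data number of the target data object on the present node.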
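The initiating node's handling of the collected responses can be sketched as below. The names `ElectionResponse` and `resolve_election`, the tuple return value, and the behaviour when no response at all is available are assumptions for illustration; the threshold comparison and the choice of the refusal with the largest data number follow the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ElectionResponse:
    accepted: bool
    current_leader: Optional[str]  # leader recorded by the responding node
    data_number: int               # data number recorded by the responding node


def resolve_election(
    self_id: str,
    responses: List[ElectionResponse],
    threshold: int,
) -> Tuple[Optional[str], Optional[int]]:
    """Return (target leader, data number to adopt locally).

    The data number is None when this node wins and nothing needs to be adopted."""
    accepts = sum(1 for response in responses if response.accepted)
    if accepts >= threshold:
        return self_id, None  # this node becomes the target leader node
    rejections = [response for response in responses if not response.accepted]
    if not rejections:
        return None, None     # assumption: no usable refusal, so the caller must retry
    # Follow the refusal that carries the largest data number.
    target = max(rejections, key=lambda response: response.data_number)
    return target.current_leader, target.data_number
```

In the losing case the caller would, per the text, store the returned leader in its mapping table, adopt the returned data number as its new historical data number, and forward the write request to that leader.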
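The strongly consistent storage system provided in Embodiment 1 of the present application aims to bring T_strong, the time taken by strongly consistent storage, to its optimum. As described above, T_strong consists of two parts: T_1, the time taken for a write request containing a data object to reach the leader node of that data object, and T_2, the time taken by the leader node of the data object, using the Paxos algorithm, from initiating the strongly consistent storage request to achieving the strongly consistent storage commit. For T_2, the system of this embodiment works in the same way as the strongly consistent storage system in the background art and makes no significant improvement. This embodiment therefore focuses on optimizing T_1.
To optimize T_1, this embodiment treats the data object as the smallest unit. A storage node can process each data object independently using the Paxos algorithm, which ensures that there are no dependencies among data objects, and the whole processing is carried out dynamically, mainly on the basis of the leader node mapping table already stored on the storage node and through interaction with other storage nodes. After the leader node mapping table on each storage node in the system has been iteratively updated for some time, the leader node corresponding to a data object is largely stable, and the storage node that receives a write request containing the data object is, in most cases, the leader node of that data object. T_1 then reduces to the time taken to send the write request to that storage node, which is how T_1 is optimized. The strongly consistent storage system of this embodiment better reflects storage performance under strongly consistent storage, reduces the cost of strongly consistent storage while also demonstrating the applicability of the system, and improves the experience of business parties using the system in business scenarios.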
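The latency decomposition discussed above can be written compactly. The additive form is a reading of "consists of two parts", and the symbols T_forward (time to relay a request from the receiving node to a different leader node) and T_client→node (time from the client to the receiving node) are introduced here only for illustration.

```latex
T_{\text{strong}} = T_1 + T_2, \qquad
T_1 =
\begin{cases}
T_{\text{client}\to\text{node}}, & \text{if the receiving node is already the leader node},\\
T_{\text{client}\to\text{node}} + T_{\text{forward}}, & \text{otherwise}.
\end{cases}
```

Under this reading, a stable mapping table drives most requests into the first case, which is where the claimed improvement in T_1 comes from.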
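Embodiment 2
FIG. 2 is a schematic flowchart of a strongly consistent data storage method provided in Embodiment 2 of the present application. The method is applicable to strongly consistent storage of business data and may be executed by the strongly consistent storage system provided in Embodiment 1 above.
As described in Embodiment 1, the strongly consistent storage system includes node clusters divided among different regions, each node cluster containing at least one storage node. Each storage node stores a leader node mapping table, which contains the data objects for which the storage node has determined a leader node, together with the corresponding leader nodes.
As shown in FIG. 2, the strongly consistent data storage method provided in Embodiment 2 includes the following operations: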
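S201. For each storage node that receives a write request, analyze the received write request and determine the target data object contained in the write request.
In this embodiment, receiving a write request can be regarded as the starting step of data object processing by a storage node in the strongly consistent storage system provided in the above embodiment. Each storage node that receives a write request can use this step to determine the target data object contained in the write request and thereby start strongly consistent storage processing of the target data object.
S202. According to the locally stored leader node mapping table, complete the strongly consistent storage commit on the target data object through interaction with other storage nodes.
A leader node mapping table is stored locally on the storage node. Based on this table, and through interaction with other storage nodes, the strongly consistent storage commit on the target data object can be completed.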
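For a target data object, the leader node mapping table may or may not contain its corresponding target leader node, and the storage node handles the target data object differently in the two cases. For example, when the table does not contain the target leader node, the storage node needs to initiate a leader node election request to the other storage nodes so that, with their participation, a new target leader node is determined for the target data object; that target leader node can then complete the strongly consistent storage commit by sending a strongly consistent storage request to the other storage nodes. As another example, when the table contains the leader node and no leader node transfer is currently needed, the storage node is itself the target leader node and can directly send a strongly consistent storage request to the other storage nodes to complete the strongly consistent storage commit on the target data object; if a leader node transfer is currently needed, the storage node may determine a candidate leader node and transfer the target data object to it, after which the candidate leader node completes the strongly consistent storage commit on the target data object through interaction with the other storage nodes.
The strongly consistent data storage method provided in Embodiment 2 treats each data object as a smallest unit, so that a storage node applies the Paxos algorithm to each data object independently for strongly consistent storage, which ensures that there are no dependencies among data objects. According to the leader node mapping table already stored on the storage node, and through interaction with other storage nodes, this embodiment dynamically determines the target leader node of the target data object and on that basis completes the strongly consistent storage commit, so that the strongly consistent storage system can optimize the performance of strongly consistent storage of data objects according to each data object's data characteristics, the system load, user access behavior, and the like. The method reduces the cost of strongly consistent storage while effectively improving the applicability of the strongly consistent storage system; it also improves the experience of business parties using the system in business scenarios.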
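To make it easier to understand how the storage nodes in the strongly consistent storage system of Embodiment 1 perform strongly consistent storage processing on a data object, Embodiment 2 describes the implementation through the following example.
First, this example assumes that the strongly consistent storage system includes three storage nodes, which may be located in node clusters corresponding to different regions, each holding a leader node mapping table updated to the present. The three storage nodes are denoted node 1, node 2, and node 3. Node 1 first receives a write request generated by a user terminal for a business; the write request contains the target data object of that business to be stored with strong consistency.
After receiving the write request, node 1 may perform the following operations: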
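S1. Analyze the received write request and determine the target data object contained in the write request.
S2. Look up in the stored leader node mapping table whether it contains a target leader node for the target data object.
S3. If the leader node mapping table contains no target leader node for the target data object, act as the initiating node and initiate a leader node election request to the other storage nodes, so as to determine the target leader node through interaction with the other storage nodes, and execute S7.
S4. If the leader node mapping table contains a target leader node for the target data object, then, when the target leader node is the present node 1, determine, in combination with the historical access data recorded for the target data object, whether the target data object satisfies the set leader node transfer condition; if the target data object satisfies the condition, execute S5; if it does not, execute S6.
S5. Determine a candidate leader node and transfer the target data object to the candidate leader node.
S6. Initiate a strongly consistent storage commit request to the other storage nodes to complete the strongly consistent storage commit on the target data object.
S7. Forward the write request to the storage node serving as the target leader node.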
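The branching that node 1 performs in S2 to S7 can be summarized in a small dispatch function. The enum `Action`, the function `decide_action`, and the boolean `transfer_condition_met` are hypothetical names; forwarding when the recorded leader is a different node follows the second execution module's description in Embodiment 1.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    START_ELECTION = "start_election"                # S3: no leader recorded
    TRANSFER_TO_CANDIDATE = "transfer_to_candidate"  # S5: leader is this node, transfer needed
    COMMIT_LOCALLY = "commit_locally"                # S6: leader is this node, no transfer
    FORWARD_TO_LEADER = "forward_to_leader"          # S7: leader is another node


def decide_action(
    node_id: str,
    target_object: str,
    leader_map: Dict[str, str],     # leader node mapping table: data object -> leader node
    transfer_condition_met: bool,   # result of the leader node transfer check
) -> Action:
    leader = leader_map.get(target_object)
    if leader is None:
        return Action.START_ELECTION
    if leader != node_id:
        return Action.FORWARD_TO_LEADER
    if transfer_condition_met:
        return Action.TRANSFER_TO_CANDIDATE
    return Action.COMMIT_LOCALLY
```

Because every write passes through this single decision point, each data object is routed independently of all others, which matches the per-object granularity the embodiment emphasizes.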
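For node 2, assume that node 2 is the candidate leader node and receives the target data object sent by node 1. Node 2 may then perform the following operation:
S8. Initiate a leader node election request corresponding to the target data object to the other storage nodes, and likewise record the present node as an initiating node.
For node 3, assume that node 3 receives the write request forwarded by node 1. Node 3 is then in principle equivalent to a new node 1 and may perform the operations described above for node 1 again.
Alternatively, assume that node 3 receives a leader node election request initiated by an initiating node (which may be node 1 or node 2). Node 3 may then perform the following operations: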
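S9. Analyze the received leader node election request, determine that the initiating node (node 1 or node 2) is applying to serve as the target leader node of the target data object, and obtain the current data number corresponding to the target data object.
S10. Look up the leader node corresponding to the target data object in the locally stored leader node mapping table and record it as the original leader node.
S11. Compare the initiating node with the original leader node; if they are the same storage node, send to the initiating node (node 1 or node 2) an election response result accepting the initiating node as the target leader node of the target data object.
S12. If the initiating node and the original leader node are not the same storage node, obtain the historical data number stored locally in advance for the target data object; when the historical data number is less than the current data number, execute S13; when the historical data number is not less than the current data number, execute S14.
S13. Once the response message sending condition is satisfied, take the current data number as the new historical data number and the initiating node as the new original leader node, and send to the initiating node an election response result accepting the initiating node as the target leader node of the target data object.
S14. Send to the initiating node an election response result refusing the initiating node as the target leader node of the target data object.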
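After receiving the election response results, node 1 or node 2, acting as the initiating node, may perform the following operations:
S15. Receive the election response results fed back by the other storage nodes in response to the leader node election request that was sent.
In this example, the other storage nodes means node 3; in practice, they may be all storage nodes that received the leader node election request.
S16. Count the number of nodes whose election response result allows the present node to serve as the target leader node of the target data object.
S17. When the number of nodes is greater than or equal to the set threshold, determine that the present initiating node is the target leader node of the target data object, and update the stored leader node mapping table to record the target data object and its newly associated leader node.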
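S18. Initiate a strongly consistent storage commit request to the other storage nodes to complete the strongly consistent storage commit on the target data object.
S19. When the number of nodes is less than the set threshold, take the current leader node carried in the target election response result as the target leader node corresponding to the target data object in the locally stored leader node mapping table.
The target election response result is selected from the election response results whose content refuses the initiating node as the target leader node of the target data object. In this example, the target election response result is the election response result fed back by node 3.
S20. Update the historical data number corresponding to the target data object, and forward the write request containing the target data object to the storage node serving as the target leader node.
The storage node that receives the write request forwarded in S20 (assumed to be node 2) is likewise in principle equivalent to a new node 1 and may perform the operations described above for node 1 again.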
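The step numbers used in the above example do not indicate the order of execution; they serve only to label the steps. The example shows that, when the strongly consistent storage system stores with strong consistency the data object contained in a write request, the key is to determine the leader node associated with the data object. Once the leader node is determined, it can complete the strongly consistent storage commit on the data object directly by sending a strongly consistent storage commit request to the other storage nodes. The leader node associated with a data object is determined from the leader node mapping table stored on the storage node, and the possible outcomes are: 1) the storage node that receives the write request is the leader node of the data object; 2) the storage node that receives the write request forwards it directly to the leader node recorded for the data object in the leader node mapping table; 3) the storage node that receives the write request initiates, to the other nodes, a leader node election request to become the leader node of the data object, becomes the leader node of the data object if the request is allowed, and otherwise takes the storage node recommended by the other storage nodes as the leader node; 4) the storage node that receives the write request, when the leader node transfer condition is satisfied, transfers the data object to a candidate leader node, which requests the other storage nodes to let it serve as the leader node, becomes the leader node of the data object if the request is allowed, and otherwise takes the storage node recommended by the other storage nodes as the leader node.
When a storage node confirms the leader node associated with a data object, the leader node mapping table stored on it is updated accordingly. As the storage nodes keep iterating on the leader nodes associated with data objects during strongly consistent storage processing, the strongly consistent storage system can, after some time, arrive at a relatively stable mapping, thereby optimizing the time taken by strongly consistent storage processing. In addition, the strongly consistent storage system may deploy the storage nodes in advance based on their leader node mapping tables, so that write requests generated by clients are quickly received by the storage node serving as the leader node of the data object.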
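Embodiment 3
FIG. 3 is a schematic diagram of the hardware structure of a server provided in Embodiment 3 of the present application. The server is configured to serve as a storage node in the strongly consistent storage system provided in Embodiment 1 and may include a processor and a storage device. The storage device stores at least one instruction, and the instruction is executed by the processor so that the server can perform the operations corresponding to a storage node in the strongly consistent data storage method provided in Embodiment 2.
Referring to FIG. 3, the server may include a processor 30, a storage device 31, a display screen 32, an input device 33, an output device 34, and a communication device 35. The server may have one or more processors 30; FIG. 3 takes one processor 30 as an example. The server may have one or more storage devices 31; FIG. 3 takes one storage device 31 as an example. The processor 30, storage device 31, display screen 32, input device 33, output device 34, and communication device 35 of the server may be connected by a bus or by other means; FIG. 3 takes a bus connection as an example.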
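In this embodiment, when the processor 30 executes one or more programs stored in the storage device 31, it performs, for example, the following: for each storage node that receives a write request, analyze the received write request and determine the target data object contained in the write request; and, according to the locally stored leader node mapping table, complete the strongly consistent storage commit on the target data object through interaction with other storage nodes.
An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the strongly consistent data storage method provided in the embodiments of the present application. For example, the method described in the above embodiment includes: for each storage node that receives a write request, analyzing the received write request and determining the target data object contained in the write request; and, according to the locally stored leader node mapping table, completing the strongly consistent storage commit on the target data object through interaction with other storage nodes.
Because the server and storage medium embodiments are substantially similar to the method embodiment, they are described only briefly; for related details, refer to the corresponding description of the method embodiment.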

Claims (15)

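  1. A strongly consistent storage system, comprising: a plurality of node clusters divided by region, each node cluster comprising at least one storage node;
    each storage node is configured to store a leader node mapping table, wherein the leader node mapping table is used to record data objects and the leader nodes to which the data objects are mapped, each leader node being a storage node under a node cluster; and to analyze a write request received by the present node, determine a target data object contained in the write request, and, according to the leader node mapping table stored on the present node, complete a strongly consistent storage commit on the target data object through interaction with other storage nodes.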
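  2. The system according to claim 1, wherein each storage node comprises:
    an object determination module, configured to analyze the write request received by the present node and determine the target data object contained in the write request;
    a first determination module, configured to determine whether a target leader node of the target data object exists in the leader node mapping table stored on the present node;
    a first execution module, configured to, in response to no target leader node of the target data object existing in the leader node mapping table stored on the present node, take the present node as an initiating node and initiate a leader node election request corresponding to the target data object to other storage nodes, so as to determine the target leader node through interaction with the other storage nodes, and complete the strongly consistent storage commit on the target data object based on the determined target leader node;
    a second execution module, configured to, in response to a target leader node of the target data object existing in the leader node mapping table stored on the present node, determine whether the target leader node is the present node, and complete the strongly consistent storage commit on the target data object through interaction with other storage nodes according to the determination result.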
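  3. The system according to claim 2, wherein the second execution module is configured to complete the strongly consistent storage commit on the target data object through interaction with the other storage nodes according to the determination result in the following manner:
    in response to the target leader node existing in the leader node mapping table stored on the present node being the present node, determining, in combination with historical access data corresponding to the target data object, whether the target data object satisfies a set leader node transfer condition;
    in response to the target data object satisfying the set leader node transfer condition, determining a candidate leader node and transferring the target data object to the candidate leader node, so as to determine the target leader node of the target data object based on the candidate leader node, and completing the strongly consistent storage commit on the target data object based on the determined target leader node;
    in response to the target data object not satisfying the set leader node transfer condition, initiating a strongly consistent storage commit request to other storage nodes to complete the strongly consistent storage commit on the target data object.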
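  4. The system according to claim 3, wherein the historical access data corresponding to the target data object includes originating region information of the write requests associated with the target data object;
    the leader node transfer condition is: a set number of consecutively received write requests containing the target data object all originate from the same originating region, and the region to which the present node belongs differs from the originating region.
  5. The system according to claim 4, wherein the candidate leader node is a storage node selected from the node cluster corresponding to a designated region, the designated region being the originating region of the write requests associated with the target data object.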
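  6. The system according to claim 3, wherein the storage node serving as the candidate leader node is configured to:
    initiate a leader node election request corresponding to the target data object to other storage nodes, and record the present node as an initiating node;
    after determining, according to election response results fed back by the other storage nodes, that the present node is allowed to serve as the target leader node, initiate a strongly consistent storage commit request to the other storage nodes to complete the strongly consistent storage commit on the target data object.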
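  7. The system according to claim 2, wherein the second execution module is configured to complete the strongly consistent storage commit on the target data object through interaction with the other storage nodes according to the determination result in the following manner:
    in response to the target leader node existing in the leader node mapping table stored on the present node not being the present node, forwarding the write request to the storage node serving as the target leader node, so that the storage node serving as the target leader node completes the strongly consistent storage commit on the target data object according to the received write request.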
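  8. The system according to any one of claims 2-7, wherein a storage node that receives the leader node election request comprises:
    an information lookup module, configured to receive the leader node election request sent by the storage node acting as the initiating node, and to look up an original leader node corresponding to the target data object in the locally stored leader node mapping table;
    an election response module, configured to compare the original leader node with the initiating node and, according to the comparison result, send to the initiating node an election response result corresponding to the leader node election request.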
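  9. The system according to claim 8, wherein the election response module is configured to send to the initiating node, according to the comparison result, the election response result corresponding to the leader node election request in the following manner:
    in response to the original leader node and the initiating node being the same storage node, sending to the initiating node an election response result accepting the initiating node as the target leader node of the target data object.
  10. The system according to claim 8, wherein the election response module is configured to send to the initiating node, according to the comparison result, the election response result corresponding to the leader node election request in the following manner:
    in response to the original leader node and the initiating node being different storage nodes, obtaining a historical data number stored locally in advance for the target data object, and obtaining from the leader node election request a current data number corresponding to the target data object;
    in a case where the current data number is greater than the historical data number, after a response message sending condition is satisfied, taking the current data number as the new historical data number and the initiating node as the new original leader node, and sending to the initiating node an election response result accepting the initiating node as the target leader node of the target data object;
    in a case where the current data number is not greater than the historical data number, sending to the initiating node an election response result refusing the initiating node as the target leader node of the target data object.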
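  11. The system according to claim 8, wherein the storage node acting as the initiating node is configured to:
    receive the election response results fed back by other storage nodes in response to the leader node election request sent by the present node;
    count the number of nodes whose election response result allows the present node to serve as the target leader node of the target data object;
    in a case where the number of nodes is greater than or equal to a set threshold, determine that the present node is the target leader node of the target data object, update the leader node mapping table stored on the present node, and initiate a strongly consistent storage commit request to other storage nodes to complete the strongly consistent storage commit on the target data object.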
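  12. The system according to claim 11, wherein the storage node acting as the initiating node is further configured to:
    in a case where the number of nodes is less than the set threshold, take the current leader node carried in a target election response result as the target leader node corresponding to the target data object in the locally stored leader node mapping table, wherein the target election response result is selected from the election response results whose content refuses the initiating node as the target leader node of the target data object;
    forward the write request to the storage node serving as the target leader node, so that the storage node serving as the target leader node completes the strongly consistent storage commit on the target data object according to the received write request.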
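  13. A strongly consistent data storage method, executed by the strongly consistent storage system according to any one of claims 1-12, comprising:
    each storage node that receives a write request analyzing the received write request and determining a target data object contained in the write request;
    according to the locally stored leader node mapping table, completing a strongly consistent storage commit on the target data object through interaction with other storage nodes.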
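  14. A server, configured to serve as a storage node in the strongly consistent storage system according to any one of claims 1-12, comprising:
    at least one processor;
    a storage device, configured to store at least one program;
    the at least one program being executed by the at least one processor so that the at least one processor implements the strongly consistent data storage method according to claim 13.
  15. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the strongly consistent data storage method according to claim 13.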
PCT/CN2021/108190 2020-08-12 2021-07-23 Strongly consistent storage system, strongly consistent data storage method, server, and medium WO2022033290A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010809245.4A CN112000285A (zh) 2020-08-12 2020-08-12 Strongly consistent storage system, strongly consistent data storage method, server, and medium
CN202010809245.4 2020-08-12

Publications (1)

Publication Number Publication Date
WO2022033290A1 true WO2022033290A1 (zh) 2022-02-17

Family

ID=73463362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108190 WO2022033290A1 (zh) 2020-08-12 2021-07-23 Strongly consistent storage system, strongly consistent data storage method, server, and medium

Country Status (2)

Country Link
CN (1) CN112000285A (zh)
WO (1) WO2022033290A1 (zh)


Also Published As

Publication number Publication date
CN112000285A (zh) 2020-11-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855343

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855343

Country of ref document: EP

Kind code of ref document: A1