WO2020143410A1 - Data storage method and apparatus, electronic device, and storage medium - Google Patents

Data storage method and apparatus, electronic device, and storage medium

Info

Publication number
WO2020143410A1
WO2020143410A1 (PCT application PCT/CN2019/125871)
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
standby
physical
client
Prior art date
Application number
PCT/CN2019/125871
Other languages
English (en)
French (fr)
Inventor
石建伟
王辉
吴克柱
时晖
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2020143410A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0663: Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • One or more embodiments of this specification relate to the field of distributed storage technology, and in particular, to a data storage method and apparatus, electronic equipment, and storage medium.
  • Distributed key-value pair storage systems are mostly designed based on consistent hash algorithm.
  • the consistent hash algorithm can ensure that the data is distributed as evenly as possible on the physical nodes of the system, and when the physical node joins or exits the system, the data that needs to be migrated can be controlled to the minimum necessary range as much as possible.
  • one or more embodiments of this specification provide a data storage method and apparatus, electronic equipment, and storage medium.
  • a data storage method is proposed, which is applied to a physical node in a distributed key-value pair storage system; the method includes:
  • Receiving data sent by the master node, where the data is sent by the client to the master node, and the master node is determined by the client by computing the data based on a consistent hash algorithm;
  • a data storage method which is applied to a client in a distributed key-value pair storage system; the method includes:
  • a data storage device which is applied to a physical node in a distributed key-value pair storage system; the device includes:
  • a storage unit that stores data sent by a client, where the physical node has been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data;
  • a standby node determining unit determining a standby node for storing the data
  • the first sending unit sends the data to the standby node, so that the standby node stores the data.
  • a data storage device which is applied to a physical node in a distributed key-value pair storage system; the device includes:
  • the first receiving unit receives the data sent by the master node; the data is sent by the client to the master node, and the master node is determined by the client by calculating the data based on a consistent hash algorithm;
  • the storage unit stores the data.
  • a data storage device which is applied to a client in a distributed key-value pair storage system; the device includes:
  • the first determining unit computes the data based on the consistent hash algorithm to determine the master node of the data in the distributed key-value pair storage system;
  • the first sending unit sends the data to the master node, so that the master node stores the data, and determines a standby node associated with itself and sends the data to the standby node;
  • a second determining unit that determines a standby node associated with the master node
  • the second sending unit sends the data to the master node and the standby node respectively, so that the master node and the standby node store the data.
  • an electronic device including:
  • Memory for storing processor executable instructions
  • the processor executes the executable instruction to implement the data storage method as described in the first aspect above.
  • a computer-readable storage medium is stored on which computer instructions are stored, which when executed by a processor implements the steps of the method as described in the first aspect.
  • an electronic device including:
  • Memory for storing processor executable instructions
  • the processor executes the executable instruction to implement the data storage method as described in the second aspect above.
  • a computer-readable storage medium is provided on which computer instructions are stored, which when executed by a processor implements the steps of the method as described in the second aspect.
  • an electronic device including:
  • Memory for storing processor executable instructions;
  • the processor executes the executable instructions to implement the data storage method as described in the third aspect above.
  • a computer-readable storage medium is provided, on which computer instructions are stored, which when executed by a processor implement the steps of the method as described in the third aspect.
  • FIG. 1 is a schematic structural diagram of a data storage system provided by an exemplary embodiment.
  • FIG. 2 is a flowchart of a data storage method provided by an exemplary embodiment.
  • FIG. 3 is a flowchart of another data storage method provided by an exemplary embodiment.
  • FIG. 4 is a flowchart of another data storage method provided by an exemplary embodiment.
  • FIG. 5 is an interaction diagram of a data storage method provided by an exemplary embodiment.
  • FIGS. 6A-6C are schematic diagrams of a ring-shaped hash space provided by an exemplary embodiment.
  • FIG. 7 is an interaction diagram of a data reading method provided by an exemplary embodiment when the master node does not fail.
  • FIG. 8 is an interaction diagram of a method for reading data when a master node fails according to an exemplary embodiment.
  • FIG. 9 is a schematic diagram of a ring-shaped hash space when a master node fails according to an exemplary embodiment.
  • FIG. 10 is a schematic structural diagram of an apparatus provided by an exemplary embodiment.
  • FIG. 11 is a block diagram of a data storage device provided by an exemplary embodiment.
  • FIG. 12 is a schematic structural diagram of another device provided by an exemplary embodiment.
  • FIG. 13 is a block diagram of another data storage device provided by an exemplary embodiment.
  • FIG. 14 is a schematic structural diagram of another device provided by an exemplary embodiment.
  • FIG. 15 is a block diagram of another data storage device provided by an exemplary embodiment.
  • the steps of the corresponding method are not necessarily performed in the order shown and described in this specification.
  • the method may include more or fewer steps than described in this specification.
  • A single step described in this specification may be decomposed into multiple steps in other embodiments; and multiple steps described in this specification may also be combined into a single step in other embodiments.
  • FIG. 1 is a schematic structural diagram of a data storage system provided by an exemplary embodiment.
  • the system may include a node statistics device 11, a network 12, a distributed key-value pair storage system 13, and several electronic devices.
  • the distributed key-value pair storage system 13 may include a physical node 131, a physical node 132, a physical node 133, etc.; electronic devices may include a PC 14, PC 15 and the like.
  • The node statistics device 11 may be a physical server containing an independent host, or may be a virtual server hosted by a host cluster. In the process of implementing the data storage scheme of this specification, the node statistics device is used to keep track of the physical nodes included in the distributed key-value pair storage system 13, so that each physical node, and the electronic devices interacting with the physical nodes, can learn from the statistics which physical nodes have joined or exited the distributed key-value pair storage system.
  • PCs 14-15 are just one type of electronic device that users can use. In fact, users can obviously also use electronic devices of the following types: mobile phones, tablet devices, PDAs (Personal Digital Assistants), wearable devices (such as smart glasses, smart watches, etc.); one or more embodiments of this specification do not limit this.
  • The electronic device can act as a client and interact with the physical nodes in the distributed key-value pair storage system 13, to store data on the physical nodes and to read data from them.
  • the distributed key-value pair storage system 13 provides two services of "writing value by key” and “reading value by key” through multiple physical nodes. Among them, different key-value pairs are independent of each other, and there is no association relationship. In the process of implementing the data storage scheme of this specification, each physical node may store data sent by the client, and return corresponding data to the client in response to the client's read request.
  • the network 12 for interaction between the node statistical device 11, the distributed key-value pair storage system 13 and the electronic device may include various types of wired or wireless networks.
  • the network 12 may include the Internet.
  • each physical node in the distributed key-value pair storage system 13 can also communicate and interact through the network 12.
  • FIG. 2 is a flowchart of a data storage method according to an exemplary embodiment. As shown in FIG. 2, the method is applied to a physical node (as a master node of data to be stored) in a distributed key-value pair storage system, and may include the following steps:
  • Step 202: store the data sent by the client, where the physical node has been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data.
  • The distributed key-value pair storage system may be designed based on a consistent hash algorithm, so that the data to be stored is distributed as evenly as possible among the physical nodes, and as little data as possible is migrated when a physical node joins or exits the system.
  • When the client determines the data currently to be stored, it can use the consistent hash algorithm to calculate the hash value corresponding to the data, and then search the hash value space in a preset direction (for example, always searching the circular hash space clockwise) for the virtual node closest to that hash value, so that the physical node corresponding to that virtual node is used as the master node that stores the data.
  • The hash value space can be maintained by the node statistics device.
  • The node statistics device can count all physical nodes currently added to the distributed key-value pair storage system and, based on the consistent hash algorithm, map the virtual nodes of each physical node as evenly as possible onto the ring-shaped hash space (also known as a hash ring or hash bucket). The client and each physical node can then obtain the ring-shaped hash space from the node statistics device, or the node statistics device can actively send it to them; in other words, the client and the physical nodes all record the same ring-shaped hash space. After the master node is determined, the client can send the data to be stored to the master node, so that the master node stores the data.
  • Step 204 Determine a standby node for storing the data.
  • In addition to the master node storing the data sent by the client, a preset number of standby nodes can further be selected from the hash value space to store the data, to prevent the data from being lost if the master node exits the system. Therefore, after receiving the data sent by the client, the master node can also determine the hash value corresponding to the data based on the consistent hash algorithm, search the hash value space in the preset direction (following the above example, always clockwise in the circular hash space) for the first virtual node closest to the hash value (following the above example, the physical node corresponding to the first virtual node is in fact the master node itself), and then find, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, so that the physical nodes corresponding to the first standby virtual nodes can be used as the standby nodes.
  • The first standby virtual nodes all belong to distinct physical nodes other than the master node itself; in other words, the standby nodes are physical nodes different from one another.
  • Step 206 Send the data to the backup node, so that the backup node stores the data.
  • After receiving the data, the master node may first store it and return a receipt to the client once storage is complete, to inform the client that storage succeeded. The operation of sending the data to the standby node can be implemented by asynchronously copying the data to back it up to the standby node, thereby reducing the impact on the speed of responding to the client's storage request, that is, meeting a weak-consistency requirement is sufficient.
  • The data requested by the client is thus stored on both the master node and the standby node.
  • When the master node is in a normal working state, the master node responds to the read request for the data (the standby node does not need to respond).
  • When the client sends a read request for the data, it can still locate the master node according to the above method of calculating the hash value; that is, the master node can still receive the read request for the data sent by the client.
  • After receiving the read request, the master node can also verify whether it is the master node of the data, and return the data to the client when it determines, based on the consistent hash algorithm, that it is the master node of the data.
  • Alternatively, the physical node may directly return the corresponding data to the client without performing the above verification.
  • When the master node exits the distributed key-value pair storage system (for example, because it fails), the system can switch smoothly to the standby node, and the standby node (which also stores the data the client requested to read) responds to the read request, thereby avoiding the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally.
  • At the macro level, the master node and standby node corresponding to different data are different, so the load of each physical node can still be balanced, and there is no obvious difference between master and standby.
  • For example, the client can determine whether the master node of the requested data has failed and, when it has, further determine the standby node associated with the master node (that is, in the manner of determining the standby node in step 204 above), so as to send a read request to the standby node to obtain the corresponding data.
  • When other physical nodes newly join the distributed key-value pair storage system, the master node and standby node corresponding to any given piece of data will also change accordingly.
  • The newly added physical node will serve as the master node for some data and as the standby node for some other data. Such data therefore needs to be backed up to the newly added physical node, so that it can normally provide the data-reading service to clients after joining the system.
  • Taking the data stored by the above physical node as master node as an example: when another physical node joins the distributed key-value pair storage system and acts as the master or standby node of the data, the data is sent to that physical node so that it stores the data.
  • Whether the newly added physical node is the master node or a standby node of the data can be determined as follows: based on the consistent hash algorithm (when implementing the data storage scheme of this specification, the client and the physical nodes all adopt the same consistent hash algorithm), determine the hash value corresponding to the data; when the second virtual node closest to the hash value in the preset direction in the hash value space belongs to the other physical node, the other physical node can be determined to be the master node of the data; when a virtual node of the other physical node belongs to the second standby virtual nodes, the other physical node can be determined to be a standby node of the data.
  • The second standby virtual nodes are a preset number (that is, the number of standby nodes) of virtual nodes closest to the second virtual node in the hash value space, and the second standby virtual nodes all belong to distinct physical nodes other than the master node of the data.
  • For any piece of data, a node that was its standby node may no longer serve as a standby node for that data after other physical nodes newly join the distributed key-value pair storage system; in other words, in this case that data is redundant data that the standby node no longer needs to store. The standby node can therefore delete the data after the other physical node has joined and finished backing it up, to save its own storage space.
  • Each physical node can delete redundant data at a preset frequency, or perform the deletion after learning through the node statistics device that new physical nodes have joined the system.
  • Each physical node can use the above method of calculating hash values to determine whether it is still a standby node of the data it stores, and delete locally any data for which it is no longer the corresponding standby node.
  • The number of standby nodes may be set to be positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system. In other words, when the distributed key-value pair storage system requires a higher security level, a relatively larger number of standby nodes can be configured, thereby preventing data loss.
  • FIG. 3 is a flowchart of another data storage method provided by an exemplary embodiment.
  • the method is applied to a physical node in a distributed key-value storage system (as a backup node for data to be stored), and may include the following steps:
  • Step 302 Receive data sent by the master node; the data is sent by the client to the master node, and the master node is determined by the client by calculating the data based on a consistent hash algorithm.
  • In addition to the master node storing the data sent by the client, a preset number of standby nodes can further be selected to store the data, to prevent the data from being lost if the master node exits the system. Therefore, after receiving the data sent by the client, the master node can select some physical nodes from the distributed key-value pair storage system as standby nodes, and send the data it stores to the standby nodes so that they store it, thereby backing up the data.
  • Step 304 Store the data.
  • The data requested by the client is thus stored on both the master node and the standby node.
  • If the master node is in a normal working state, the master node responds to the read request for the data (the standby node does not need to respond); otherwise, the standby node (which also stores the data the client requested to read) can respond to the read request, thereby avoiding the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally.
  • At the macro level, the master node and standby node corresponding to different data are different, so the load of each physical node can still be balanced, and there is no obvious difference between master and standby.
  • After determining that the master node of the requested data has failed, the client may further determine the standby node associated with the master node (that is, in the manner of determining the standby node in step 204 above), so as to send a read request to the standby node to obtain the corresponding data. The standby node can then receive the read request for the data sent by the client (that is, a read request sent by the client when the master node has exited the distributed key-value pair storage system), and return the data to the client.
  • Because the data's master node has exited the distributed key-value pair storage system, the standby node can switch to being the master node of the data in order to respond to the read request sent by the client. After the standby node has switched to being the master node, it may itself later exit the distributed key-value pair storage system as the master node. Therefore, to prevent data loss, the standby node associated with itself may be further determined, and the data may be sent to the determined standby node, so that that standby node stores the data.
  • the process of determining the standby node is similar to the process of determining the standby node associated with itself by the master node shown in FIG. 2 described above, and details are not described herein again.
  • FIG. 4 is a flowchart of another data storage method provided by an exemplary embodiment. As shown in FIG. 4, this method is applied to clients in a distributed key-value pair storage system, and may include the following steps:
  • Step 402: Compute the data based on the consistent hash algorithm to determine the master node of the data in the distributed key-value pair storage system.
  • Step 404 Send the data to the master node, so that the master node stores the data, and determine a backup node associated with itself and send the data to the backup node.
  • Step 406 Determine the backup node associated with the master node.
  • Step 408 Send the data to the master node and the backup node respectively, so that the master node and the backup node store the data.
  • the client can only determine the master node of the data to be stored, and the standby node associated with the master node is determined by the master node itself, and the data is also sent by the master node to the standby node. This can reduce the processing pressure of the client.
  • the client may further determine the standby node associated with the master node and send the data to the master node and the standby node respectively, thereby reducing the master node’s Pressure (the master node does not need to perform operations to determine the standby node and send data to the standby node).
  • FIG. 5 is an interaction diagram of a data storage method provided by an exemplary embodiment. As shown in FIG. 5, the interaction process may include the following steps:
  • step 502 the client calculates a hash value corresponding to the data to be stored.
  • Step 504 Determine the master node that stores the data.
  • In an embodiment, the value space of the hash algorithm may be connected end to end to form a circular hash space. Further, for each physical node, the hash values of the corresponding multiple virtual nodes are calculated and mapped onto the ring-shaped hash space. When data needs to be read or written, the hash value can be calculated from the data's key and mapped onto the ring-shaped hash space, and the physical node corresponding to the virtual node closest to that hash value is then found in a fixed direction (such as clockwise); that physical node is the master node that stores the data.
  • For example, as shown in FIG. 6A, the value space of the ring-shaped hash space P is 0 to 2^32, and the virtual nodes include B4, C3, A1, C1, A2, and D2.
  • Virtual nodes A1 and A2 belong to physical node A, virtual node B4 belongs to physical node B, virtual nodes C1 and C3 belong to physical node C, and virtual node D2 belongs to physical node D.
  • Suppose the hash value corresponding to the data to be stored is K; the position of the hash value K in the ring-shaped hash space P is shown by the arrow in the figure.
  • Searching clockwise, the first virtual node closest to the hash value K is B4 (that is, virtual node B4 is the first virtual node), so physical node B, which corresponds to virtual node B4, can be used as the master node of the data to be stored.
  • step 506 the client sends a storage request to the determined master node.
  • step 508 the master node stores the data to be stored.
  • Step 510: the master node returns a receipt of successful storage to the client.
  • After receiving the data (contained in the storage request) sent by the client, the master node can store the data and return a receipt to the client once storage is complete, to inform the client that storage succeeded; the operation of sending the data to the standby node can be implemented by asynchronously copying the data to back it up to the standby node, thereby reducing the impact on the speed of responding to the client's storage request, that is, meeting a weak-consistency requirement is sufficient.
  • step 512 the master node determines a backup node associated with itself to store the data.
  • the number of standby nodes can be preset.
  • The number of standby nodes is positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system. For example, the higher the security level required of the distributed key-value pair storage system, the more standby nodes can be configured when storing data; the larger the storage space of each physical node in the distributed key-value pair storage system, the more standby nodes can be configured when storing data; and the more processing resources each physical node in the distributed key-value pair storage system has, the more standby nodes can be configured when storing data.
  • Following the above example, as shown in FIG. 6A, assuming two standby nodes are configured, the two mutually distinct first standby virtual nodes closest to the first virtual node B4 in the ring hash space P that do not belong to physical node B are C3 and A1, so physical node C and physical node A can be used as standby nodes.
  • step 514 the master node sends data to the determined standby node.
  • In one case, the client may determine only the master node of the data to be stored, while the standby node associated with the master node is determined by the master node itself, which also sends the data to the standby node (that is, steps 512-514 above); this can reduce the processing pressure on the client.
  • In the other case, after determining the master node of the data to be stored, the client may further determine the standby node associated with the master node and send the data to the master node and the standby node respectively, thereby reducing the pressure on the master node (the master node does not need to determine the standby node or send data to it).
  • the process for the client to determine the master node and the standby node is the same as the process for the master node to determine the master node and the standby node, which will not be repeated here.
  • When other physical nodes newly join the distributed key-value pair storage system, the master node and standby node corresponding to any given piece of data will also change accordingly.
  • The newly added physical node will serve as the master node for some data and as the standby node for some other data. Such data therefore needs to be backed up to the newly added physical node, so that it can normally provide the data-reading service to clients after joining the system.
  • Taking the data stored by physical node B as master node (the target data) as an example, as shown in FIG. 6B, assume that when physical node E joins the distributed key-value pair storage system, the position of its virtual node E1 in the ring-shaped hash space P lies between the hash value K and virtual node B4 (that is, virtual node E1 is the second virtual node closest to the hash value K in the clockwise direction in the ring-shaped hash space P); the master node for the target data is then the newly added physical node E.
  • Further, physical node B, which was the master node of the target data, needs to back the target data up to physical node E.
  • When a subsequent client needs to read the target data, it can locate physical node E as the master node of the target data according to the hash value K, and send a read request to physical node E to obtain the target data.
  • For example, the node statistics device can be notified once the backup is complete, so that the client can learn through the node statistics device that physical node E has finished the backup; when the client subsequently needs to read the target data, it can then determine the master node based on the ring-shaped hash space P that includes physical node E's virtual nodes.
  • For any piece of data, a node that was its standby node may no longer serve as a standby node for that data after other physical nodes newly join the distributed key-value pair storage system; in other words, in this case that data is redundant data that the standby node no longer needs to store. The standby node can therefore delete the data after the other physical node has joined and finished backing it up, to save its own storage space. For example, if virtual node A1 shown in FIG. 6C is no longer a standby node for the target data after physical node E joins, then physical node A, which corresponds to virtual node A1, may delete the target data. As an exemplary embodiment, each physical node may delete redundant data at a preset frequency, or perform the deletion after learning through the node statistics device that a new physical node has joined the system.
  • Based on storing the data requested by the client on both the master node and the standby node, when a subsequent client needs to read the data, the master node responds to the read request if it is in a normal working state (the standby node does not need to respond); otherwise, the system can switch smoothly to the standby node so that the standby node (which also stores the data the client requested to read) responds to the read request, which avoids the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally.
  • At the macro level, the master node and standby node corresponding to different data are different, so the load of each physical node can still be balanced, and there is no obvious difference between master and standby.
  • the process of the client requesting to read data will be described in detail below with reference to FIGS. 7-9.
  • FIG. 7 is an interaction diagram of a data reading method provided by an exemplary embodiment when the master node does not fail. As shown in FIG. 7, the interaction process may include the following steps:
  • Step 702 the client calculates the hash value corresponding to the target data.
  • Step 704 Determine the master node that stores the target data.
  • the client may directly send a read request for target data to the master node. For example, in FIG. 6A, if the physical node B has not failed, the client can still determine that the master node of the target data is the physical node B according to the above step 504.
  • Step 706 Send a read request for target data to the master node.
  • Step 708 the master node checks whether it is the master node or the standby node of the target data.
  • After receiving the read request, the physical node can verify whether it is the master node or the standby node of the corresponding data, and return the corresponding data to the client only if it is the master node of that data (the verification process is similar to the above process of determining the master node and standby node).
  • Alternatively, the physical node may skip this verification and return the corresponding data by default. In this embodiment, however, physical node B, as the master node, has not failed, so it can directly return the target data.
  • step 710 the master node reads the target data.
  • step 712 the master node returns the target data to the client.
  • FIG. 8 is an interaction diagram of a data reading method when a master node fails according to an exemplary embodiment. As shown in FIG. 8, the interaction process may include the following steps:
  • step 802 the client calculates the hash value corresponding to the target data.
  • Step 804 Determine the standby node that stores the target data.
  • Step 806 Send a read request for target data to the backup node 1.
  • the client may switch to send a read request for target data to the standby node.
  • When physical node B fails, the virtual nodes closest to the hash value K are C3 and A1, so it can be determined that the standby nodes of the target data include physical node C (standby node 1) and physical node A (standby node 2).
  • For example, the client may select virtual node C3, which is closest to the hash value K after physical node B exits the system, and send the read request for the target data to physical node C, which corresponds to virtual node C3.
  • step 808 the backup node 1 checks whether it is the master node or the backup node of the target data.
  • step 810 the standby node 1 is switched to the master node of the target data.
  • step 812 the standby node 1 reads the target data.
  • step 814 the standby node 1 returns the target data to the client.
  • After receiving the read request, the physical node can verify whether it is the master node or the standby node of the corresponding data, and return the corresponding data to the client only if it is the master node of that data (the verification process is similar to the above process of determining the master node and standby node).
  • Alternatively, the physical node may skip this verification and return the corresponding data by default.
  • After receiving the read request for the target data, physical node C (standby node 1) can determine that virtual node B4 is closest to the hash value K, and can learn through the node statistics device that physical node B, which corresponds to virtual node B4, has failed.
  • Meanwhile, the next virtual node closest to the hash value K is C3, so physical node C determines that it is a standby node of the target data, and then switches itself to being the master node of the target data in response to the read request sent by the client.
  • step 816 the backup node 1 sends the target data to the backup node 3.
  • Step 818 the backup node 3 stores the target data.
  • After standby node 1 has switched to being the master node, it may itself later exit the distributed key-value pair storage system as the master node. Therefore, to prevent data loss, the standby node associated with itself may be further determined (in the same way the above-mentioned master node determines standby nodes when it receives a storage request), and the target data may be sent to the determined standby node so that that standby node stores the target data. In other words, as the master node of any piece of data, it should always ensure that there is a preset number of standby nodes associated with itself.
  • For example, before physical node B fails, the three virtual nodes closest to the hash value K that belong to different physical nodes are B4, C3, and A1; as shown in FIG. 9, after physical node B fails, the three virtual nodes closest to the hash value K that belong to different physical nodes are C3, A1, and D2. It can be seen that physical node D was not a standby node of the target data before physical node B failed, but serves as a standby node of the target data after physical node B fails. Therefore, the target data needs to be backed up to physical node D (that is, standby node 3).
  • Since the read request is answered only by physical node C acting as the master node, physical node C can also respond to the read request first, to reduce the impact on the speed of responding to the read request, and then asynchronously back the target data up to physical node D.
  • the device includes a processor 1002, an internal bus 1004, a network interface 1006, a memory 1008, and a non-volatile memory 1010. Of course, it may include hardware required for other services.
  • the processor 1002 reads the corresponding computer program from the non-volatile memory 1010 into the memory 1008 and then runs it to form a data storage device at a logical level.
  • Of course, in addition to software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is to say, the execution body of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
  • the data storage device is applied to a physical node in a distributed key-value pair storage system, and may include:
  • the storage unit 1101 stores data sent by a client, where the physical node has been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data;
  • the standby node determining unit 1102 determines a standby node for storing the data
  • the first sending unit 1103 sends the data to the standby node, so that the standby node stores the data.
  • Optionally, the standby node determining unit 1102 is specifically configured to: determine the hash value corresponding to the data based on the consistent hash algorithm; search the hash value space in a preset direction for the first virtual node closest to the hash value; find, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, the first standby virtual nodes all belonging to distinct physical nodes other than the physical node itself; and use the physical nodes corresponding to the first standby virtual nodes as the standby nodes.
  • Optionally, the apparatus further includes:
  • the receiving unit 1104 receives the read request for the data sent by the client;
  • the data return unit 1105 returns the data to the client when it is determined that it is the master node of the data based on the consistent hash algorithm.
  • Optionally, the apparatus further includes:
  • a second sending unit 1106 that, when another physical node joins the distributed key-value pair storage system and that physical node acts as the master node or a standby node of the data, sends the data to the other physical node so that it stores the data.
  • Optionally, the apparatus further includes:
  • the hash value determining unit 1107 determines a hash value corresponding to the data based on the consistent hash algorithm;
  • a first judging unit 1108 determines that the other physical node is the master node of the data when the second virtual node closest to the hash value in the preset direction in the hash value space belongs to the other physical node;
  • the second determining unit 1109, when a virtual node of the other physical node belongs to the second standby virtual nodes, determines that the other physical node is a standby node of the data; the second standby virtual nodes are a preset number of virtual nodes closest to the second virtual node in the hash value space, and the second standby virtual nodes all belong to distinct physical nodes other than the master node of the data.
  • the number of the standby nodes is positively related to the security level, storage space, and processing resources of the distributed key-value pair storage system.
  • the device includes a processor 1202, an internal bus 1204, a network interface 1206, a memory 1208, and a non-volatile memory 1212. Of course, it may include hardware required for other services.
  • the processor 1202 reads the corresponding computer program from the non-volatile memory 1212 into the memory 1208 and then runs it to form a data storage device at a logical level.
  • Of course, in addition to software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is to say, the execution body of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
  • the data storage device is applied to a physical node in a distributed key-value pair storage system, and may include:
  • the first receiving unit 1301 receives the data sent by the master node; the data is sent by the client to the master node, and the master node is determined by the client by calculating the data based on a consistent hash algorithm;
  • the storage unit 1302 stores the data.
  • Optionally, the apparatus further includes:
  • the second receiving unit 1303 receives the read request for the data sent by the client, and the read request is sent by the client when the master node exits the distributed key-value pair storage system;
  • Optionally, the apparatus further includes:
  • the switching unit 1305 switches to the master node of the data
  • the determining unit 1306 determines the standby node associated with itself
  • the sending unit 1307 sends the data to the standby node, so that the standby node stores the data.
  • the device includes a processor 1402, an internal bus 1404, a network interface 1406, a memory 1408, and a non-volatile memory 1414. Of course, it may include hardware required for other services.
  • the processor 1402 reads the corresponding computer program from the non-volatile memory 1414 into the memory 1408 and then runs it to form a data storage device at a logical level.
  • Of course, in addition to software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is to say, the execution body of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
  • the data storage device is applied to a client in a distributed key-value pair storage system, and may include:
  • the first determining unit 1501 calculates the data based on the consistent hash algorithm to determine the distributed key-value pair storage system as the master node of the data;
  • the first sending unit 1502 sends the data to the master node, so that the master node stores the data, and determines a standby node associated with itself and sends the data to the standby node;
  • or, the apparatus includes: a second determining unit 1503 that determines a standby node associated with the master node; and
  • the second sending unit 1504 sends the data to the master node and the standby node respectively, so that the master node and the standby node store the data.
  • the system, device, module or unit explained in the above embodiments may be specifically implemented by a computer chip or entity, or implemented by a product with a certain function.
  • A typical implementation device is a computer; the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-permanent memory, random access memory (RAM) and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media including permanent and non-permanent, removable and non-removable media, can store information by any method or technology.
  • the information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices, or any other non-transmission media, which can be used to store information accessible by computing devices.
  • computer-readable media does not include temporary computer-readable media (transitory media), such as modulated data signals and carrier waves.

Abstract

One or more embodiments of this specification provide a data storage method and apparatus, an electronic device, and a storage medium. The method is applied to a physical node in a distributed key-value pair storage system and may include: storing data sent by a client, where the physical node has been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data; determining a standby node for storing the data; and sending the data to the standby node, so that the standby node stores the data.

Description

Data storage method and apparatus, electronic device, and storage medium
Technical field
One or more embodiments of this specification relate to the field of distributed storage technology, and in particular to a data storage method and apparatus, an electronic device, and a storage medium.
Background
Distributed key-value pair storage systems are mostly designed on the basis of a consistent hash algorithm. The consistent hash algorithm ensures that data is distributed as evenly as possible across the physical nodes of the system, and that when a physical node joins or exits the system, the data that needs to be migrated is kept to the minimum necessary range as far as possible.
Summary of the invention
In view of this, one or more embodiments of this specification provide a data storage method and apparatus, an electronic device, and a storage medium.
To achieve the above objective, one or more embodiments of this specification provide the following technical solutions:
According to a first aspect of one or more embodiments of this specification, a data storage method is proposed, applied to a physical node in a distributed key-value pair storage system; the method includes:
storing data sent by a client, the physical node having been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data;
determining a standby node for storing the data;
sending the data to the standby node, so that the standby node stores the data.
According to a second aspect of one or more embodiments of this specification, a data storage method is proposed, applied to a physical node in a distributed key-value pair storage system; the method includes:
receiving data sent by a master node, the data being sent by a client to the master node, and the master node being determined by the client by computing the data based on a consistent hash algorithm;
storing the data.
According to a third aspect of one or more embodiments of this specification, a data storage method is proposed, applied to a client in a distributed key-value pair storage system; the method includes:
computing data based on a consistent hash algorithm to determine the master node of the data in the distributed key-value pair storage system;
sending the data to the master node, so that the master node stores the data, determines a standby node associated with itself, and sends the data to the standby node;
or, determining a standby node associated with the master node;
and sending the data to the master node and the standby node respectively, so that the master node and the standby node store the data.
According to a fourth aspect of one or more embodiments of this specification, a data storage apparatus is proposed, applied to a physical node in a distributed key-value pair storage system; the apparatus includes:
a storage unit, which stores data sent by a client, the physical node having been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data;
a standby node determining unit, which determines a standby node for storing the data;
a first sending unit, which sends the data to the standby node, so that the standby node stores the data.
According to a fifth aspect of one or more embodiments of this specification, a data storage apparatus is proposed, applied to a physical node in a distributed key-value pair storage system; the apparatus includes:
a first receiving unit, which receives data sent by a master node, the data being sent by a client to the master node, and the master node being determined by the client by computing the data based on a consistent hash algorithm;
a storage unit, which stores the data.
According to a sixth aspect of one or more embodiments of this specification, a data storage apparatus is proposed, applied to a client in a distributed key-value pair storage system; the apparatus includes:
a first determining unit, which computes data based on a consistent hash algorithm to determine the master node of the data in the distributed key-value pair storage system;
a first sending unit, which sends the data to the master node, so that the master node stores the data, determines a standby node associated with itself, and sends the data to the standby node;
or, the apparatus includes: a second determining unit, which determines a standby node associated with the master node;
and a second sending unit, which sends the data to the master node and the standby node respectively, so that the master node and the standby node store the data.
According to a seventh aspect of one or more embodiments of this specification, an electronic device is proposed, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the data storage method according to the first aspect above.
According to an eighth aspect of one or more embodiments of this specification, a computer-readable storage medium is proposed, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the method according to the first aspect.
According to a ninth aspect of one or more embodiments of this specification, an electronic device is proposed, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the data storage method according to the second aspect above.
According to a tenth aspect of one or more embodiments of this specification, a computer-readable storage medium is proposed, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the method according to the second aspect.
According to an eleventh aspect of one or more embodiments of this specification, an electronic device is proposed, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor executes the executable instructions to implement the data storage method according to the third aspect above.
According to a twelfth aspect of one or more embodiments of this specification, a computer-readable storage medium is proposed, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of the method according to the third aspect.
Brief description of the drawings
FIG. 1 is a schematic architecture diagram of a data storage system according to an exemplary embodiment.
FIG. 2 is a flowchart of a data storage method according to an exemplary embodiment.
FIG. 3 is a flowchart of another data storage method according to an exemplary embodiment.
FIG. 4 is a flowchart of another data storage method according to an exemplary embodiment.
FIG. 5 is an interaction diagram of a data storage method according to an exemplary embodiment.
FIGS. 6A-6C are schematic diagrams of a ring-shaped hash space according to an exemplary embodiment.
FIG. 7 is an interaction diagram of a data reading method when the master node has not failed, according to an exemplary embodiment.
FIG. 8 is an interaction diagram of a data reading method when the master node has failed, according to an exemplary embodiment.
FIG. 9 is a schematic diagram of the ring-shaped hash space when the master node has failed, according to an exemplary embodiment.
FIG. 10 is a schematic structural diagram of a device according to an exemplary embodiment.
FIG. 11 is a block diagram of a data storage apparatus according to an exemplary embodiment.
FIG. 12 is a schematic structural diagram of another device according to an exemplary embodiment.
FIG. 13 is a block diagram of another data storage apparatus according to an exemplary embodiment.
FIG. 14 is a schematic structural diagram of another device according to an exemplary embodiment.
FIG. 15 is a block diagram of another data storage apparatus according to an exemplary embodiment.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of this specification; rather, they are merely examples of apparatuses and methods consistent with some aspects of one or more embodiments of this specification, as detailed in the appended claims.
It should be noted that in other embodiments the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be decomposed into multiple steps in other embodiments, and multiple steps described in this specification may also be combined into a single step in other embodiments.
FIG. 1 is a schematic architecture diagram of a data storage system according to an exemplary embodiment. As shown in FIG. 1, the system may include a node statistics device 11, a network 12, a distributed key-value pair storage system 13, and several electronic devices. The distributed key-value pair storage system 13 may include physical nodes 131, 132, 133, and so on; the electronic devices may include PC 14, PC 15, and the like.
The node statistics device 11 may be a physical server containing an independent host, or may be a virtual server hosted on a host cluster. In the process of implementing the data storage scheme of this specification, the node statistics device is used to keep track of the physical nodes included in the distributed key-value pair storage system 13, so that each physical node, and the electronic devices that interact with the physical nodes, can learn from the statistics which physical nodes have joined or exited the distributed key-value pair storage system.
PCs 14-15 are just one type of electronic device that users can use. In practice, users can obviously also use electronic devices such as mobile phones, tablet devices, PDAs (Personal Digital Assistants), wearable devices (such as smart glasses or smart watches), and so on; one or more embodiments of this specification do not limit this. In the process of implementing the data storage scheme of this specification, such an electronic device can act as a client and interact with the physical nodes in the distributed key-value pair storage system 13, to store data on the physical nodes and to read data from them.
The distributed key-value pair storage system 13 provides two services through its physical nodes: "write a value by key" and "read a value by key". Different key-value pairs are independent of each other and have no relationship between them. In the process of implementing the data storage scheme of this specification, each physical node can store data sent by a client, and return the corresponding data to the client in response to the client's read request.
The network 12 over which the node statistics device 11, the distributed key-value pair storage system 13, and the electronic devices interact may include various types of wired or wireless networks. For example, the network 12 may include the Internet. Of course, one or more embodiments of this specification do not limit this. The physical nodes in the distributed key-value pair storage system 13 can also communicate with each other through the network 12.
Referring to FIG. 2, FIG. 2 is a flowchart of a data storage method according to an exemplary embodiment. As shown in FIG. 2, the method is applied to a physical node (acting as the master node of the data to be stored) in a distributed key-value pair storage system, and may include the following steps:
Step 202: store data sent by a client, where the physical node has been determined by the client, by computing the data based on a consistent hash algorithm, to be the master node of the data.
In an embodiment, the distributed key-value pair storage system may be designed based on a consistent hash algorithm, so that the data to be stored is distributed as evenly as possible among the physical nodes, and as little data as possible is migrated when a physical node joins or exits the system. Further, when the client determines the data currently to be stored, it can use the consistent hash algorithm to calculate the hash value corresponding to the data, and then search the hash value space in a preset direction (for example, always searching the ring-shaped hash space clockwise) for the virtual node closest to that hash value, so that the physical node corresponding to that virtual node is used as the master node that stores the data. The hash value space can be maintained by the node statistics device. For example, the node statistics device can count all physical nodes currently added to the distributed key-value pair storage system and, based on the consistent hash algorithm, map the virtual nodes of each physical node as evenly as possible onto the ring-shaped hash space (also known as a hash ring or hash bucket). The client and each physical node can then obtain the ring-shaped hash space from the node statistics device, or the node statistics device can actively send it to the client and each physical node; in other words, the client side and the physical node side both record the same ring-shaped hash space. After the master node is determined, the client can send the data to be stored to the master node, so that the master node stores the data.
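By way of illustration only, the following Python sketch shows one way a client-side lookup like the one just described could be organized. It is not part of this specification: the HashRing class, the use of MD5, the number of virtual nodes per physical node, and the helper names are all assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Illustrative ring-shaped hash space with virtual nodes per physical node."""

    def __init__(self, vnodes_per_physical=4):
        self.vnodes_per_physical = vnodes_per_physical
        self.ring = []    # sorted hash positions of all virtual nodes
        self.owner = {}   # hash position of a virtual node -> physical node name

    @staticmethod
    def _hash(value):
        # Any stable hash mapped into 0..2**32 works; MD5 is used only for illustration.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)

    def add_physical_node(self, name):
        # Map several virtual nodes of this physical node onto the ring.
        for i in range(self.vnodes_per_physical):
            pos = self._hash(f"{name}#vnode{i}")
            bisect.insort(self.ring, pos)
            self.owner[pos] = name

    def find_master(self, key):
        """Walk clockwise from the key's hash to the nearest virtual node and
        return the physical node that owns it (the master node for the key)."""
        k = self._hash(key)
        idx = bisect.bisect_right(self.ring, k) % len(self.ring)  # wrap around the ring
        return self.owner[self.ring[idx]]


ring = HashRing()
for node in ("A", "B", "C", "D"):
    ring.add_physical_node(node)
print(ring.find_master("some-key"))  # physical node acting as master for this key
```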
Step 204: determine a standby node for storing the data.
In an embodiment, in addition to the master node storing the data sent by the client, a preset number of standby nodes can further be selected from the hash value space to store the data, so that the data is not lost if the master node exits the system. Therefore, after receiving the data sent by the client, the master node can likewise determine the hash value corresponding to the data based on the consistent hash algorithm, search the hash value space in the preset direction (following the example above, always clockwise in the ring-shaped hash space) for the first virtual node closest to that hash value (following the example above, the physical node corresponding to the first virtual node is in fact the master node itself), and then find, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, so that the physical nodes corresponding to the first standby virtual nodes can be used as the standby nodes. The first standby virtual nodes all belong to distinct physical nodes other than the master node itself; in other words, the standby nodes are physical nodes different from one another.
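Continuing the same illustrative sketch (and assuming the hypothetical HashRing class shown above is in scope), the standby selection just described can be expressed as a clockwise walk that skips virtual nodes whose physical node has already been chosen:

```python
import bisect


def find_standbys(ring, key, count=2):
    """Starting clockwise from the key's hash, return the master node plus the
    owners of the next `count` virtual nodes that belong to distinct physical
    nodes other than the master, so every standby is a different physical node."""
    k = ring._hash(key)
    idx = bisect.bisect_right(ring.ring, k) % len(ring.ring)
    master = ring.owner[ring.ring[idx]]
    standbys, seen = [], {master}
    for step in range(1, len(ring.ring)):
        node = ring.owner[ring.ring[(idx + step) % len(ring.ring)]]
        if node not in seen:
            seen.add(node)
            standbys.append(node)
            if len(standbys) == count:
                break
    return master, standbys
```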
Step 206: send the data to the standby node, so that the standby node stores the data.
In an embodiment, after receiving the data sent by the client, the master node may first store the data and, once storage is complete, return a receipt to the client to inform it that storage succeeded. The operation of sending the data to the standby node can be implemented by asynchronously copying the data so as to back it up to the standby node, thereby reducing the impact on the speed of responding to the client's storage request; meeting a weak-consistency requirement is sufficient.
In an embodiment, since the data the client requested to store is kept on both the master node and the standby node, when the client later requests to read that data, it is sufficient for the master node to respond to the read request if it is working normally (the standby node does not need to respond). When the client sends a read request for the data, if the master node is working normally (that is, it has not exited the distributed key-value pair storage system), the client can still locate the master node using the hash-value computation described above; that is, the master node can still receive the read request for the data sent by the client. After receiving the read request, the master node can also verify whether it is the master node of the data, and return the data to the client when it determines, based on the consistent hash algorithm, that it is the master node of the data. Of course, after receiving the read request, a physical node (whether or not it is the master node of the data) may also return the corresponding data to the client directly without performing this verification.
When the master node exits the distributed key-value pair storage system (for example, because the master node fails), the system can switch smoothly to the standby node, and the standby node (which also stores the data the client requested to read) responds to the read request, thereby avoiding the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally. At the same time, at the macro level, different data correspond to different master and standby nodes, so the load of each physical node can still be balanced and there is no obvious master/standby difference. For example, based on the locally maintained state of each physical node in the system (obtained from the node statistics device), the client can determine whether the master node of the requested data has failed and, when it has, further determine the standby node associated with the master node (that is, in the manner of determining the standby node in step 204 above), so as to send a read request to the standby node to obtain the corresponding data.
In an embodiment, based on the above mechanism of storing data on both a master node and standby nodes, when another physical node newly joins the distributed key-value pair storage system, the master node and standby nodes corresponding to any given piece of data will change accordingly. In other words, the newly joined physical node will serve as the master node for some data and as the standby node for some other data. That data therefore needs to be backed up to the newly joined physical node, so that after joining the system it can normally provide the data-reading service to clients.
Taking as an example the data stored by the above physical node acting as master node: when another physical node joins the distributed key-value pair storage system and acts as the master node or a standby node of the data, the data is sent to that physical node so that it stores the data. Whether the newly joined physical node is the master node or a standby node of the data can be determined as follows: based on the consistent hash algorithm (when implementing the data storage scheme of this specification, the client and the physical nodes all use the same consistent hash algorithm), determine the hash value corresponding to the data; when the second virtual node closest to that hash value in the preset direction in the hash value space belongs to the other physical node, the other physical node is determined to be the master node of the data; when a virtual node of the other physical node belongs to the second standby virtual nodes, the other physical node is determined to be a standby node of the data. The second standby virtual nodes are a preset number (that is, the number of standby nodes) of virtual nodes closest to the second virtual node in the hash value space, and the second standby virtual nodes all belong to distinct physical nodes other than the master node of the data.
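A minimal sketch of this join-time check, reusing the hypothetical ring and find_standbys helpers from the earlier sketches; send is a stand-in for whatever RPC or transport the system actually uses:

```python
def backup_to_new_node(ring, new_node, local_store, send, standby_count=2):
    """After `new_node` has been added to the ring, copy to it every locally
    stored key for which it is now the master or one of the standby nodes.
    `send(node, key, value)` represents the (hypothetical) transport call."""
    for key, value in local_store.items():
        master, standbys = find_standbys(ring, key, standby_count)
        if new_node == master or new_node in standbys:
            send(new_node, key, value)  # may be done asynchronously (weak consistency)
```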
Further, for any piece of data, a node that was its standby node may no longer serve as a standby node for that data after another physical node newly joins the distributed key-value pair storage system; in other words, in this case that data is redundant data that the former standby node no longer needs to store. The former standby node can therefore delete that data after the newly joined physical node has joined and finished backing up the data, so as to save its own storage space. For example, each physical node can delete redundant data at a preset frequency, or perform the deletion after learning through the node statistics device that a new physical node has joined the system. Each physical node can use the hash-value computation described above to determine whether it is still a standby node of the data it stores, and delete from local storage any data for which it is no longer the corresponding standby node.
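The redundant-data cleanup can be sketched in the same illustrative style; the helper below simply re-runs the (hypothetical) master/standby computation for every locally stored key:

```python
def purge_redundant(ring, self_name, local_store, standby_count=2):
    """Drop keys for which this node is no longer the master or a standby,
    e.g. after a newly joined node has finished backing the data up."""
    for key in list(local_store):
        master, standbys = find_standbys(ring, key, standby_count)
        if self_name != master and self_name not in standbys:
            del local_store[key]  # redundant copy, reclaim local storage space
```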
In an embodiment, the number of standby nodes may be set to be positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system. In other words, when the distributed key-value pair storage system requires a higher security level, a relatively larger number of standby nodes can be configured, thereby preventing data loss.
Correspondingly, referring to FIG. 3, FIG. 3 is a flowchart of another data storage method according to an exemplary embodiment. As shown in FIG. 3, the method is applied to a physical node (acting as a standby node of the data to be stored) in a distributed key-value pair storage system, and may include the following steps:
Step 302: receive data sent by a master node, where the data is sent by a client to the master node, and the master node is determined by the client by computing the data based on a consistent hash algorithm.
In an embodiment, in addition to the master node storing the data sent by the client, a preset number of standby nodes can further be selected to store the data, so that the data is not lost if the master node exits the system. Therefore, after receiving the data sent by the client, the master node can select some physical nodes from the distributed key-value pair storage system as standby nodes, and send the data it stores to the standby nodes so that they store it, thereby backing up the data. For the specific processes by which the client determines the master node and the master node determines the standby nodes associated with itself, reference may be made to the corresponding parts of the embodiment shown in FIG. 2 above, which are not repeated here.
Step 304: store the data.
In an embodiment, since the data the client requested to store is kept on both the master node and the standby node, when the client later requests to read that data, it is sufficient for the master node to respond to the read request if it is working normally (the standby node does not need to respond); otherwise, the standby node (which also stores the data the client requested to read) can respond to the read request, thereby avoiding the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally. At the same time, at the macro level, different data correspond to different master and standby nodes, so the load of each physical node can still be balanced and there is no obvious master/standby difference.
For example, after determining that the master node of the requested data has failed, the client may further determine the standby node associated with that master node (that is, in the manner of determining the standby node in step 204 above), so as to send a read request to the standby node to obtain the corresponding data. The standby node can then receive the read request for the data sent by the client (that is, a read request sent by the client when the master node has exited the distributed key-value pair storage system), and return the data to the client.
Because the data's master node has exited the distributed key-value pair storage system, the standby node can switch to being the master node of the data in order to respond to the read request sent by the client. After the standby node has switched to being the master node, it may itself later exit the distributed key-value pair storage system as the master node. Therefore, to prevent data loss, it may further determine a standby node associated with itself and send the data to the determined standby node, so that that standby node stores the data. The process of determining that standby node is similar to the process, shown in FIG. 2 above, by which the master node determines the standby nodes associated with itself, and is not repeated here.
Correspondingly, referring to FIG. 4, FIG. 4 is a flowchart of another data storage method according to an exemplary embodiment. As shown in FIG. 4, the method is applied to a client in a distributed key-value pair storage system, and may include the following steps:
Step 402: compute the data based on a consistent hash algorithm to determine the master node of the data in the distributed key-value pair storage system.
Step 404: send the data to the master node, so that the master node stores the data, determines a standby node associated with itself, and sends the data to the standby node.
Step 406: determine a standby node associated with the master node.
Step 408: send the data to the master node and the standby node respectively, so that the master node and the standby node store the data.
In one case, the client may determine only the master node of the data to be stored, while the standby node associated with the master node is determined by the master node itself, which also sends the data to the standby node; this can reduce the processing pressure on the client. In the other case, after determining the master node of the data to be stored, the client may further determine the standby node associated with the master node and send the data to the master node and the standby node respectively, thereby reducing the pressure on the master node (the master node does not need to determine the standby node or send data to it).
For ease of understanding, the data storage scheme of this specification is described in detail below with reference to scenarios and the accompanying drawings.
Referring to FIG. 5, FIG. 5 is an interaction diagram of a data storage method according to an exemplary embodiment. As shown in FIG. 5, the interaction may include the following steps:
Step 502: the client computes the hash value corresponding to the data to be stored.
Step 504: determine the master node that will store the data.
In an embodiment, the value space of the hash algorithm may be joined end to end to form a ring-shaped hash space. Further, for each physical node, the hash values of its multiple corresponding virtual nodes are computed and mapped onto the ring-shaped hash space. When data needs to be read or written, a hash value can be computed from the data's key and mapped onto the ring-shaped hash space, and the physical node corresponding to the virtual node closest to that hash value is then found in a fixed direction (for example, clockwise); that physical node is the master node that stores the data.
For example, as shown in FIG. 6A, the value space of the ring-shaped hash space P is 0 to 2^32, and the virtual nodes include B4, C3, A1, C1, A2, D2, and so on. Virtual nodes A1 and A2 belong to physical node A, virtual node B4 belongs to physical node B, virtual nodes C1 and C3 belong to physical node C, and virtual node D2 belongs to physical node D. Suppose the hash value corresponding to the data to be stored is K; the position of the hash value K in the ring-shaped hash space P is indicated by the arrow in the figure. Taking a clockwise search for the master node as an example, it can be seen from FIG. 6A that the first virtual node closest to the hash value K is B4 (that is, virtual node B4 is the first virtual node), so physical node B, which corresponds to virtual node B4, can serve as the master node of the data to be stored.
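The layout of FIG. 6A can be mimicked with explicit positions instead of real hashes; the numeric positions below are invented purely for illustration and only preserve the clockwise order B4, C3, A1, C1, A2, D2 described above, but the lookup then reproduces the result of the example (master B, standbys C and A):

```python
import bisect

# virtual-node position on the ring (0..2**32) -> owning physical node;
# the positions are made up, only the clockwise order matters here.
VNODES = {
    0x10000000: "B",   # B4
    0x30000000: "C",   # C3
    0x50000000: "A",   # A1
    0x80000000: "C",   # C1
    0xB0000000: "A",   # A2
    0xE0000000: "D",   # D2
}
RING = sorted(VNODES)

def clockwise_nodes(k):
    """Yield physical nodes in clockwise order starting from hash value k."""
    idx = bisect.bisect_right(RING, k) % len(RING)
    for step in range(len(RING)):
        yield VNODES[RING[(idx + step) % len(RING)]]

K = 0x08000000                       # the data's hash value, just before B4
it = clockwise_nodes(K)
master = next(it)                    # -> "B"
standbys = []
for node in it:                      # next distinct physical nodes: C, then A
    if node != master and node not in standbys:
        standbys.append(node)
    if len(standbys) == 2:
        break
print(master, standbys)              # B ['C', 'A'], matching FIG. 6A
```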
Step 506: the client sends a storage request to the determined master node.
Step 508: the master node stores the data to be stored.
Step 510: the master node returns a receipt of successful storage to the client.
In an embodiment, after receiving the data sent by the client (contained in the storage request), the master node can first store the data and, once storage is complete, return a receipt to the client to inform it that storage succeeded; the operation of sending the data to the standby nodes can be implemented by asynchronously copying the data so as to back it up to the standby nodes, thereby reducing the impact on the speed of responding to the client's storage request; meeting a weak-consistency requirement is sufficient.
Step 512: the master node determines the standby nodes, associated with itself, that will store the data.
In an embodiment, the number of standby nodes can be set in advance. The number of standby nodes is positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system. For example, the higher the security level required of the distributed key-value pair storage system, the more standby nodes can be configured when storing data; the larger the storage space of each physical node in the distributed key-value pair storage system, the more standby nodes can be configured when storing data; and the more processing resources each physical node in the distributed key-value pair storage system has, the more standby nodes can be configured when storing data.
Continuing the example above, as shown in FIG. 6A, suppose two standby nodes are configured when storing data. The two mutually distinct first standby virtual nodes that are closest to the first virtual node B4 in the ring-shaped hash space P and do not belong to physical node B are C3 and A1, so physical node C and physical node A can serve as the standby nodes.
Step 514: the master node sends the data to the determined standby nodes.
In an embodiment, in one case, the client may determine only the master node of the data to be stored, while the standby nodes associated with the master node are determined by the master node itself, which also sends the data to the standby nodes (that is, steps 512-514 above); this can reduce the processing pressure on the client. In the other case, after determining the master node of the data to be stored, the client may further determine the standby nodes associated with the master node and send the data to the master node and the standby nodes respectively, thereby reducing the pressure on the master node (the master node does not need to determine the standby nodes or send data to them). The process by which the client determines the master node and the standby nodes is the same as the process, described above, by which the master node determines them, and is not repeated here.
Step 516: the standby nodes store the data.
In an embodiment, based on the above mechanism of storing data on both a master node and standby nodes, when another physical node newly joins the distributed key-value pair storage system, the master node and standby nodes corresponding to any given piece of data will change accordingly. In other words, the newly joined physical node will serve as the master node for some data and as a standby node for some other data. That data therefore needs to be backed up to the newly joined physical node, so that after joining the system it can normally provide the data-reading service to clients.
Taking the data stored by physical node B as master node in the embodiment shown in FIG. 5 (hereinafter referred to as the target data) as an example, as shown in FIG. 6B, suppose that when physical node E joins the distributed key-value pair storage system, the position of its virtual node E1 in the ring-shaped hash space P lies between the hash value K and virtual node B4 (that is, virtual node E1 is the second virtual node closest, in the clockwise direction, to the hash value K in the ring-shaped hash space P); the master node for the target data is then the newly joined physical node E. Further, physical node B, which was the master node of the target data before physical node E joined the distributed key-value pair storage system, needs to back the target data up to physical node E. After the backup is complete, when a client subsequently needs to read the target data, it can locate physical node E as the master node of the target data according to the hash value K, and send a read request to physical node E to obtain the target data. For example, the node statistics device can be notified once the backup is complete, so that clients can learn through the node statistics device that physical node E has finished the backup; when a client subsequently needs to read the target data, it can then determine the master node based on the ring-shaped hash space P that includes physical node E's virtual nodes.
As shown in FIG. 6C, suppose that when physical node E joins the distributed key-value pair storage system, the position of its virtual node E1 in the ring-shaped hash space P lies between virtual nodes C3 and A1 (that is, virtual node E1 is one of the two second standby virtual nodes closest to virtual node B4 in the ring-shaped hash space, the other being virtual node C3); the standby nodes for the target data are then the newly joined physical node E and the previous physical node C. Further, physical node B, as the master node of the target data, needs to back the target data up to physical node E, thereby ensuring that the master node always has two associated standby nodes.
Further, for any piece of data, a node that was its standby node may no longer serve as a standby node for that data after another physical node newly joins the distributed key-value pair storage system; in other words, in this case that data is redundant data that the former standby node no longer needs to store. The former standby node can therefore delete that data after the newly joined physical node has joined and finished backing up the data, so as to save its own storage space. For example, virtual node A1 shown in FIG. 6C is no longer a standby node of the target data after physical node E joins, so physical node A, which corresponds to virtual node A1, may delete the target data. As an exemplary embodiment, each physical node may delete redundant data at a preset frequency, or perform the deletion after learning through the node statistics device that a new physical node has joined the system.
As can be seen from the above embodiments, since the data the client requested to store is kept on both the master node and the standby nodes, when the client later requests to read that data, it is sufficient for the master node to respond to the read request if it is working normally (the standby nodes do not need to respond); otherwise, the system can switch smoothly to a standby node so that the standby node (which also stores the data the client requested to read) responds to the read request, thereby avoiding the situation where the client cannot read the required data (that is, the data stored on the master node) while the master node is not working normally. At the same time, at the macro level, different data correspond to different master and standby nodes, so the load of each physical node can still be balanced and there is no obvious master/standby difference. The process by which a client requests to read data is described in detail below with reference to FIGS. 7-9.
Referring to FIG. 7, FIG. 7 is an interaction diagram of a data reading method when the master node has not failed, according to an exemplary embodiment. As shown in FIG. 7, the interaction may include the following steps:
Step 702: the client computes the hash value corresponding to the target data.
In an embodiment, the data reading process of this specification is described in detail taking as an example the client requesting the master node to read the target data stored in FIG. 5 above.
Step 704: determine the master node that stores the target data.
In an embodiment, when the master node has not failed (that is, the master node has not exited the distributed key-value pair storage system), the client can simply send the read request for the target data directly to the master node. For example, in FIG. 6A, if physical node B has not failed, the client can still determine, following step 504 above, that the master node of the target data is physical node B.
Step 706: send a read request for the target data to the master node.
Step 708: the master node verifies whether it is the master node or a standby node of the target data.
In an embodiment, after receiving the read request, a physical node can verify whether it is the master node or a standby node of the corresponding data, and return the corresponding data to the client only when it is the master node of that data (the verification process is similar to the process of determining the master and standby nodes described above). Of course, after receiving the read request, a physical node may also skip this verification and return the corresponding data by default. In this embodiment, physical node B, acting as the master node, has not failed, so it can return the target data directly.
Step 710: the master node reads the target data.
Step 712: the master node returns the target data to the client.
Referring to FIG. 8, FIG. 8 is an interaction diagram of a data reading method, provided by an exemplary embodiment, for the case where the primary node has failed. As shown in FIG. 8, the interaction may include the following steps:
Step 802: the client computes the hash value corresponding to the target data.
In an embodiment, the data reading process of this specification is described in detail by taking as an example the client requesting to read the target data stored in FIG. 5 above.
Step 804: determine the standby nodes that store the target data.
Step 806: send a read request for the target data to standby node 1.
In an embodiment, when the primary node has failed (that is, it has exited the distributed key-value pair storage system), the client may instead send the read request for the target data to a standby node.
For example, as shown in FIG. 9, when physical node B fails, the virtual nodes closest to the hash value K are C3 and A1, so the standby nodes for the target data can be determined to include physical node C (standby node 1) and physical node A (standby node 2). For example, the client may pick virtual node C3, which is closest to the hash value K after physical node B has exited the system, and send the read request for the target data to physical node C, to which virtual node C3 belongs.
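A sketch of the client-side switch just described, under the assumption that the client knows from the node statistics device which physical nodes are still live; route_read is an invented name for this example.

```python
def route_read(ring, key, live_nodes):
    """Client side of FIG. 8/9: walk clockwise from hash(key) and pick the first virtual node
    whose physical node is still in the system, so a failed primary is skipped in favour of
    the nearest standby."""
    h = ring_hash(key)
    n = len(ring.positions)
    start = bisect.bisect_left(ring.positions, h) % n
    for step in range(n):
        owner = ring.owners[(start + step) % n]
        if owner in live_nodes:
            return owner
    raise RuntimeError("no physical node left in the system")
```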
Step 808: standby node 1 verifies whether it is the primary node or a standby node for the target data.
Step 810: standby node 1 switches to being the primary node for the target data.
Step 812: standby node 1 reads the target data.
Step 814: standby node 1 returns the target data to the client.
In an embodiment, after receiving a read request, a physical node may verify whether it is the primary node or a standby node for the requested data, and return the data to the client only when it is the data's primary node (the verification follows the same process as determining the primary and standby nodes described above). Of course, a physical node may also skip this verification after receiving a read request and simply return the data by default.
For example, as shown in FIG. 9, after receiving the read request for the target data, physical node C (standby node 1) can determine that the virtual node closest to the hash value K is B4, and learn from the node statistics device that physical node B, to which virtual node B4 belongs, has failed. The next virtual node closest to the hash value K is C3, so physical node C determines that it is a standby node for the target data and switches itself to being the target data's primary node in order to answer the client's read request.
Step 816: standby node 1 sends the target data to standby node 3.
Step 818: standby node 3 stores the target data.
In an embodiment, after standby node 1 has switched to being the primary node, it may itself still exit the distributed key-value pair storage system. To guard against data loss, it may therefore further determine the standby nodes associated with itself (in the same way the primary node determines standby nodes upon receiving a storage request, as described above) and send the target data to the determined standby nodes so that they store it. In other words, the primary node for any given piece of data should always ensure that a preset number of standby nodes remain associated with it.
For example, as shown in FIG. 6A, before physical node B fails, the three virtual nodes closest to the hash value K that belong to different physical nodes are B4, C3, and A1; as shown in FIG. 9, after physical node B fails, they are C3, A1, and D2. Physical node D was therefore not a standby node for the target data before physical node B failed, but becomes one afterwards, so the target data needs to be backed up onto physical node D (i.e., standby node 3). Because read requests are answered only by physical node C acting as the primary node, physical node C may likewise answer the read request first, to limit the impact on response latency, and then back the target data up onto physical node D asynchronously.
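Finally, a sketch of the promotion-and-repair behaviour of steps 810 and 816-818, reusing the earlier hypothetical helpers; a real implementation would skip standby nodes that already hold the copy, which this sketch does not check.

```python
def promote_and_repair(self_id, ring_after_exit, key, value, standby_count=2):
    """After a standby promotes itself to primary, re-derive the standby set on the ring that no
    longer contains the failed node and asynchronously back the data up to the current standbys."""
    primary, standbys = find_primary_and_standbys(ring_after_exit, key, standby_count)
    assert primary == self_id, "only the node that is now primary repairs the replica set"
    for node in standbys:
        threading.Thread(target=send_to_node, args=(node, key, value), daemon=True).start()
    return standbys
```

Here ring_after_exit would be a HashRing built over the remaining physical nodes, so that a node such as D in FIG. 9, which newly entered the standby set, receives its copy of the target data.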
FIG. 10 is a schematic structural diagram of a device provided by an exemplary embodiment. Referring to FIG. 10, at the hardware level the device includes a processor 1002, an internal bus 1004, a network interface 1006, memory 1008, and non-volatile storage 1010, and of course may also include hardware required by other services. The processor 1002 reads the corresponding computer program from the non-volatile storage 1010 into the memory 1008 and runs it, forming a data storage apparatus at the logical level. Of course, in addition to software implementations, one or more embodiments of this specification do not rule out other implementations, such as logic devices or a combination of software and hardware; that is, the entities that execute the following processing flow are not limited to logical units and may also be hardware or logic devices.
Referring to FIG. 11, in a software implementation, the data storage apparatus is applied to a physical node in a distributed key-value pair storage system and may include:
a storage unit 1101, configured to store data sent by a client, the physical node having been determined as the primary node for the data by the client through computation on the data based on a consistent hash algorithm;
a standby node determination unit 1102, configured to determine a standby node for storing the data; and
a first sending unit 1103, configured to send the data to the standby node so that the standby node stores the data.
Optionally, the standby node determination unit 1102 is specifically configured to:
determine, based on the consistent hash algorithm, a hash value corresponding to the data;
search the hash value space, in a preset direction, for the first virtual node closest to the hash value;
find, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, the first standby virtual nodes each belonging to a different physical node other than the apparatus's own node; and
take the physical nodes corresponding to the first standby virtual nodes as the standby nodes.
Optionally, the apparatus further includes:
a receiving unit 1104, configured to receive a read request for the data sent by the client; and
a data returning unit 1105, configured to return the data to the client when the apparatus's node is determined, based on the consistent hash algorithm, to be the primary node for the data.
Optionally, the apparatus further includes:
a second sending unit 1106, configured to, when another physical node joins the distributed key-value pair storage system and that other physical node is the primary node or a standby node for the data, send the data to that other physical node so that it stores the data.
Optionally, the apparatus further includes:
a hash value determination unit 1107, configured to determine, based on the consistent hash algorithm, a hash value corresponding to the data;
a first judgment unit 1108, configured to determine that the other physical node is the primary node for the data when the second virtual node closest to the hash value in the preset direction in the hash value space belongs to the other physical node; and
a second judgment unit 1109, configured to determine that the other physical node is a standby node for the data when a virtual node of the other physical node is one of the second standby virtual nodes, the second standby virtual nodes being the preset number of virtual nodes closest to the second virtual node in the hash value space and each belonging to a different physical node other than the data's primary node.
Optionally, the number of standby nodes is positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system.
FIG. 12 is a schematic structural diagram of another device provided by an exemplary embodiment. Referring to FIG. 12, at the hardware level the device includes a processor 1202, an internal bus 1204, a network interface 1206, memory 1208, and non-volatile storage 1212, and of course may also include hardware required by other services. The processor 1202 reads the corresponding computer program from the non-volatile storage 1212 into the memory 1208 and runs it, forming a data storage apparatus at the logical level. Of course, in addition to software implementations, one or more embodiments of this specification do not rule out other implementations, such as logic devices or a combination of software and hardware; that is, the entities that execute the following processing flow are not limited to logical units and may also be hardware or logic devices.
Referring to FIG. 13, in a software implementation, the data storage apparatus is applied to a physical node in a distributed key-value pair storage system and may include:
a first receiving unit 1301, configured to receive data sent by a primary node, the data having been sent to the primary node by a client and the primary node having been determined by the client through computation on the data based on a consistent hash algorithm; and
a storage unit 1302, configured to store the data.
Optionally, the apparatus further includes:
a second receiving unit 1303, configured to receive a read request for the data sent by the client, the read request being sent by the client when the primary node has exited the distributed key-value pair storage system; and
a returning unit 1304, configured to return the data to the client.
Optionally, the apparatus further includes:
a switching unit 1305, configured to switch to being the primary node for the data;
a determination unit 1306, configured to determine a standby node associated with the apparatus's own node; and
a sending unit 1307, configured to send the data to the standby node so that the standby node stores the data.
FIG. 14 is a schematic structural diagram of another device provided by an exemplary embodiment. Referring to FIG. 14, at the hardware level the device includes a processor 1402, an internal bus 1404, a network interface 1406, memory 1408, and non-volatile storage 1414, and of course may also include hardware required by other services. The processor 1402 reads the corresponding computer program from the non-volatile storage 1414 into the memory 1408 and runs it, forming a data storage apparatus at the logical level. Of course, in addition to software implementations, one or more embodiments of this specification do not rule out other implementations, such as logic devices or a combination of software and hardware; that is, the entities that execute the following processing flow are not limited to logical units and may also be hardware or logic devices.
Referring to FIG. 15, in a software implementation, the data storage apparatus is applied to a client in a distributed key-value pair storage system and may include:
a first determination unit 1501, configured to perform computation on data based on a consistent hash algorithm to determine the primary node for the data in the distributed key-value pair storage system; and
a first sending unit 1502, configured to send the data to the primary node so that the primary node stores the data, determines the standby node associated with itself, and sends the data to the standby node;
or, the apparatus includes: a second determination unit 1503, configured to determine the standby node associated with the primary node; and
a second sending unit 1504, configured to send the data to the primary node and the standby node respectively, so that the primary node and the standby node store the data.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer, which may take the specific form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or any combination of several of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit the one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in one or more embodiments of this specification to describe various kinds of information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
The above are merely preferred embodiments of one or more embodiments of this specification and are not intended to limit the one or more embodiments of this specification. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall be included within the scope of protection of the one or more embodiments of this specification.

Claims (26)

  1. A data storage method, applied to a physical node in a distributed key-value pair storage system, the method comprising:
    storing data sent by a client, wherein the physical node is determined as the primary node for the data by the client through computation on the data based on a consistent hash algorithm;
    determining a standby node for storing the data; and
    sending the data to the standby node, so that the standby node stores the data.
  2. The method according to claim 1, wherein the determining of the standby node for storing the data comprises:
    determining, based on the consistent hash algorithm, a hash value corresponding to the data;
    searching a hash value space, in a preset direction, for a first virtual node closest to the hash value;
    finding, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, wherein the first standby virtual nodes each belong to a different physical node other than the physical node itself; and
    taking the physical nodes corresponding to the first standby virtual nodes as the standby nodes.
  3. The method according to claim 1, further comprising:
    receiving a read request for the data sent by the client; and
    returning the data to the client when it is determined, based on the consistent hash algorithm, that the physical node itself is the primary node for the data.
  4. The method according to claim 1, further comprising:
    when another physical node joins the distributed key-value pair storage system and the other physical node is the primary node or a standby node for the data, sending the data to the other physical node, so that the other physical node stores the data.
  5. The method according to claim 4, further comprising:
    determining, based on the consistent hash algorithm, a hash value corresponding to the data;
    determining that the other physical node is the primary node for the data when a second virtual node, closest to the hash value in a preset direction in a hash value space, belongs to the other physical node; and
    determining that the other physical node is a standby node for the data when a virtual node of the other physical node is one of second standby virtual nodes, wherein the second standby virtual nodes are a preset number of virtual nodes closest to the second virtual node in the hash value space and each belong to a different physical node other than the primary node for the data.
  6. The method according to claim 1, wherein the number of standby nodes is positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system.
  7. A data storage method, applied to a physical node in a distributed key-value pair storage system, the method comprising:
    receiving data sent by a primary node, wherein the data is sent to the primary node by a client and the primary node is determined by the client through computation on the data based on a consistent hash algorithm; and
    storing the data.
  8. The method according to claim 7, further comprising:
    receiving a read request for the data sent by the client, wherein the read request is sent by the client when the primary node has exited the distributed key-value pair storage system; and
    returning the data to the client.
  9. The method according to claim 8, further comprising:
    switching to being the primary node for the data;
    determining a standby node associated with the physical node itself; and
    sending the data to the standby node, so that the standby node stores the data.
  10. A data storage method, applied to a client in a distributed key-value pair storage system, the method comprising:
    performing computation on data based on a consistent hash algorithm to determine a primary node for the data in the distributed key-value pair storage system; and
    sending the data to the primary node, so that the primary node stores the data, determines a standby node associated with itself, and sends the data to the standby node;
    or, determining a standby node associated with the primary node; and
    sending the data to the primary node and the standby node respectively, so that the primary node and the standby node store the data.
  11. A data storage apparatus, applied to a physical node in a distributed key-value pair storage system, the apparatus comprising:
    a storage unit, configured to store data sent by a client, wherein the physical node is determined as the primary node for the data by the client through computation on the data based on a consistent hash algorithm;
    a standby node determination unit, configured to determine a standby node for storing the data; and
    a first sending unit, configured to send the data to the standby node, so that the standby node stores the data.
  12. The apparatus according to claim 11, wherein the standby node determination unit is specifically configured to:
    determine, based on the consistent hash algorithm, a hash value corresponding to the data;
    search a hash value space, in a preset direction, for a first virtual node closest to the hash value;
    find, in the hash value space, a preset number of first standby virtual nodes closest to the first virtual node, wherein the first standby virtual nodes each belong to a different physical node other than the physical node itself; and
    take the physical nodes corresponding to the first standby virtual nodes as the standby nodes.
  13. The apparatus according to claim 11, further comprising:
    a receiving unit, configured to receive a read request for the data sent by the client; and
    a data returning unit, configured to return the data to the client when it is determined, based on the consistent hash algorithm, that the physical node itself is the primary node for the data.
  14. The apparatus according to claim 11, further comprising:
    a second sending unit, configured to, when another physical node joins the distributed key-value pair storage system and the other physical node is the primary node or a standby node for the data, send the data to the other physical node, so that the other physical node stores the data.
  15. The apparatus according to claim 14, further comprising:
    a hash value determination unit, configured to determine, based on the consistent hash algorithm, a hash value corresponding to the data;
    a first judgment unit, configured to determine that the other physical node is the primary node for the data when a second virtual node, closest to the hash value in a preset direction in a hash value space, belongs to the other physical node; and
    a second judgment unit, configured to determine that the other physical node is a standby node for the data when a virtual node of the other physical node is one of second standby virtual nodes, wherein the second standby virtual nodes are a preset number of virtual nodes closest to the second virtual node in the hash value space and each belong to a different physical node other than the primary node for the data.
  16. The apparatus according to claim 11, wherein the number of standby nodes is positively correlated with the security level, storage space, and processing resources of the distributed key-value pair storage system.
  17. A data storage apparatus, applied to a physical node in a distributed key-value pair storage system, the apparatus comprising:
    a first receiving unit, configured to receive data sent by a primary node, wherein the data is sent to the primary node by a client and the primary node is determined by the client through computation on the data based on a consistent hash algorithm; and
    a storage unit, configured to store the data.
  18. The apparatus according to claim 17, further comprising:
    a second receiving unit, configured to receive a read request for the data sent by the client, wherein the read request is sent by the client when the primary node has exited the distributed key-value pair storage system; and
    a returning unit, configured to return the data to the client.
  19. The apparatus according to claim 18, further comprising:
    a switching unit, configured to switch to being the primary node for the data;
    a determination unit, configured to determine a standby node associated with the physical node itself; and
    a sending unit, configured to send the data to the standby node, so that the standby node stores the data.
  20. A data storage apparatus, applied to a client in a distributed key-value pair storage system, the apparatus comprising:
    a first determination unit, configured to perform computation on data based on a consistent hash algorithm to determine a primary node for the data in the distributed key-value pair storage system;
    a first sending unit, configured to send the data to the primary node, so that the primary node stores the data, determines a standby node associated with itself, and sends the data to the standby node;
    or, comprising: a second determination unit, configured to determine a standby node associated with the primary node; and
    a second sending unit, configured to send the data to the primary node and the standby node respectively, so that the primary node and the standby node store the data.
  21. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor implements the method according to any one of claims 1-6 by executing the executable instructions.
  22. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-6.
  23. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor implements the method according to any one of claims 7-9 by executing the executable instructions.
  24. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 7-9.
  25. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor implements the method according to claim 10 by executing the executable instructions.
  26. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to claim 10.
PCT/CN2019/125871 2019-01-10 2019-12-17 数据存储方法及装置、电子设备、存储介质 WO2020143410A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910023907.2A CN110049091A (zh) 2019-01-10 2019-01-10 数据存储方法及装置、电子设备、存储介质
CN201910023907.2 2019-01-10

Publications (1)

Publication Number Publication Date
WO2020143410A1 true WO2020143410A1 (zh) 2020-07-16

Family

ID=67274098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/125871 WO2020143410A1 (zh) 2019-01-10 2019-12-17 数据存储方法及装置、电子设备、存储介质

Country Status (2)

Country Link
CN (1) CN110049091A (zh)
WO (1) WO2020143410A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110049091A (zh) * 2019-01-10 2019-07-23 阿里巴巴集团控股有限公司 数据存储方法及装置、电子设备、存储介质
CN110336891A (zh) * 2019-07-24 2019-10-15 中南民族大学 缓存数据分布方法、设备、存储介质及装置
CN110633053B (zh) * 2019-09-16 2020-07-10 北京马赫谷科技有限公司 存储容量均衡方法、对象存储方法及装置
CN110764705B (zh) * 2019-10-22 2023-08-04 北京锐安科技有限公司 一种数据的读写方法、装置、设备和存储介质
CN112511634A (zh) * 2020-12-02 2021-03-16 北京邮电大学 一种数据获取方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011157144A2 (zh) * 2011-05-31 2011-12-22 华为技术有限公司 数据读写方法、装置和存储系统
CN106254240A (zh) * 2016-09-18 2016-12-21 腾讯科技(深圳)有限公司 一种数据处理方法和路由层设备以及系统
CN106572153A (zh) * 2016-10-21 2017-04-19 乐视控股(北京)有限公司 集群的数据存储方法及装置
CN110049091A (zh) * 2019-01-10 2019-07-23 阿里巴巴集团控股有限公司 数据存储方法及装置、电子设备、存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105939389A (zh) * 2016-06-29 2016-09-14 乐视控股(北京)有限公司 负载均衡方法及装置
CN106503139A (zh) * 2016-10-20 2017-03-15 上海携程商务有限公司 动态数据存取方法及系统
CN108810041B (zh) * 2017-04-27 2021-03-05 华为技术有限公司 一种分布式缓存系统的数据写入及扩容方法、装置
CN108235751B (zh) * 2017-12-18 2020-04-14 华为技术有限公司 识别对象存储设备亚健康的方法、装置和数据存储系统
CN108460292A (zh) * 2018-02-07 2018-08-28 冼钇冰 基于物联网的冷链物流数据存储方法、装置及存储介质

Also Published As

Publication number Publication date
CN110049091A (zh) 2019-07-23

Similar Documents

Publication Publication Date Title
WO2020143410A1 (zh) 数据存储方法及装置、电子设备、存储介质
US9031910B2 (en) System and method for maintaining a cluster setup
US10613978B2 (en) Application cache replication to secondary application(s)
CA2940328C (en) Reducing data volume durability state for block-based storage
US9720620B1 (en) Efficient data volume replication for block-based storage
US7716280B2 (en) State reflection
US8595366B2 (en) Method and system for dynamically creating and servicing master-slave pairs within and across switch fabrics of a portable computing device
US7680908B2 (en) State replication
US8151062B2 (en) Consistency models in a distributed store
CA3048737A1 (en) Service processing and consensus method and device
CN105765554A (zh) 在分布式存储系统上分发数据
WO2018054079A1 (zh) 一种存储文件的方法、第一虚拟机及名称节点
WO2021057108A1 (zh) 一种读数据方法、写数据方法及服务器
WO2018120810A1 (zh) 一种解决数据冲突的方法和系统
CN107861691B (zh) 一种多控存储系统的负载均衡方法和装置
WO2020134678A1 (zh) 容灾方法、装置及系统
JP2019527883A (ja) データベースのデータ変更要求処理方法及び装置
WO2023142543A1 (zh) 分布式数据库的主备切换方法、装置及可读存储介质
US8732346B2 (en) Coordination of direct I/O with a filter
CN111488247B (zh) 一种管控节点多次容错的高可用方法及设备
WO2023066198A1 (zh) 分布式数据处理
US10785295B2 (en) Fabric encapsulated resilient storage
CN113946542A (zh) 数据处理方法以及装置
CN116578334B (zh) 基于配置化的用户在线动态对接方法及系统
EP3757864B1 (en) Method and system for performing computations in a distributed system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19909052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19909052

Country of ref document: EP

Kind code of ref document: A1