WO2016197666A1 - Caching method in a server cluster system, write point client, and read client - Google Patents

Caching method in a server cluster system, write point client, and read client

Info

Publication number
WO2016197666A1
Authority
WO
WIPO (PCT)
Prior art keywords
client
data
write
read
written
Prior art date
Application number
PCT/CN2016/077385
Other languages
English (en)
French (fr)
Inventor
王道辉
丁萌
周文明
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2016197666A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/40 Support for services or applications
    • H04L 67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the present invention relates to computer technology, and in particular, to a cache method, a client, and a storage system of a distributed server cluster system.
  • In distributed storage systems, an architecture consisting of a head (the so-called head is a client that distributes requests sent by the application to the storage media and manages the volumes inside the system) plus a storage logic process on the physical-medium side is widely used.
  • When the system processes a request sent by the application, the client usually receives the request and then forwards it to the storage side for processing.
  • A storage logic process close to the physical medium is usually deployed with a Cache, so that a request entering the storage logic process can be acknowledged externally once it reaches the Cache, without first being written to the physical medium.
  • an effective method is to deploy a layer of distributed Cache on the client side.
  • After receiving a request sent by the application, the head writes the data of a write request to the distributed Cache and then returns. In this way, write latency is reduced because one layer of network transfer is avoided, so write performance is improved; for read requests, the read cache tracks hotspot data and keeps it in the distributed Cache, which raises the hit rate of read requests at the local head and thereby improves read performance.
  • The distributed storage system provides a block interface externally, that is, the user/application sees a disk block device, and each disk block device corresponds to a volume inside the system; it is through the client that a disk block device is mounted on a server node. After heads are deployed on multiple servers, the same internal volume can be mounted as a disk block device on multiple servers. Thus, when applications on multiple servers access such a disk block device, they are actually accessing the same volume inside the system, that is, the same data source.
  • In other words, a single data source may have multiple clients concurrently reading and writing it.
  • After a layer of distributed Cache is deployed on the client side, how to maintain the consistency of the read Cache and the write Cache on the client side while still meeting the high performance requirements of this scenario is a core issue.
  • An existing solution improves read and write performance by maintaining a data relationship directory. Although a data relationship directory can clearly record where data is stored and thereby resolve data conflicts, the scheme has two defects.
  • First, the location where data is stored is not fixed, so the data relationship directory must record a complex relationship structure, which directly consumes a large amount of storage space; moreover, considering failure scenarios, this directory may need to be persisted, which adds the performance cost of updating the data relationship directory.
  • Second, when multiple Caches each hold a copy of the data directory, the consistency of the directory across those Caches must be guaranteed, which means that when processing read and write requests it is necessary to detect whether the data has been updated and whether the update operation occurs on the IO path; this also impedes the improvement of read and write performance.
  • In a first aspect, an embodiment of the present invention provides a method for write cache consistency in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, each server is configured with at least one client, each client is configured with a write cache, and the write cache is used to cache data written to each client. The method includes:
  • the write point client receives a data write message, where the data write message requests to write data to be written, and determines, according to the feature value of the data to be written, a primary write client and at least one standby write client for saving the data to be written, where the primary write client and each standby write client belong to different servers; the write point client separately sends the data to be written to the write cache of the primary write client and of each standby write client; and, when it is determined that the data to be written is successfully saved by the primary write client and each standby write client, the write point client sends a first notification message to the primary write client and each standby write client, where the first notification message is used to notify the primary write client and each standby write client to change the recorded synchronization state of the data to be written from unsynchronized to synchronized.
  • In a possible implementation, the determining, according to the feature value of the data to be written, of a primary write client and at least one standby write client specifically includes: applying a consistent hashing algorithm to calculate the hash value corresponding to the feature value of the data to be written, determining, according to the hash value, the partition Partition to which the data to be written belongs, and determining, according to a data distribution view, the primary write client and the at least one standby write client corresponding to that partition Partition.
  • In a possible implementation, the determining, according to the data distribution view, of the primary write client and the at least one standby write client corresponding to the partition Partition to which the data to be written belongs includes: determining, according to the data distribution view, all primary write clients and all standby write clients corresponding to that partition Partition.
  • In a second aspect, an embodiment of the present invention provides a method for read cache coherency in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, each server is configured with at least one client, each client is configured with a read cache, and the read cache is used to cache hotspot data frequently accessed by the application in each client. The method includes:
  • the write point client receives a data update request, where the data update request is used to request to update data to be updated, and generates a data update notification according to the feature value of the data to be updated, where the data update notification carries the feature value indicating the data to be updated; the write point client sends the data update notification to a read client corresponding to the data update request in the server cluster; and, upon receiving a response message indicating that the data to be updated has been successfully processed by the read client, the write point client sends a response message indicating that the update of the data to be updated is successful, where that response message is used to indicate that the read client has performed update processing on its read cache for the data to be updated.
  • In a possible implementation, the method further includes: the write point client searching a hotspot information directory table and determining the read client according to the feature value of the data to be updated, where the hotspot information directory table is used to indicate all clients that have cached the data to be updated in their read caches.
  • In a possible implementation, the method further includes: the write point client receiving hotspot information broadcast by the read client, and recording the read client into the hotspot information directory table, where the hotspot information is used to indicate that the read client has cached the data to be updated.
  • In a third aspect, an embodiment of the present invention provides a method for read cache coherency in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, each server is configured with at least one client, each client is configured with a read cache, and the read cache is used to cache hotspot data frequently accessed by the application in each client. The method includes:
  • the read client receives a data update notification sent by the write point client, where the data update notification carries a feature value indicating the data to be updated, and the read client includes all clients in the server cluster other than the write point client, or the clients that have saved the data to be updated; the read client performs update processing on its read cache according to the data update notification and sends the write point client a response message indicating that the data to be updated has been successfully processed.
  • In a possible implementation, the performing of update processing on the read cache of the read client according to the data update notification specifically includes: the read client confirming, according to the feature value, whether the data to be updated is cached in its read cache, and, if it is confirmed that the data to be updated is not cached, adding the data to be updated as an invalid record.
  • In a possible implementation, the method further includes: when the data to be updated is cached into the read cache of the read client, the read client broadcasts hotspot information to all clients in the server cluster except the read client, where the hotspot information is used to indicate that the read client has already cached the data to be updated.
  • In a fourth aspect, an embodiment of the present invention provides a write point client in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, each server is configured with at least one client, each client is configured with a write cache, and the write cache is used to cache data written to each client. The write point client includes:
  • a receiving module configured to receive a data write message, where the data write message requests to write data to be written
  • a determining module configured to determine, according to the feature value of the data to be written, a primary write client and at least one standby write client for saving the data to be written, where the primary write client and each standby write client belong to different servers;
  • a sending module configured to separately send the data to be written to a write cache of each of the primary write client and each of the backup write clients;
  • a notification module configured to, when it is determined that the data to be written is successfully saved by the primary write client and each standby write client, send a first notification message to the primary write client and each standby write client, where the first notification message is used to notify the primary write client and each standby write client to change the recorded synchronization status of the data to be written from not synchronized to synchronized.
  • In a possible implementation, the notification module is further configured to send a second notification message, where the second notification message is used to notify the client, among the primary write client and the at least one standby write client, that has written successfully to record the synchronization status of the data to be written as not synchronized with the client that failed to write.
  • In a possible implementation, the write point client further includes a data distribution view, where the data distribution view is used to indicate the primary write client and the standby write client corresponding to each partition Partition.
  • In a possible implementation, the determining module is specifically configured to: apply a consistent hashing algorithm to calculate, according to the feature value of the data to be written, the hash value corresponding to the feature value; determine, according to the hash value, the partition Partition to which the data to be written belongs; and determine, according to the data distribution view, the primary write client and the at least one standby write client corresponding to the partition Partition to which the data to be written belongs.
  • In a possible implementation, the determining module is specifically configured to: determine, according to the data distribution view, all primary write clients and all standby write clients corresponding to the partition Partition to which the data to be written belongs.
  • In a fifth aspect, an embodiment of the present invention provides a write point client in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, and each server is configured with at least one client.
  • Each client is configured with a read cache, and the read cache is used to cache hotspot data frequently accessed by the application in each client, where the write point client includes:
  • a receiving module configured to receive a data update request, where the data update request is used to request to update data to be updated;
  • a processing module configured to generate a data update notification according to the feature value of the data to be updated, where the data update notification carries the feature value indicating the data to be updated;
  • a notification module configured to send the data update notification to a read client corresponding to the data update request in the server cluster, where the read client includes all other clients in the server cluster except the write point client, or the clients that have saved the data to be updated;
  • In a possible implementation, the notification module is further configured to, upon receiving the response message indicating that the data to be updated has been successfully processed by the read client, send a response message indicating that the update of the data to be updated is successful, where that response message is used to indicate that the read client has performed update processing on its read cache for the data to be updated.
  • In a possible implementation, the write point client further includes a hotspot information directory table, where the hotspot information directory table is used to indicate all clients that have cached the data to be updated in their read caches.
  • the processing module is further configured to search the hotspot information directory table, and determine the read client according to the feature value of the to-be-updated data.
  • In a possible implementation, the receiving module is further configured to receive hotspot information broadcast by the read client, where the hotspot information is used to indicate that the read client has already cached the data to be updated, and the processing module is further configured to record the read client into the hotspot information directory table.
  • In a sixth aspect, an embodiment of the present invention provides a read client in a server cluster, where the server cluster includes n servers, n being a natural number ≥ 2, each server is configured with at least one client, each client is configured with a read cache, and the read cache is used to cache hotspot data frequently accessed by the application in each client. The read client includes:
  • a receiving module configured to receive a data update notification sent by the write point client, where the data update notification carries a feature value indicating the data to be updated;
  • a processing module configured to perform update processing on the read cache of the read client according to the data update notification
  • a sending module configured to send, to the write point client, a response message that the data to be updated is processed successfully.
  • In a possible implementation, the processing module is specifically configured to confirm, according to the feature value, whether the data to be updated is cached in the read cache of the read client, and, if it is confirmed that the data to be updated is not cached, to add the data to be updated as an invalid record.
  • In a possible implementation, when the read client caches the data to be updated into its read cache, the sending module is further configured to broadcast hotspot information to all clients in the server cluster except the read client, where the hotspot information is used to indicate that the read client has cached the data to be updated.
  • In a seventh aspect, an embodiment of the present invention provides a server cluster system, where the server cluster system includes n servers, n being a natural number ≥ 2, and each server is configured with at least one write point client as described in the fourth aspect or any possible implementation of the fourth aspect, a write point client as described in the fifth aspect or any possible implementation of the fifth aspect, and a read client as described in the sixth aspect or any possible implementation of the sixth aspect.
  • In an eighth aspect, an embodiment of the present invention provides a computer, including a processor, a memory, a bus, and a communication interface;
  • the memory is configured to store computer execution instructions
  • the processor is coupled to the memory via the bus, and when the computer is running, the processor executes the computer-executable instructions stored in the memory to cause the computer to perform the method of write cache coherency provided by the first aspect or any possible implementation of the first aspect, the method of read cache coherency provided by the second aspect or any possible implementation of the second aspect, or the method of read cache coherency provided by the third aspect or any possible implementation of the third aspect.
  • In a ninth aspect, an embodiment of the present invention provides a computer readable medium comprising computer-executable instructions; when a processor of a computer executes the computer-executable instructions, the computer performs the method of write cache coherency provided by the first aspect or any possible implementation of the first aspect, the method of read cache coherency provided by the second aspect or any possible implementation of the second aspect, or the method of read cache coherency provided by the third aspect or any possible implementation of the third aspect.
  • In the embodiments of the present invention, when the write point client receives the data to be written, it determines, according to the feature value of the data to be written, the primary write client and the at least one standby write client for saving the data to be written, separately sends the data to be written to the write caches of the primary write client and of each standby write client, and confirms, according to the write success response messages returned by the primary write client and each standby write client, that the data to be written has been written successfully, so that the data to be written is consistent across the primary write client and the at least one standby write client, thereby ensuring write cache consistency in the distributed Cache under the server cluster.
  • FIG. 1 is a schematic structural diagram of a distributed server cluster system 100 according to an embodiment of the invention;
  • FIG. 2 is an exemplary flowchart of a method 200 of write cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention
  • FIG. 3 is an exemplary flowchart of a method 300 of write cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention
  • FIG. 4 is an exemplary flow diagram of a method 400 of read cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention
  • FIG. 5 is an exemplary flow diagram of a method 500 of read cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention
  • FIG. 6 is a schematic diagram showing the logical structure of a write point client 600 according to an embodiment of the invention.
  • FIG. 7 is a schematic diagram showing the logical structure of a write point client 700 according to an embodiment of the invention.
  • FIG. 8 is a schematic diagram showing the logical structure of a read client 800 according to an embodiment of the invention.
  • FIG. 9 is a schematic diagram showing the logical structure of a computer 900 according to an embodiment of the invention.
  • FIG. 1 is a schematic diagram of the system structure of the distributed server cluster system 100.
  • In the distributed server cluster system 100, only three servers 101, 102, and 103 are shown as an example and not a limitation (the system can flexibly adjust the number of servers according to its actual needs).
  • Each server has a client deployed on it, and each client is provided with a write cache and a read cache (not shown); the write cache and the read cache of each client can be set independently or as two logical spaces within the same Cache.
  • the system 100 also includes storage nodes 140, 141, 142 and Cache 130, Cache 131, and Cache 132.
  • The above three servers 101, 102, and 103 interconnect as a cluster with the storage nodes 140, 141, and 142 through, for example, but not limited to, a computing network such as InfiniBand or Fibre Channel over Ethernet (FCoE).
  • the Cache 130, the Cache 131, and the Cache 132 are caches on the storage node 140, 141, and 142 side, and are configured to receive data requests sent from the client.
  • The distributed storage system 100 provides a block interface externally, that is, the user/application sees a disk block device, and each disk block device corresponds to a volume inside the system (such as Volume1, Volume2, and Volume3); a disk block device is mounted on a server node through the client, as shown in FIG. 1.
  • For each piece of data to be written, the location of its primary copy (that is, the primary write client that stores the data to be written) is uniquely determined by view management.
  • Here, view management refers to applying a consistent hashing algorithm to the distributed Cache, that is, the system adopts a distributed hash table (DHT); a DHT ring space is used as an example.
  • The DHT ring space has a size of 2^32.
  • This very large virtual ring space is divided into N equal parts, each part being a partition Partition; the number of partition Partitions (N) is fixed, as is which disk/storage medium each Partition belongs to (of course, the system can dynamically adjust, according to its actual needs, the number of Partitions and the correspondence between each Partition and the disks/storage media). The system evenly distributes the partition Partitions across the Caches of all clients in the system and saves the distribution information. As an example and not a limitation, the system saves the distribution information of the partition Partitions in the form of a data distribution view.
  • The write point client calculates a hash value from the feature value key of the data to be written that it receives and determines, according to the hash value, the Partition to which the data to be written belongs; since each Partition corresponds to one client, the location where the primary copy is stored is determined for each piece of data to be written received by any write point client.
  • Partition granularity means that the data distribution view controls, at the Partition level, the location of the client-side Cache to which each copy of a Partition belongs, thereby indirectly determining to which client-side Caches each piece of data to be written needs to be forwarded for storage.
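  • A minimal sketch of this partition lookup, written in Python with illustrative names (the key format, the partition count, and the contents of the data distribution view are assumptions for illustration, not taken from the patent):

```python
import hashlib

N_PARTITIONS = 1024  # fixed number of Partitions N (an illustrative value)

# Hypothetical data distribution view: partition -> (primary write client, [standby write clients])
data_distribution_view = {
    p: (f"client-{p % 3}", [f"client-{(p + 1) % 3}"]) for p in range(N_PARTITIONS)
}

def make_key(lun: int, lba: int) -> str:
    """Feature value (key) built from the logical unit number and the logical block address."""
    return f"{lun}:{lba}"

def partition_of(key: str) -> int:
    """Hash the key onto a 2^32 ring that is divided into N equal Partitions."""
    point = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")  # 0 .. 2^32 - 1
    return point // (2 ** 32 // N_PARTITIONS)

def locate(key: str):
    """Return the primary write client and the standby write clients for this key."""
    primary, standbys = data_distribution_view[partition_of(key)]
    return primary, standbys

print(locate(make_key(lun=1, lba=2048)))
```

  • The same lookup is what the write point client performs in steps S202 and S203 of the method 200 described below.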
  • The clients that receive the primary and backup copies of the data to be written (that is, the primary write client and the standby write client) each need to receive and cache the data to be written, and then send a write success response message to the write point client to inform it that processing has succeeded.
  • After receiving the write success response messages from all the clients to which the copies of the data belong, the write point client can return write success to the application that sent the data to be written, and informs the primary write client and the standby write client of the data to be written that the data to be written has been synchronized.
  • For example, assume client 110 is the write point client.
  • Client 110 determines, by using the consistent hashing algorithm (DHT), that the primary copy of the data to be written belongs to client 112 and the backup copy belongs to client 113, and then sends the primary copy of the data to be written to client 112 and the backup copy to client 113. If client 110 receives the write success response messages sent by clients 112 and 113 respectively, it returns a write success response to the APP that sent the data to be written, and simultaneously informs the primary write client 112 and the standby write client 113 that the data to be written has been synchronized.
  • In summary, the write point client uniquely determines, through view management, the locations of the primary and backup copies of the data to be written (in the case of multiple copies), determines the synchronization state of the data to be written according to the write success response messages returned by the clients to which the primary and backup copies belong, and finally notifies the primary write client and the standby write client that the data to be written has been synchronized, so that the primary and backup copies of the data to be written remain consistent, ensuring write cache consistency for the data to be written in the distributed Cache under the server cluster.
  • For read cache consistency, while processing the data to be updated, the write point client needs to inform all other clients that may have cached the data to be updated, so that those clients invalidate the data.
  • For example, if client 110 is the write point client, then in the write processing flow client 110 needs to notify all other clients that may have cached the data to be updated (that is, all other clients in the same cluster as client 110, such as clients 112 and 113 in FIG. 1), so that the other clients that cached the data to be updated invalidate it.
  • Alternatively, client 110 may record which clients have cached the data to be updated; then, when updating the data, it only needs to notify, according to that record, the clients that have cached the data to be updated.
  • In this way, while processing the data to be updated, the write point client sends a notification to all other clients in the server cluster, or to the clients that have cached the data to be updated, so that the clients caching the data to be updated invalidate it, ensuring the consistency of the read cache on each client side under the server cluster.
  • FIG. 2 is an exemplary flow diagram of a method 200 of write cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention.
  • The method 200 can be, but is not limited to, specifically applied to the distributed server cluster 100 shown in FIG. 1 above or to other distributed server clusters obtained by flexibly varying the system 100. It should be noted that although the flow of the method 200 described below includes multiple operations occurring in a particular order, it should be clearly understood that the method may also include more or fewer operations, which may be performed sequentially or in parallel (for example, using a parallel processor or a multi-threaded environment). As shown in FIG. 2, method 200 includes the following steps:
  • Step S201 the write point client receives the data to be written sent by the application, where the data to be written carries a feature value key. As an example and not a limitation, the feature value key is a logical address of the data to be written (for example, the key value is constructed from the logical unit number LUN of the data to be written and the logical block address LBA).
  • Step S202 optionally, the write point client calculates a hash value according to the key value of the data to be written and determines, according to the hash value, the partition Partition to which the data to be written belongs (here, it is assumed that the partition to which the data to be written belongs is calculated to be P1).
  • Step S203 optionally, after determining the partition Partition (P1) to which the data to be written belongs, the write point client determines, by querying the internally stored data distribution view, all primary write clients and standby write clients of the partition Partition (P1) to which the data to be written belongs. It should be noted that, for convenience of description, only two copies (primary and backup) of the data to be written are given by way of example; those skilled in the art can understand that the system can adjust the number of copies of the data to be written at any time according to its actual needs, so the example is not intended to limit the scope of the embodiments or of the invention.
  • Step S204 the write point client sends the data to be written to the main write client of the data to be written (the client where the data master copy is to be written).
  • Step S205 the primary write client of the data to be written receives the data to be written, and allocates a cache space for the data to be written.
  • Step S206 the main write client sets the synchronization state of the data to be written to be unsynchronized with other clients (UNOK).
  • the primary write client sets the value of the segment of cache space allocated for the data to be written in step S205 to UNOK in its metadata structure.
  • Step S207 the main write client returns a write success response message to the write point client, where the write success response message is used to notify the write point client that the main write client has successfully written.
  • Step S208 the write point client sends the data to be written to the standby write client of the data to be written (the client where the data backup copy is to be written).
  • Step S209 the standby write client of the data to be written receives the data to be written, and allocates a cache space for the data to be written.
  • Step S210 the standby write client sets the synchronization state of the data to be written to be unsynchronized with other clients (UNOK).
  • the standby write client sets the value of the segment of cache space allocated for the data to be written in step S209 to UNOK in its metadata structure.
  • Step S211 the standby write client returns a write success response message to the write point client, where the write success response message is used to notify the write point client that the standby write client has successfully written.
  • Step S212 if the write point client receives the write success response message returned from the main write client and the backup write client respectively, then the process proceeds to step S213.
  • Step S213 the write point client returns a write success to the application that sends the data to be written.
  • Step S214 the write point client sends a first notification message for the data to be written (by way of example and not limitation, the first notification message is an OK message), notifying all clients that hold a copy of the data to be written (the primary write client and the standby write client) to change the synchronization status of the data to be written to synchronized with other clients (OK).
  • Step S215 after receiving the OK message, the main write client changes the recorded synchronization state of the data to be written from unsynchronized (UNOK) to synchronized (OK).
  • the primary write client sets the value of the buffer space allocated for the data to be written in step S205 to OK in its metadata structure.
  • Step S216 After receiving the OK message, the standby write client changes the synchronization state of the data to be written recorded from the unsynchronized (UNOK) to the synchronized (OK). Optionally, the standby write client sets the value of the buffer space allocated for the data to be written in step S209 to OK in its metadata structure.
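  • A sketch of the primary/standby write client side of steps S205 to S211 and S215/S216, assuming an illustrative in-memory cache and metadata map (the patent does not prescribe these concrete structures):

```python
class WriteCacheClient:
    """Illustrative primary or standby write client for method 200."""

    def __init__(self):
        self.write_cache = {}   # key -> cached data to be written
        self.metadata = {}      # key -> synchronization state, "UNOK" or "OK"

    def on_data_to_be_written(self, key: str, data: bytes) -> str:
        self.write_cache[key] = data        # S205/S209: allocate cache space for the data
        self.metadata[key] = "UNOK"         # S206/S210: not yet synchronized with other clients
        return "write success"              # S207/S211: response to the write point client

    def on_first_notification(self, key: str) -> None:
        self.metadata[key] = "OK"           # S215/S216: the OK message flips the state to synchronized
```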
  • Optionally, when the write point client queries the data distribution view to determine the clients where the primary and backup copies of the Partition to which the data to be written belongs are located, a judgment function can be added: the write point client determines, from the data distribution view, whether there is a UNOK client among the clients holding the primary and backup copies of that Partition (a UNOK client means a client that has failed or has lost its communication connection with the write point client). If so, in the process of copying the data to be written to each copy, the write point client sends an identifier to notify the OK copies/clients to set the synchronization status of the data to be written to not synchronized with the UNOK copy/client. When an OK client receives the data to be written carrying this identifier, it records in its own metadata structure that the data to be written is not synchronized with the UNOK copy/client, and then returns a write success response message to the write point client.
  • It should be noted that step S204 and step S208 can be performed simultaneously, that is, the write point client concurrently sends the data to be written to the primary write client and the standby write client, thereby reducing the delay.
  • In the method 200, the hash value is calculated according to the feature value key of the data to be written, the partition Partition to which the data to be written belongs is determined according to the hash value, and the write point client determines, by querying the internally stored data distribution view, the primary write client and the standby write client of that partition Partition and informs them that the data to be written has been synchronized. This enables the write point client to quickly and accurately find the primary write client and the standby write client that store the data to be written, and ensures write consistency of the data to be written in the distributed Cache under the server cluster.
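  • A condensed sketch of the write-point-client side of method 200 (steps S201 to S216), assuming hypothetical `locate`, `send_to_cache`, and `notify_sync_state` helpers and leaving failure handling to method 300 below:

```python
from concurrent.futures import ThreadPoolExecutor

def write(write_point, key: str, data: bytes) -> str:
    """Write-point-client side of method 200; all helper names are hypothetical."""
    # S202/S203: hash the key, find its Partition, look up the primary and standby clients.
    primary, standbys = write_point.locate(key)
    targets = [primary] + standbys

    # S204/S208: send the data to every copy concurrently; each target allocates cache
    # space, records the sync state as UNOK, and returns a write success response.
    with ThreadPoolExecutor() as pool:
        responses = list(pool.map(lambda c: write_point.send_to_cache(c, key, data), targets))

    # S212/S213: only when every copy reports success is write success returned to the app.
    if all(responses):
        # S214-S216: the first notification message (OK) flips each copy's state UNOK -> OK.
        for c in targets:
            write_point.notify_sync_state(c, key, "OK")
        return "write success"
    return "waiting"  # the abnormal path is described in method 300
```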
  • FIG. 2 details the flow of writing data in the distributed server cluster system according to an embodiment of the present invention.
  • The flow for handling a write-data abnormality in the distributed server cluster system according to an embodiment of the present invention will now be described in detail.
  • FIG. 3 is an exemplary flow diagram of a method 300 of write cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention.
  • The method 300 can be, but is not limited to, specifically applied to the distributed server cluster system 100 shown in FIG. 1 above or to other distributed server cluster systems obtained by flexibly varying the system 100.
  • Although the flow of the method 300 described below includes multiple operations occurring in a particular order, it should be clearly understood that the method may also include more or fewer operations, which may be performed sequentially or in parallel (for example, using a parallel processor or a multi-threaded environment). As shown in FIG. 3, method 300 includes the following steps:
  • Step S301 the write point client receives the data to be written sent by the application, where the data to be written carries the feature value key. As an example and not a limitation, the feature value key is the logical address of the data to be written (for example, the key value is constructed from the logical unit number LUN of the data to be written and the logical block address LBA).
  • Step S302 optionally, the write point client calculates a hash value according to the key value of the data to be written, and determines, according to the hash value, the partition Partition to which the data to be written belongs (here, it is assumed that the partition to which the data to be written belongs is calculated to be P1).
  • Step S303 optionally, after determining the partition Partition (P1) to which the data to be written belongs, the write point client determines, by querying the internally stored data distribution view, the primary write client and the standby write client of the partition Partition (P1) to which the data to be written belongs. As in method 200, the number of copies is given by way of example and is not intended to limit the scope of the invention.
  • Step S304 the write point client copies the data to be written to the client (the primary write client) where the master copy of the data to be written is located.
  • Step S305 the primary write client of the data to be written receives the data to be written, and allocates a cache space for the data to be written.
  • Step S306 the main write client sets the synchronization state of the data to be written to be unsynchronized with other clients (UNOK).
  • Optionally, the primary write client sets, in its metadata structure, the value of the cache space allocated for the data to be written in step S305 to UNOK.
  • Step S307 the main write client returns a write success response message to the write point client, where the write success response message is used to notify the write point client that the main write client has successfully written.
  • Step S308 the write point client copies the data to be written to the client where the backup copy of the data to be written is located (the backup write client).
  • Assume that the client where the backup copy of the data to be written is located (the standby write client) fails or does not respond.
  • Step S309 if the write point client only receives the write success response message returned by the main write client, the write point client is in a waiting state.
  • Step S310 if the write point client receives the standby write client failure notification sent by the data distribution view management node, the write point client no longer waits for the response of the standby write client and returns write success to the application that sent the data to be written.
  • It should be noted that the fault notification sent by the data distribution view management node is used to notify the write point client that the standby write client holding the backup copy of the data to be written has failed.
  • The data distribution view management node monitors whether each client in the system is running normally by means of, for example, but not limited to, heartbeat detection or periodically sending an inquiry message. If a client fails, the system notifies the other clients in the system by a broadcast message.
  • Step S311 the write point client sends a second notification message, where the second notification message indicates the client that failed to write and is used to notify the other copies/clients that were written successfully (that is, the primary write client of the data to be written) to set the synchronization state of the data to be written to not synchronized with the client that failed to write (that is, the standby write client of the data to be written).
  • Step S312 after receiving the second notification message sent in step S311, the main write client changes the recorded synchronization state of the data to be written to not synchronized with the standby write client of the data to be written.
  • the primary write client records, in its metadata structure, a segment of the cache space allocated for the data to be written in step S305 as being out of synchronization with the standby write client of the data to be written.
  • Optionally, in step S303, when the write point client queries the data distribution view to determine the clients where the primary and backup copies of the Partition to which the data to be written belongs are located, the same judgment function as described for method 200 can be added.
  • Step S304 and step S308 can be performed simultaneously, that is, the write point client concurrently sends the data to be written to the master and backup copies, thereby reducing the delay.
  • In the method 300, when the standby write client that receives the data to be written fails or does not respond, the write point client can learn of the failure from the fault notification sent by the data distribution view management node, notify the main write client to set the synchronization state of the data to be written to not synchronized with the standby write client, and return write success externally. This ensures that when the main write client or the standby write client receiving the data to be written has an abnormality, the data writing process can still proceed normally, improving the stability of the system.
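  • A sketch of the abnormal path of method 300, assuming a hypothetical `view_manager` that detects failed clients (for example through heartbeat loss) and hypothetical messaging helpers on the write point client:

```python
def write_with_failure_handling(write_point, key: str, data: bytes) -> str:
    """Sketch of method 300: the standby write client fails or does not respond."""
    primary, standbys = write_point.locate(key)
    standby = standbys[0]

    # S304-S307: the primary receives the data, marks it UNOK, and acknowledges.
    primary_ok = write_point.send_to_cache(primary, key, data)

    # S308: the data is also sent to the standby, which in this scenario does not respond.
    standby_ok = write_point.send_to_cache(standby, key, data, timeout=1.0)

    if primary_ok and not standby_ok:
        # S309/S310: instead of waiting forever, rely on the fault notification from the
        # data distribution view management node (driven e.g. by heartbeat detection).
        if write_point.view_manager.reports_failed(standby):
            # S311/S312: second notification message; the primary records the data as
            # not synchronized with the standby write client that failed.
            write_point.notify_unsynced(primary, key, failed_client=standby)
            return "write success"   # returned to the application despite the failure
        return "waiting"
    return "write success" if primary_ok and standby_ok else "waiting"
```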
  • FIG. 4 is an exemplary flowchart of a method 400 of read cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention.
  • The method 400 can be, but is not limited to, specifically applied to the distributed server cluster system 100 shown in FIG. 1 above or to other distributed server cluster systems obtained by flexibly varying the system 100.
  • Although the flow of the method 400 described below includes multiple operations occurring in a particular order, it should be clearly understood that the method may also include more or fewer operations, which may be performed sequentially or in parallel (for example, using a parallel processor or a multi-threaded environment). At the same time, for convenience of description, three clients are used as an example in this solution; those skilled in the art should understand that the number of clients does not limit the scope of protection of the present invention, and the system can adjust it flexibly according to its own needs. As shown in FIG. 4, method 400 includes the following steps:
  • Step S401 the write point client receives a data update request sent by the application, where the data update request is used to request to update the data to be updated. By way of example and not limitation, the data to be updated carries a feature value key (for convenience, the feature value is assumed here to be key1), and the feature value key is a logical address of the data to be updated (for example, the key value is constructed from the logical unit number LUN of the data to be updated and the logical block address LBA).
  • Step S402 optionally, the write point client checks, according to the feature value key1 of the data to be updated, whether the data to be updated is cached in the write point client's local read cache. If it is detected that the data to be updated is in the read cache, the data to be updated is set to an invalid state or is directly updated.
  • Step S403 the write point client generates a data update notification according to the feature value of the data to be updated and broadcasts it to all other clients in the same cluster; the data update notification carries the feature value key1 of the data to be updated and is used to notify all other clients to set the data to be updated to an invalid state so that it is no longer served for reads.
  • Step S404 the first client receives the data update notification broadcast by the write point client and checks, according to the feature value key1 carried in the data update notification, whether the data to be updated is cached in the read cache of the local node. If it is, the data to be updated is set to the invalid state (but the hotspot information of the data to be updated is retained) and step S405 is then performed; if not, the data to be updated is added as an invalid record and step S405 is then performed.
  • Step S405 the first client returns a processing success response to the write point client, where the processing success response is used to indicate that the first client has performed update processing on its read cache for the data to be updated.
  • Step S406 the second client receives the data update notification broadcast by the write point client and checks, according to the feature value key1 carried in the data update notification, whether the data to be updated is cached in the read cache of the local node. If it is, the data to be updated is set to the invalid state (but the hotspot information of the data to be updated is retained) and step S407 is then performed; if not, the data to be updated is added as an invalid record and step S407 is then performed.
  • Step S407 the second client returns a response of processing success to the write point client, and the successful response is used to indicate that the second client has updated the read cache for the data to be updated.
  • Step S408 the third client receives the data update notification broadcast by the write point client and checks, according to the feature value key1 carried in the data update notification, whether the data to be updated is cached in the read cache of the local node. If it is, the data to be updated is set to the invalid state (but the hotspot information of the data to be updated is retained) and step S409 is then performed; if not, the data to be updated is added as an invalid record and step S409 is then performed.
  • Step S409 the third client returns a processing success response to the write point client, and the successful response is used to indicate that the third client has performed update processing on its read cache for the data to be updated.
  • Step S410 if the write point client receives the processing success responses sent by all other clients, it returns to the application that the data to be updated has been processed successfully. It should be noted that, in a specific implementation, if the write point client does not receive a processing success response from some client, the write point client stays in a waiting state until it either receives that client's processing success response or receives a notification from the data distribution view management node that the client has failed, after which it can return processing success.
  • In the method 400, the write point client broadcasts the data update notification to all other clients in the same cluster while processing the data to be updated, so that the clients caching that data can learn of and record the fact that the data has expired. This avoids the situation in which one client updates the data while other clients unknowingly keep using the old data, and thus ensures the consistency of the read cache on each head side under the server cluster.
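  • A sketch of how a read client might handle the broadcast data update notification in steps S404 to S409 (class and field names are illustrative, not taken from the patent):

```python
class ReadClient:
    """Illustrative read client: a read cache plus a record of keys known to be invalid."""

    def __init__(self, name: str):
        self.name = name
        self.read_cache = {}          # key -> (data, state), state is "valid" or "invalid"
        self.invalid_records = set()  # keys recorded as invalid although not cached locally

    def on_data_update_notification(self, key: str) -> str:
        """S404/S406/S408: invalidate the key if it is cached, otherwise record it as invalid."""
        if key in self.read_cache:
            data, _ = self.read_cache[key]
            self.read_cache[key] = (data, "invalid")  # keep the hotspot information, stop serving it
        else:
            self.invalid_records.add(key)
        return "processing success"   # S405/S407/S409: sent back to the write point client
```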
  • FIG. 5 is an exemplary flowchart of a method 500 of read cache coherency in a distributed server cluster system in accordance with an embodiment of the present invention.
  • The method 500 can be, but is not limited to, specifically applied to the distributed server cluster system 100 shown in FIG. 1 above or to other distributed server cluster systems obtained by flexibly varying the system 100.
  • Although the flow of the method 500 described below includes multiple operations occurring in a particular order, it should be clearly understood that the method may also include more or fewer operations, which may be performed sequentially or in parallel (for example, using a parallel processor or a multi-threaded environment). At the same time, for convenience of description, three clients are used as an example in this solution; those skilled in the art should understand that the number of clients does not limit the scope of protection of the present invention, and the system can adjust it flexibly according to its own needs. As shown in FIG. 5, method 500 includes the following steps:
  • Step S501 the write point client receives a data update request sent by the application, where the data update request is used to request to update the data to be updated. By way of example and not limitation, the data to be updated carries a feature value key (for convenience, the feature value is assumed here to be key1), and the feature value key is a logical address of the data to be updated (for example, the key value is constructed from the logical unit number LUN of the data to be updated and the logical block address LBA).
  • Step S502 optionally, the write point client checks, according to the feature value key1 of the data to be updated, whether the data to be updated is cached in the write point client's local read cache. If it is detected that the data to be updated is in the read cache, the data to be updated is set to an invalid state or is directly updated.
  • Step S503 the write point client searches the hotspot information directory table according to the feature value key1 of the data to be updated, and determines which clients' read caches have cached the data to be updated (as shown in FIG. 5, assume the data to be updated (key1) is cached in the read caches of the first client and the second client, while the third client has not cached it; then, after querying the hotspot information directory table, the write point client confirms that the data update notification only needs to be sent to the first and second clients).
  • the hotspot information directory table is a method adopted in order to reduce the number of broadcast data update notifications in the case where the cluster size is large.
  • The hotspot information directory table is generated as follows: before any client in the system migrates data that it considers to be hot data into its read cache space, it broadcasts the migration action information to all other client nodes in the same cluster; after receiving the broadcast message about the migration action, the other client nodes add, to their local hotspot information directory tables, a record indicating that the read cache of that client contains the data in question.
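  • A sketch of the hotspot information directory table and of how it is populated by the migration broadcasts and queried in step S503 (the concrete data structure is an assumption, not prescribed by the patent):

```python
from collections import defaultdict

class HotspotDirectory:
    """Maps each key to the set of clients whose read caches hold that key."""

    def __init__(self):
        self.table = defaultdict(set)

    def on_hotspot_broadcast(self, key: str, client: str) -> None:
        """Record that `client` has migrated `key` into its read cache."""
        self.table[key].add(client)

    def readers_of(self, key: str) -> set:
        """S503: the clients whose read caches must be told that `key` is being updated."""
        return set(self.table.get(key, ()))

# Usage: the write point client notifies only the recorded readers instead of broadcasting.
directory = HotspotDirectory()
directory.on_hotspot_broadcast("key1", "first client")
directory.on_hotspot_broadcast("key1", "second client")
print(directory.readers_of("key1"))   # the first and second clients, but not the third
```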
  • Step S504 the write point client sends a data update notification to the hotspot clients confirmed in step S503 as storing the data to be updated in their read caches (the first and second clients); the notification carries the feature value key1 of the data to be updated and is used to notify those clients to put the data to be updated into an invalid state so that it is no longer served for reads.
  • Step S505 the first client receives the data update notification sent by the write point client and, according to the feature value key1 carried in it, sets the data to be updated that is cached in its read cache to the invalid state (but retains the hotspot information of the data to be updated).
  • Step S506 the first client returns a processing success response to the write point client, where the processing success response is used to indicate that the first client has performed update processing on its read cache for the data to be updated.
  • Step S507 the second client receives the data update notification sent by the write point client and, according to the feature value key1 carried in it, sets the data to be updated that is cached in its read cache to the invalid state (but retains the hotspot information of the data to be updated).
  • Step S508 the second client returns a processing success response to the write point client, where the processing success response is used to indicate that the second client has performed update processing on its read cache for the data to be updated.
  • Step S509 if the write point client receives the processing success responses sent by all the hotspot clients confirmed in step S503, it returns processing success to the application.
  • In the method 500, the write point client determines, by searching the hotspot information directory table, which clients' read caches have cached the data to be updated, and then sends the data update notification only to those hotspot clients, so that the clients caching the data to be updated can learn of and record that the data has expired. This ensures the consistency of the read cache on each head side under the server cluster while reducing the number of data update notifications, preventing network congestion and improving the IO performance of the system.
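  • Combining the sketches above, the write-point-client side of method 500 might look as follows (all helper names are hypothetical, and the directory parameter is the illustrative `HotspotDirectory` sketched earlier):

```python
def update(write_point, directory, key: str, new_data: bytes) -> str:
    """Sketch of steps S501-S509: notify only the clients recorded as caching `key`."""
    # S502: handle the write point client's own read cache first.
    if key in write_point.local_read_cache:
        write_point.local_read_cache[key] = (new_data, "valid")  # or mark it invalid instead

    # S503/S504: look up the hotspot clients and send them the data update notification.
    readers = directory.readers_of(key)
    responses = [write_point.send_update_notification(c, key) for c in readers]

    # S505-S509: processing success is returned only after every hotspot client has responded.
    if all(r == "processing success" for r in responses):
        return "processing success"
    return "waiting"
```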
  • FIG. 6 is a schematic diagram showing the logical structure of a write point client 600 in a server cluster according to an embodiment of the present invention.
  • the server cluster includes n servers, n is a natural number ⁇ 2, and each server is configured with at least one client.
  • Each client is configured with a write cache, which is used to cache data written to each of the clients.
  • The write point client 600 can be, but is not limited to, specifically applied to the distributed server cluster system 100 shown in FIG. 1 above or to other distributed server cluster systems obtained by flexibly varying the system 100. It should be noted that a plurality of modules or units are mentioned in the embodiments of the present invention.
  • the write point client 600 includes a receiving module 610, a determining module 620, a sending module 630, and a notification module 640.
  • the receiving module 610 is configured to receive a data write message, where the data write message requests to write data to be written;
  • a determining module 620 configured to determine, according to the feature value of the data to be written, a primary write client and at least one standby write client for saving the data to be written, where the primary write client and each standby write client belong to different servers;
  • a sending module 630 configured to separately send the data to be written to a write cache of each of the primary write client and each of the standby write clients;
  • a notification module 640 configured to, when it is determined that the data to be written is successfully saved by the primary write client and each standby write client, send a first notification message to the primary write client and each standby write client, where the first notification message is used to notify the primary write client and each standby write client to change the recorded synchronization status of the data to be written from not synchronized to synchronized.
  • Optionally, the notification module 640 is further configured to send a second notification message, where the second notification message is used to notify the client, among the primary write client and the at least one standby write client, that has written successfully to record the synchronization status of the data to be written recorded by itself as not synchronized with the client that failed to write.
  • the write point client 600 further includes a data distribution view (not shown), where the data distribution view is used to indicate a primary write client and a backup write client corresponding to each partition partition.
  • the determining module 620 is specifically configured to: apply a consistent hash algorithm to calculate a hash value corresponding to the feature value of the data to be written according to the feature value of the data to be written, and determine, according to the hash value, the hash value Determining, by the data distribution view, the primary write client and the at least one standby write client corresponding to the partition partition to which the data to be written belongs.
  • the determining module 620 is specifically configured to: determine, according to the data distribution view All primary write clients and all standby write clients corresponding to the partition partition to which the data to be written belongs;
  • In the embodiment of the present invention, the receiving module 610 of the write point client 600 receives the data to be written, the determining module 620 determines, according to the feature value of the data to be written, the primary write client and the at least one standby write client for saving the data to be written, and the sending module 630 sends the data to be written to the respective write caches of the primary write client and the at least one standby write client;
  • when the data to be written has been saved successfully on all of them, the notification module 640 notifies the primary write client and each standby write client that the data to be written has been synchronized. This ensures the consistency of the write caches for the data to be written in the distributed Cache under the server cluster.
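  • Reading the modules of the write point client 600 together, the write path can be pictured as: send the data to every replica's write cache, collect the write-success acknowledgements, then issue the first notification message (or the second one when some replica fails). The simplified, single-threaded Python sketch below assumes helper methods such as determine_clients, send_to_cache and notify, which are illustrative and not defined by the embodiments.

```python
def handle_write(write_point, key, data):
    """Illustrative write path of a write point client (sketch only)."""
    primary, standbys = write_point.determine_clients(key)  # see hashing sketch above
    targets = [primary] + list(standbys)

    # Send the data to the write cache of every target; a real implementation
    # could do this concurrently to reduce latency.
    acks = {client: write_point.send_to_cache(client, key, data) for client in targets}

    if all(acks.values()):
        # All replicas saved the data: tell them to change the recorded
        # synchronization status from unsynchronized to synchronized.
        for client in targets:
            write_point.notify(client, "first_notification", key=key)
    else:
        # Some replica failed: tell the successful clients which client
        # they are not synchronized with.
        failed = [client for client, ok in acks.items() if not ok]
        for client, ok in acks.items():
            if ok:
                write_point.notify(client, "second_notification", key=key, failed=failed)

    # In either case the write point can report success to the application.
    return "write success"
```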
  • FIG. 7 is a schematic diagram showing the logical structure of a write point client 700 in a server cluster according to an embodiment of the present invention.
  • the server cluster includes n servers, n is a natural number ⁇ 2, and each server is configured with at least one client.
  • Each client is configured with a read cache, which is used to cache hotspot data that is frequently accessed by the application in each client.
  • the write point client 700 can be, but is not limited to, applied to the distributed server cluster system 100 shown in FIG. 1 above, or to other distributed server cluster systems obtained by flexibly adapting the system 100. It should be noted that a plurality of modules or units are mentioned in the embodiments of the present invention; those skilled in the art will appreciate that the functions of these modules or units may be split into more sub-modules or sub-units, or combined into fewer modules or units, while achieving the same technical effect, and all such variations fall within the protection scope of the embodiments of the present invention.
  • the write point client 700 includes a receiving module 710, a processing module 720, and a notification module 730.
  • the receiving module 710 is configured to receive a data update request, where the data update request is used to request to update data to be updated;
  • the processing module 720 is configured to generate a data update notification according to the feature value of the data to be updated, where the data update notification carries the feature value indicating the data to be updated;
  • the notification module 730 is configured to send the data update notification to the read clients corresponding to the data update request in the server cluster, where the read clients include all clients in the server cluster other than the write point client, or the clients that have saved the data to be updated;
  • the notification module 730 is further configured to: after receiving, from the read clients, the response messages indicating that the data to be updated has been processed successfully, send a response message indicating that the update of the data to be updated has succeeded;
  • the response message indicating that the update succeeded is used to indicate that the read clients have each performed update processing on their respective read caches for the data to be updated.
  • the write point client 700 further includes a hotspot information directory table (not shown), where the hotspot information directory table is used to indicate all clients that have the data to be updated cached in the read cache.
  • the processing module 720 is further configured to search the hotspot information directory table, and determine the read client according to the feature value of the to-be-updated data.
  • the receiving module 710 is further configured to receive hotspot information broadcast by a read client, where the hotspot information is used to indicate that the read client has cached the data to be updated, and the processing module 720 is further configured to record the read client into the hotspot information directory table.
  • In the embodiment of the present invention, the write point client 700 receives the data update request through the receiving module 710, the processing module 720 generates a data update notification according to the feature value of the data to be updated, and the notification module 730 sends the data update notification to the read clients,
  • where the read clients include all clients in the server cluster other than the write point client 700, or the clients that have saved the data to be updated, thereby ensuring the consistency of the read caches in the distributed cache of each client under the server cluster.
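  • For the read-cache side, one way to picture the write point client 700 is as keeping a hotspot information directory (feature value -> clients that cached it) and pushing a data update notification either to the recorded hotspot holders or to every other client in the cluster. The Python sketch below is only an illustration; names such as send_update_notification and wait_for_ack are assumed helpers, not interfaces defined by the embodiments.

```python
from collections import defaultdict


class WritePointClient:
    def __init__(self, name, all_clients, transport):
        self.name = name
        self.all_clients = set(all_clients)     # names of every client in the cluster
        self.transport = transport              # assumed messaging helper
        # Hotspot information directory: feature value -> clients caching it.
        self.hotspot_directory = defaultdict(set)

    def on_hotspot_broadcast(self, key, read_client):
        """A read client announced that it cached `key` as hotspot data."""
        self.hotspot_directory[key].add(read_client)

    def handle_update(self, key):
        """Notify the relevant read clients so that stale copies are invalidated."""
        # Either the recorded hotspot holders, or everyone except this client.
        readers = self.hotspot_directory.get(key) or (self.all_clients - {self.name})
        for client in readers:
            self.transport.send_update_notification(client, key)
        # Only after every addressed read client acknowledges the notification
        # does the write point report a successful update to the application.
        if all(self.transport.wait_for_ack(client, key) for client in readers):
            return "update success"
        return "waiting for acknowledgements"
```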
  • FIG. 8 is a schematic diagram showing the logical structure of a read client 800 in a server cluster according to an embodiment of the present invention.
  • the server cluster includes n servers, n is a natural number ≥ 2, and each server is configured with at least one client.
  • Each client is configured with a read cache, which is used to cache hotspot data that is frequently accessed by the application in each client.
  • the read client 800 can be, but is not limited to, applied to the distributed server cluster system 100 shown in FIG. 1 above, or to other distributed server cluster systems obtained by flexibly adapting the system 100. It should be noted that a plurality of modules or units are mentioned in the embodiments of the present invention; those skilled in the art will appreciate that the functions of these modules or units may be split into more sub-modules or sub-units, or combined into fewer modules or units, while achieving the same technical effect, and all such variations fall within the protection scope of the embodiments of the present invention.
  • the read client 800 includes a receiving module 810, a processing module 820, and a sending module 830.
  • the receiving module 810 is configured to receive a data update notification sent by the write point client, where the data update notification carries a feature value indicating the data to be updated;
  • the processing module 820 is configured to perform update processing on the read cache of the read client according to the data update notification.
  • the sending module 830 is configured to send, to the write point client, a response message that the data to be updated is successfully processed.
  • Optionally, the processing module 820 is specifically configured to confirm, according to the feature value, whether the data to be updated is cached in the read cache of the read client, and if the data to be updated is not cached, add a record marking the data to be updated as invalid.
  • Optionally, when the read client 800 caches the data to be updated into its read cache, the sending module 830 is further configured to broadcast hotspot information to all other clients in the server cluster except the read client, where the hotspot information is used to indicate that the read client has cached the data to be updated.
  • In the embodiment of the present invention, the read client 800 receives, through the receiving module 810, the data update notification sent by the write point client, the processing module 820 performs update processing on the read cache of the read client according to the data update notification, and the sending module 830 sends to the write point client a response message indicating that the data to be updated has been processed successfully, thereby ensuring the consistency of the read caches in the distributed cache of each client under the server cluster.
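  • On the receiving side, the behaviour of the read client 800 can be sketched as: on a data update notification, invalidate the locally cached copy or record the feature value as a stale entry, then acknowledge the write point client; when hotspot data is cached locally, broadcast that fact to the other clients. The helper names broadcast_hotspot_info and send_ack in the Python sketch below are assumptions made for illustration.

```python
class ReadClient:
    def __init__(self, name, transport):
        self.name = name
        self.transport = transport      # assumed messaging helper
        self.read_cache = {}            # feature value -> cached hotspot data
        self.stale_keys = set()         # feature values known to be invalid

    def on_update_notification(self, write_point, key):
        """Invalidate the local copy (or record the key as stale) and acknowledge."""
        if key in self.read_cache:
            # Stop serving the old value while keeping the hotspot record.
            del self.read_cache[key]
        self.stale_keys.add(key)
        self.transport.send_ack(write_point, key)   # "processed successfully"

    def cache_hotspot(self, key, data, other_clients):
        """Cache hotspot data locally and broadcast that fact to the cluster."""
        self.read_cache[key] = data
        self.stale_keys.discard(key)
        for client in other_clients:
            self.transport.broadcast_hotspot_info(client, self.name, key)
```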
  • FIG. 9 is a schematic diagram showing the logical structure of a computer 900 according to an embodiment of the present invention.
  • the computer of the embodiment of the present invention may include: a processor 901, a memory 902, a system bus 904, and a communication interface 905.
  • The processor 901, the memory 902, and the communication interface 905 are connected through the system bus 904 and communicate with each other.
  • The processor 901 may be a single-core or multi-core central processing unit, or an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention.
  • the memory 902 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • The memory 902 is configured to store computer execution instructions 903.
  • Specifically, the computer execution instructions 903 may include program code.
  • When the computer runs, the processor 901 runs the computer execution instructions 903 and can execute the method flow described in any one of FIG. 2, FIG. 3, FIG. 4, or FIG. 5.
  • aspects of the present invention, or possible implementations of various aspects may be embodied as a system, method, or computer program product.
  • aspects of the invention, or possible implementations of various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as "circuits," "modules," or "systems."
  • aspects of the invention, or possible implementations of various aspects may take the form of a computer program product, which is a computer readable program code stored in a computer readable medium.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable read-only memory (CD-ROM).
  • the processor in the computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowcharts, and generate an apparatus that implements the functional actions specified in each block, or combination of blocks, of the block diagrams.
  • the computer readable program code can execute entirely on the user's computer, partly on the user's computer, as a separate software package, partly on the user's computer and partly on the remote computer, or entirely on the remote computer or server.
  • It should also be noted that, in some alternative implementations, the functions noted in the steps of the flowcharts or in the blocks of the block diagrams may occur out of the order noted. For example, depending on the functions involved, two steps, or two blocks, shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Multimedia (AREA)

Abstract

The present invention relates to a caching method, a client, and a system for a distributed server cluster system. A write point client determines, according to a feature value of data to be written, a primary write client and at least one standby write client for saving the data to be written, and sends the data to be written to the respective write caches of the primary write client and each standby write client; when it is determined that the data to be written has been saved successfully on both the primary write client and each standby write client, a first notification message is sent to the primary write client and each standby write client, where the first notification message is used to instruct the primary write client and each standby write client to change the synchronization status of the data to be written recorded by each of them from unsynchronized to synchronized. The technical solution provided by the present invention ensures the write consistency of data in the distributed Cache under the server cluster.

Description

一种服务器集群系统中的缓存方法、写入点客户端和读客户端 技术领域
本发明涉及计算机技术,尤其涉及一种分布式服务器集群系统的缓存方法、客户端和存储系统。
背景技术
目前的分布式存储架构中,广泛采用机头(所谓机头即是能够实现将应用发来的请求分发到存储介质以及进行系统内部卷管理的客户端)加上物理介质侧的存储逻辑进程的架构,系统在处理应用发来的请求的时候,通常由客户端进行请求接收,然后再将请求转发到存储侧的进程进行处理。在此架构中,靠近物理介质的存储逻辑进程通常部署有Cache,进入存储逻辑进程的请求只需写入Cache中而不需要写入物理介质中就可以对外返回成功。为了能够进一步减少网络时延,一种有效的方法就是在客户端侧也部署一层分布式Cache。机头在接收到应用发来的请求后,对于写请求则将数据写入这一层分布式Cache中然后返回,通过这种方式对写时延来说,由于可以减少上述的一层网络时延而使得写的性能得到提高;而对于读请求,在分布式Cache中实现读Cache统计热点数据并进行缓存,提高读请求在本地机头的命中率,从而也能够做到提高读的性能。
在一种基于通用服务器集群的分布式存储系统结构下会出现多个服务器节点上的应用访问同一个卷上的数据的情形。具体来说,该分布式存储系统对外提供块接口,也就是说用户/应用看到的是一个个磁盘块设备,每个磁盘块设备对应了系统内部的一个卷,同时是通过客户端实现在服务器节点上挂载出一个个磁盘块设备的,那么在多个服务器上部署机头后,就可以针对系统内部的一个卷在多个服务器上都挂载出磁盘块设备。因而,当多个服务器上的应用访问某个磁盘块设备时,实际是在访问系统内部的同一个卷,也就是同一份数据源。
不可避免的,上述场景下一份数据源会有多个客户端并发进行读写,那么在客户端侧部署一层分布式Cache后,如何保证此场景下高性能要求的同时又维护客户端侧读Cache和写Cache的一致性是核心问题。
针对上述问题,现有的一种解决方案是通过维护一份数据关系目录提升 读写的性能。虽然采取数据关系目录的方式,能够清晰记录出数据存放的位置,解决数据冲突的问题,但是该方案存在两个缺陷。首先,由于每一份数据可能在多个点进行更新,数据存放的位置是不固定的,这样导致的是数据关系目录这样一个关系结构会非常大,直接导致需要消耗较大的存储空间,且考虑故障场景,那么这份数据目录可能就需要进行持久化,带来的是更新数据关系目录的性能代价;其次,由于多个Cache中都持有一份数据目录,那么就必须保证该数据关系目录在多个Cache中的一致性,这就意味在处理读写请求的时候,需要检测是否该数据有出现过更新变动,检测是否有更新的操作是发生在IO路径上的,因此这样也会阻碍读写性能的提升。
发明内容
有鉴于此,实有必要提供一种分布式服务器集群系统的缓存方法,以确保数据在服务器集群下的分布式Cache中的一致性。
第一方面，本发明实施例提出了一种服务器集群中写缓存一致性的方法，所述服务器集群包括n个服务器，n为≥2的自然数，每个服务器配置有至少一个客户端，每个客户端配置有写缓存，所述写缓存用于缓存写入所述每个客户端的数据，所述方法包括：
写入点客户端接收数据写入消息,所述数据写入消息请求写入待写入数据,根据所述待写入数据的特征值确定用于保存所述待写入数据的主写入客户端和至少一个备写入客户端,所述主写入客户端与每个备写入客户端分别归属于不同的服务器;
分别将所述待写入的数据发送到所述主写入客户端和所述每个备写入客户端各自的写缓存中;
当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,向所述主写入客户端和所述每个备写入客户端发送第一通知消息,所述第一通知消息用于告知所述主写入客户端及所述每个备写入客户端将各自记录的所述待写入数据的同步状态从未同步改为已同步。
结合第一方面,在第一种可能实现的方式中,当确定所述待写入的数据在所述主写入客户端或所述至少一个备写入客户端中发生写入失败时,向所述主写入客户端和所述至少一个备写入客户端中写成功的客户端发送第二通知消息,所述第二通知消息用于告知所述主写入客户端及所述至少一个备写 入客户端中的写成功的客户端将自身记录的所述待写入数据的同步状态记为与写失败的客户端未同步。
结合第一方面或第一方面的第一种可能实现的方式,在第二种可能实现的方式中,所述根据所述待写入数据的特征值确定主写入客户端和至少一个备写入客户端,具体包括:
根据所述待写入数据的特征值,应用一致性哈希算法计算所述特征值对应的哈希值,根据所述哈希值确定所述待写入数据所属的分区partition;
根据数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端，所述数据分布视图用于指示每个分区partition各自对应的主写入客户端和备写入客户端。
结合第一方面的第二种可能实现的方式,在第三种可能实现的方式中,其特征在于,所述根据数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端包括:
根据所述数据分布视图确定所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端;
判断所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端中是否存在故障;
将所述待写入数据所属的分区partition对应的不存在故障的主写入客户端和不存在故障的备写入客户端确定为所述主写入客户端和所述至少一个备写入客户端。
第二方面,本发明实施例提出了一种服务器集群中读缓存一致性的方法,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述方法包括:
写入点客户端接收数据更新请求,所述数据更新请求用于请求更新待更新数据,根据所述待更新数据的特征值生成数据更新通知,所述数据更新通知携带指示所述待更新数据的所述特征值;
向所述服务器集群中的所述数据更新请求对应的读客户端发送所述数据更新通知,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
当接收到所述读客户端发送的所述待更新数据处理成功的响应消息,则 发送所述待更新数据更新成功的响应消息,所述待更新数据更新成功的响应消息用于指示所述读客户端已针对所述待更新数据对所述读客户端各自的读缓存作了更新处理。
结合第二方面,在第一种可能实现的方式中,当所述读客户端为保存了所述待更新数据的客户端时,所述方法还包括:所述写入点客户端查找热点信息目录表,根据所述待更新数据的所述特征值确定所述读客户端,所述热点信息目录表用于指示所有在读缓存中缓存有所述待更新数据的客户端。
结合第二方面的第一种可能实现的方式,在第二种可能实现的方式中,在所述写入点客户端查找热点信息目录表之前,所述方法还包括:
所述写入点客户端接收来自所述读客户端广播的热点信息,将所述读客户端记录进所述热点信息目录表中,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
第三方面,本发明实施例提出了一种服务器集群中读缓存一致性的方法,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述方法包括:
读客户端接收写入点客户端发送的数据更新通知,所述数据更新通知携带指示待更新数据的特征值,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
根据所述数据更新通知对所述读客户端各自的读缓存进行更新处理,并对所述写入点客户端发送待更新数据处理成功的响应消息。
结合第三方面,在第一种可能实现的方式中,所述根据所述数据更新通知对所述读客户端各自的读缓存进行更新处理,具体包括:所述读客户端根据所述特征值确认所述读客户端各自的读缓存中是否缓存有所述待更新数据,若确认没有缓存所述待更新数据,则添加所述待更新数据为失效的记录。
结合第三方面或第三方面的第一种可能实现的方式,在第二种可能实现的方式中,所述方法还包括:
所述读客户端在将所述待更新数据缓存进所述读客户端各自的读缓存中时,向所述服务器集群中除所述读客户端之外的其他所有客户端广播热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
第四方面,本发明实施例提出了一种服务器集群中的写入点客户端,其 特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有写缓存,所述写缓存用于缓存写入所述每个客户端的数据,所述写入点客户端包括:
接收模块,用于接收数据写入消息,所述数据写入消息请求写入待写入数据;
确定模块,用于根据所述待写入数据的特征值确定用于保存所述待写入数据的主写入客户端和至少一个备写入客户端,所述主写入客户端与每个备写入客户端分别归属于不同的服务器;
发送模块,用于分别将所述待写入的数据发送到所述主写入客户端和所述每个备写入客户端各自的写缓存中;
通知模块,用于当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,向所述主写入客户端和所述每个备写入客户端发送第一通知消息,所述第一通知消息用于告知所述主写入客户端及所述每个备写入客户端将各自记录的所述待写入数据的同步状态从未同步改为已同步。
结合第四方面,在第一种可能实现的方式中,当确定所述待写入的数据在所述主写入客户端或所述至少一个备写入客户端中发生写入失败时,则所述通知模块还用于发送第二通知消息,所述第二通知消息用于告知所述主写入客户端及所述至少一个备写入客户端中写成功的客户端将自身记录的所述待写入数据的同步状态记为与写失败的客户端未同步。
结合第四方面或第四方面的第一种可能实现的方式,在第二种可能实现的方式中,所述写入点客户端还包括数据分布视图,所述数据分布视图用于指示每个分区partition对应的主写入客户端和备写入客户端,则所述确定模块具体用于:根据所述待写入数据的特征值,应用一致性哈希算法计算所述待写入数据的特征值对应的哈希值,根据所述哈希值确定所述待写入数据所属的分区partition;根据所述数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端。
结合第四方面的第二种可能实现的方式,在第三种可能实现的方式中,所述确定模块具体用于:
根据所述数据分布视图确定所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端;
判断所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端中是否存在故障;
将所述待写入数据所属的分区partition对应的不存在故障的主写入客户端和不存在故障的备写入客户端确定为所述主写入客户端和所述至少一个备写入客户端。
第五方面,本发明实施例提出了一种服务器集群中的写入点客户端,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述写入点客户端包括:
接收模块,用于接收数据更新请求,所述数据更新请求用于请求更新待更新数据;
处理模块,用于根据所述待更新数据的特征值生成数据更新通知,所述数据更新通知携带指示所述待更新数据的所述特征值;
通知模块,用于向所述服务器集群中的所述数据更新请求对应的读客户端发送所述数据更新通知,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
当接收到所述读客户端发送的所述待更新数据处理成功的响应消息,则所述通知模块还用于发送所述待更新数据更新成功的响应消息,所述待更新数据更新成功的响应消息用于指示所述读客户端已针对所述待更新数据对所述读客户端各自的读缓存作了更新处理。
结合第五方面,在第一种可能实现的方式中,所述写入点客户端还包括热点信息目录表,所述热点信息目录表用于指示所有在读缓存中缓存有所述待更新数据的客户端,所述处理模块还用于查找所述热点信息目录表,根据所述待更新数据的所述特征值确定所述读客户端。
结合第五方面的第一种可能实现的方式,在第二种可能实现的方式中,所述接收模块还用于接收来自所述读客户端广播的热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据,所述处理模块还用于将所述读客户端记录进所述热点信息目录表中。
第六方面,本发明实施例提出了一种服务器集群中的读客户端,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客 户端中被应用频繁访问的热点数据,所述读客户端包括:
接收模块,用于接收写入点客户端发送的数据更新通知,所述数据更新通知携带指示待更新数据的特征值;
处理模块,用于根据所述数据更新通知对所述读客户端的读缓存进行更新处理;
发送模块,用于对所述写入点客户端发送待更新数据处理成功的响应消息。
结合第六方面,在第一种可能实现的方式中,所述处理模块具体用于根据所述特征值确认所述读客户端的读缓存中是否缓存有所述待更新数据,若确认没有缓存所述待更新数据,则添加所述待更新数据为失效的记录。
结合第六方面或第六方面的第一种可能实现的方式,在第二种可能实现的方式中,所述读客户端在将所述待更新数据缓存进所述读客户端的读缓存中时,所述发送模块还用于向所述服务器集群中除所述读客户端之外的其他所有客户端广播热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
第七方面,本发明实施例提出了一种服务器集群系统,所述服务器集群系统包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个如第四方面或第四方面任一可能的实现方式,第五方面或第五方面任一可能的实现方式中所述的写入点客户端,以及如第六方面或第六方面任一可能的实现方式所述的读客户端。
第八方面,本发明实施例提供一种计算机,包括:处理器、存储器、总线和通信接口;
所述存储器用于存储计算机执行指令,所述处理器与所述存储器通过所述总线连接,当所述计算机运行时,所述处理器执行所述存储器存储的所述计算机执行指令,以使所述计算机执行以上第一方面、第一方面中任一可能的实现方式所提供的写缓存一致性的方法,或第二方面、第二方面中任一可能的实现方式所提供的读缓存一致性的方法,或第三方面、第三方面中任一可能的实现方式所提供的读缓存一致性的方法。
第九方面,本发明实施例提供一种计算机可读介质,包括计算机执行指令,以供计算机的处理器执行所述计算机执行指令时,所述计算机执行以上第一方面、第一方面中任一可能的实现方式所提供的写缓存一致性的方法, 或第二方面、第二方面中任一可能的实现方式所提供的读缓存一致性的方法,或第三方面、第三方面中任一可能的实现方式所提供的读缓存一致性的方法。
本发明实施例中,当写入点客户端接收到待写入数据时,由写入点客户端根据所述待写入数据的特征值分别确定用于保存所述待写入数据的主写入客户端以及至少一个备写入客户端,并由写入点客户端分别发送所述待写入数据至所述主写入客户端及所述每个备写入客户端各自的写缓存中,再根据所述主写入客户端及所述每个备写入客户端返回的写成功响应消息确认待写入数据写入成功,使得待写入数据的所述主写入客户端和所述至少一个备写入客户端之间能够实现一致,从而确保数据在服务器集群下的分布式Cache中的写缓存一致性。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为依据本发明一实施例的一种分布式服务器集群系统100的系统结构示意图;
图2为依据本发明一实施例的分布式服务器集群系统中写缓存一致性的方法200的示范性流程图;
图3是依据本发明一实施例的分布式服务器集群系统中写缓存一致性的方法300的示范性流程图;
图4是依据本发明一实施例的分布式服务器集群系统中读缓存一致性的方法400的示范性流程图;
图5是依据本发明一实施例的分布式服务器集群系统中读缓存一致性的方法500的示范性流程图;
图6为依据本发明一实施例的写入点客户端600的逻辑结构示意图;
图7为依据本发明一实施例的写入点客户端700的逻辑结构示意图;
图8为依据本发明一实施例的读客户端800的逻辑结构示意图;
图9为依据本发明一实施例的计算机900的逻辑结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案做详细描述,为了全面理解本发明,在以下详细描述中提到了众多具体细节,但是本领域技术人员应该理解,本发明可以无需这些具体细节而实现。在某些实施例中,不详细描述公知的方法、过程、组件和电路,以免不必要地使实施例模糊。
为方便理解实施,首先提供一种基于本发明实施例的分布式服务器集群系统100。如图1所示为该分布式服务器系统100的系统结构示意图,在该分布式服务器集群系统100中仅作为示例而非限制,给出了三个服务器101、102、103(系统能够按照自身的实际需要灵活调整服务器的数量),每个服务器上部署有客户端,同时每个客户端都布置有写缓存Cache和读缓存Cache(未示出),每个客户端的写缓存和读缓存可以独立设置或者是同一个缓存中的两个逻辑空间。该系统100还包括存储节点140、141、142以及Cache130、Cache131和Cache132。上述三个服务器101、102、103通过例如但不限于计算网络、转换线缆InfiniBand或以太网光纤通道FCoE与存储节点140、141、142实现集群互联。其中,Cache130、Cache131和Cache132为上述存储节点140、141、142侧的高速缓存,用于接收来自客户端发送的数据请求。该分布式存储系统100对外提供块接口,也就是说用户/应用看到的是一个个磁盘块设备,每个磁盘块设备对应了系统内部的一个卷(如图中所示的Volume1、Volume2和Volume3),同时是通过客户端实现在服务器节点上挂载出一个个磁盘块设备的。那么,如图1所示,在服务器101、102、103均部署有客户端后,上述三个服务器101~103上的应用APP能够访问系统内部的同一个卷,也就是同一份数据源。需要注意的是,上述说明性论述并不打算穷举或将本发明局限于图1所示的系统架构形式,在具体实现过程中,分布式服务器集群系统100的众多修改和变形都是可行的,仅作为示例而非限制,存储节点140~142可以分别集成到服务器101~103中。
在该分布式服务器集群系统100中,可选的,对每份待写入数据都通过视图管理唯一确定该待写入数据的主副本的位置(即存储所述待写入数据的主写入客户端)。所谓视图管理是指对于分布式Cache采用一致性哈希的算法,即系统采用了分布式哈希表DHT(Distributed Hash Table),将DHT环空间(仅作为示例,所述DHT环空间为232超大虚拟节点构成的环形空间)划分为N等份,每一等份是一个分区Partition,分区Partition的个数(N个) 固定,同时指定每个Partition属于哪个磁盘/存储介质(当然,系统可以根据实际的需要对分区Partition的个数以及每个Partition与磁盘/存储介质的对应关系进行动态调整)。并且系统将这些分区Partition均匀地分散到系统内所有客户端上的Cache中并将此分布信息保存起来,仅作为示例非限制,系统将分区Partition的分布信息用数据分布视图的形式保存下来。在实际应用中,写入点客户端根据其接收的待写入数据的特征key值计算哈希值,根据该哈希值确定该待写入数据所属的Partition,由于每个Partition都对应了一个客户端,从而对于任何一个写入点客户端接收的每份待写入数据就确定了其主副本存放的位置。对于存在多个数据副本的场景(即为了数据安全,待写入数据除了有主副本外还有至少一个备副本),则通过以Partition的粒度(所谓Partition的粒度是指控制Partition实际落在哪个物理磁盘/存储介质上)确定每个Partition的副本所属的客户端侧Cache的位置,从而间接确定了每一份待写入数据需要转发到哪些客户端侧Cache中进行存储。在具体实现过程中,为了安全需要可以灵活配置分区分配算法,需要避免将每个Partition的主副本和备副本位于同一个客户端上。为了保证待写入数据在多个副本之间的一致性,接收上述待写入数据主、备副本的客户端(即主写入客户端和备写入客户端)需要在接收并缓存好各自的待写入数据的副本后,向写入点客户端发送写成功响应消息,告知写入点客户端已经处理成功。写入点客户端在接到全部数据副本所属的客户端发送的写成功响应消息后才能对发送所述待写入数据的应用返回写成功,同时告知所述待写入数据的所有主写入客户端和备写入客户端该待写入数据已经同步。
仅作为示例,假设Server101中的APP有待写入数据,则客户端110为写入点客户端。可选的,客户端110通过一致性哈希算法(DHT)确定出上述待写入数据的主副本归属于客户端112,备副本归属于客户端113,则客户端110分别将待写入数据的主副本发送给客户端112,备副本发送给113。若客户端110分别接到客户端112和113发送的写成功响应消息,则对发送上述待写入数据的APP返回写成功响应,同时告知主写入客户端112、备写入客户端113该待写入数据已经同步。
本发明实施例中,写入点客户端对待写入数据通过视图管理唯一确定该待写入数据的主、备副本的位置(多副本的场景下),并通过接收主、备副本所属的客户端返回的写成功响应消息判断待写入数据的同步状态,最后通 知接收待写入数据主写入客户端和备写入客户端该待写入数据已经同步,从而使得待写入数据的主、备副本之间能够实现一致,确保待写入数据在服务器集群下的分布式Cache中的写缓存一致性。
同时,为了确保一旦副本数据更新后,客户端侧读cache中的数据不会失效,则需要由写入点客户端在接收待更新数据的同时,告知其它所有可能缓存有该待更新数据的客户端将该数据置为失效。仅作为示例,假设Server101中的APP有待更新数据,则客户端110为写入点客户端,那么客户端110在写处理流程中需要告知其它所有可能缓存有该待更新数据的客户端(即与客户端110处于同一网络集群中的其他所有客户端,如图1中的客户端112、113),使得其它缓存有该待更新数据的客户端将该数据置为失效。作为优选的,在服务器集群规模很大的情况下,考虑到客户端110通知其它所有客户端可能造成的网络堵塞和通信成本,可以在客户端110侧记录哪些客户端中缓存有该待更新数据,那么在对此待更新数据更新的时候,仅需要按照之前的记录通知那些缓存有该待更新数据的客户端即可。
本发明实施例中,通过写入点客户端在接收待更新数据的同时,对服务器集群中的其他所有客户端或缓存有该待更新数据的客户端发送通知,以便于缓存有该待更新数据的客户端将该待更新数据置为失效,确保了服务器集群下各个客户端侧读cache的一致性。
图2为依据本发明一实施例的分布式服务器集群系统中写缓存一致性的方法200的示范性流程图。该方法200可以但不限于具体应用在上述图1所示的分布式服务器集群100或基于该系统100进行灵活变形而得到的其他分布式服务器集群中。需要注意的是,虽然下文描述的方法200的流程包括以特定顺序出现的多个操作,但是应该清楚了解,这些操作也可以包括更多或更少的操作,这些操作可以顺序执行或并行执行(例如使用并行处理器或多线程环境)。如图2所示,方法200包括以下步骤:
步骤S201,写入点客户端接收应用发送的待写入数据,仅作为示例而非限制,所述待写入数据携带特征值key,该特征值key为所述待写入数据的逻辑地址(如key值通过待写入数据的逻辑单元号LUN和逻辑区块地址LBA构造而来)。
步骤S202,可选的,写入点客户端根据所述待写入数据的key值计算hash 值,根据该hash值确定所述待写入数据所属的分区Partition(这里假设计算得出所述待写入数据的所属分区为P1)。
步骤S203,可选的,确定出所述待写入数据所属的分区Partition(P1)后,写入点客户端通过查询内部存储的数据分布视图,确定所述待写入数据所属的分区Partition(P1)的全部主写入客户端和备写入客户端。需要注意的是,这里为了方便说明仅示例性地给出了待写入数据的主、备两个数据副本,本领域技术人员可以理解系统能够按照自身的实际需要随时调整待写入数据的副本的数量,因此副本的数量不作为对本发明的实施例及保护范围的限制。
步骤S204,写入点客户端发送所述待写入数据到所述待写入数据的主写入客户端(待写入数据主副本所在的客户端)。
步骤S205,所述待写入数据的主写入客户端接收所述待写入数据,为所述待写入数据分配缓存空间。
步骤S206,主写入客户端将所述待写入数据的同步状态设置为与其他客户端未同步(UNOK)。仅作为示例而非限制,主写入客户端在其元数据结构中将步骤S205中为所述待写入数据分配的一段缓存空间的值置为UNOK。
步骤S207,主写入客户端向写入点客户端返回写成功响应消息,该写成功响应消息用于告知写入点客户端主写入客户端已经写成功。
步骤S208,写入点客户端发送所述待写入数据到所述待写入数据的备写入客户端(待写入数据备副本所在的客户端)。
步骤S209,所述待写入数据的备写入客户端接收所述待写入数据,为所述待写入数据分配缓存空间。
步骤S210,备写入客户端将所述待写入数据的同步状态设置为与其他客户端未同步(UNOK)。仅作为示例而非限制,备写入客户端在其元数据结构中将步骤S209中为所述待写入数据分配的一段缓存空间的值置为UNOK。
步骤S211,备写入客户端向写入点客户端返回写成功响应消息,该写成功响应消息用于告知写入点客户端备写入客户端已经写成功。
步骤S212,若写入点客户端分别接收来自主写入客户端、备写入客户端返回的写成功响应消息则继续步骤S213。
步骤S213,写入点客户端对发送上述待写入数据的应用返回写成功。
步骤S214,写入点客户端针对所述待写入数据发送第一通知消息(仅作 为示例而非限制,该第一通知消息为OK消息),告知所有存有所述待写入数据的副本的客户端(主写入客户端与备写入客户端)将所述待写入数据的同步状态改为与其他客户端已同步(OK)。
步骤S215,主写入客户端收到OK消息后,将其记录的待写入数据的同步状态从与未同步(UNOK)改为已同步(OK)。可选的,主写入客户端在其元数据结构中将步骤S205中为所述待写入数据分配的一段缓存空间的值置为OK。
步骤S216,备写入客户端收到OK消息后,将其记录的待写入数据的同步状态从与未同步(UNOK)改为已同步(OK)。可选的,备写入客户端在其元数据结构中将步骤S209中为所述待写入数据分配的一段缓存空间的值置为OK。
可选的,具体实现时,步骤S203中在写入点客户端查询数据分布视图,确定待写入数据所属的Partition的主、备副本所在的客户端的同时增加判断功能,即写入点客户端通过查询数据分布视图判断待写入数据所属的Partition的主、备副本所在的客户端中是否有UNOK的客户端(UNOK的客户端是指该客户端发生了故障或与所述写入点客户端失去了通信连接),若有UNOK的客户端则写入点客户端在后续复制待写入数据到各副本的过程中带上标识告知OK的副本/客户端将所述待写入数据的同步状态置为与UNOK的副本/客户端未同步。相应的,OK的客户端在接收到携带上述标识的待写入数据时,在各自的元数据结构中记录上所述待写入数据与UNOK的副本/客户端未同步,然后对写入点客户端返回写成功响应消息。
需要注意的是,具体实现过程中,步骤S204与步骤S208可以同时进行,即由写入点客户端并发地将待写入数据发送给主写入客户端和备写入客户端,从而减少时延。
本发明实施例中,写入点客户端接收应用发送的待写入数据后,能够根据待写入数据的特征值key计算hash值,根据该hash值确定所述待写入数据所属的分区Partition,确定出所述待写入数据所属的分区Partition后,由写入点客户端通过查询内部存储的数据分布视图,确定所述待写入数据所属的分区Partition的主写入客户端和备写入客户端,再将所述待写入数据发送给所述主写入客户端和所述备写入客户端,最后由写入点客户端根据所述主写入客户端和所述备写入客户端返回的写成功响应消息通知所述主写入客户 端和所述备写入客户端该待写入数据已同步。使得写入点客户端能够快速准确地找到存储所述待写入数据的主写入客户端和备写入客户端,并确保待写入数据在服务器集群下的分布式Cache中的写一致性。
上述图2详细描述了依据本发明一实施例的分布式服务器集群系统中写数据正常的流程,现详细描述依据本发明一实施例的分布式服务器集群系统中出现写数据异常时的流程。如图3所示,为依据本发明一实施例的分布式服务器集群系统中写缓存一致性的方法300的示范性流程图。该方法300可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而得到的其他分布式服务器集群系统中。需要注意的是,虽然下文描述的方法300的流程包括以特定顺序出现的多个操作,但是应该清楚了解,这些操作也可以包括更多或更少的操作,这些操作可以顺序执行或并行执行(例如使用并行处理器或多线程环境)。如图3所示,方法300包括以下步骤:
步骤S301,写入点客户端接收应用发送的待写入数据,所述待写入数据携带特征值key,仅作为示例而非限制,该特征值key为所述待写入数据的逻辑地址(如key值通过待写入数据的逻辑单元号LUN和逻辑区块地址LBA构造而来)。
步骤S302,可选的,写入点客户端根据所述待写入数据的key值计算hash值,根据该hash值确定所述待写入数据所属的分区Partition(这里假设计算得出所述待写入数据的所属分区为P1)。
步骤S303,可选的,确定出所述待写入数据所属的分区Partition(P1)后,写入点客户端通过查询内部存储的数据分布视图,确定所述待写入数据所属的分区Partition(P1)的主写入客户端和备写入客户端。如上述方法200中所述,副本的数量不作为对本发明的实施例及保护范围的限制。
步骤S304,写入点客户端复制所述待写入数据到所述待写入数据的主副本所在的客户端(主写入客户端)。
步骤S305,所述待写入数据的主写入客户端接收所述待写入数据,为所述待写入数据分配缓存空间。
步骤S306,主写入客户端将所述待写入数据的同步状态设置为与其他客户端未同步(UNOK)。仅作为示例而非限制,主写入客户端在其元数据结 构中将步骤S305中为所述待写入数据分配的一段缓存空间的值置为UNOK。
步骤S307,主写入客户端向写入点客户端返回写成功响应消息,该写成功响应消息用于告知写入点客户端主写入客户端已经写成功。
步骤S308,写入点客户端复制所述待写入数据到所述待写入数据的备副本所在的客户端(备写入客户端)。
以下假设所述待写入数据的备副本所在的客户端即备写入客户端出现故障或者无响应。
步骤S309,若写入点客户端仅接收到主写入客户端返回的写成功响应消息,则写入点客户端处于等待状态。
步骤S310,若写入点客户端接收到来自数据分布视图管理节点发送的备写入客户端故障通知,则写入点客户端不再等待备写入客户端的响应,并对发送所述待写入数据的应用返回写成功。所述数据分布视图管理节点发送的故障通知用于告知写入点客户端上述待写入数据的备副本写失败。在具体实现过程中,仅作为示例而非限制,所述数据分布视图管理节点通过例如但不限于心跳检测或定时发送询问消息的方式监控系统中的每个客户端是否运行正常,一旦发现某个客户端出现故障则通过广播消息告知系统其它客户端。
步骤S311,写入点客户端发送第二通知消息,该第二通知消息中标明了写失败的客户端,用于告知其它写成功的副本/客户端(即待写入数据的主写入客户端)将所述待写入数据的同步状态设置为与写失败的客户端(即所述待写入数据的备写入客户端)未同步。
步骤S312,主写入客户端收到步骤S311中的第二通知消息后,将其记录的所述待写入数据的同步状态改为与所述待写入数据的备写入客户端未同步。可选的,主写入客户端在其元数据结构中将步骤S305中为所述待写入数据分配的一段缓存空间记录为与待写入数据的备写入客户端未同步。
如方法200中所述,具体实现过程中,可选的,步骤S303中在写入点客户端查询数据分布视图,确定待写入数据所属的Partition的主、备副本所在的客户端的同时增加判断功能;步骤S304与步骤S308可以同时进行,即由写入点客户端并发地将待写入数据发送给主、备副本,从而减少时延。
本发明实施例中,在接收待写入数据的备写入客户端出现故障或者无响应时,写入点客户端能够根据数据分布视图管理节点发送的故障通知,告知接收待写入数据成功的主写入客户端将该待写入数据的同步状态置为与备写 入客户端未同步,并对外返回成功,从而保证了系统中接收待写入数据的主写入客户端或备写入客户端发生异常时,数据写入流程仍能正常进行,提升了系统的稳定性。
如图4所示,为依据本发明一实施例的分布式服务器集群系统中读缓存一致性的方法400的示范性流程图。该方法400可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而得到的其他分布式服务器集群系统中。需要注意的是,虽然下文描述的方法400的流程包括以特定顺序出现的多个操作,但是应该清楚了解,这些操作也可以包括更多或更少的操作,这些操作可以顺序执行或并行执行(例如使用并行处理器或多线程环境);同时为了方便说明,本方案中仅作为示例列举了三个客户端,本领域技术人员应该知道客户端的数量不作为对本发明保护范围的限制,系统可以根据自身的需要灵活安排。如图4所示,方法400包括以下步骤:
步骤S401,写入点客户端接收应用发送的数据更新请求,所述数据更新请求用于请求更新待更新数据,仅作为示例非限制,所述待更新数据携带特征值key(为方便说明这里假设该特征值为key1),该特征值key为所述待更新数据的逻辑地址(如key值通过待更新数据的逻辑单元号LUN和逻辑区块地址LBA构造而来)。
步骤S402,可选的,写入点客户端根据所述待更新数据的特征值key1检查写入点本地的读缓存中是否缓存有该待更新数据。若检测到读缓存中有所述待更新数据则将该待更新数据设置为失效状态或直接更新所述待更新数据。
步骤S403,写入点客户端根据所述待更新数据的特征值生成数据更新通知,向与其在同一个集群内的其他所有客户端广播所述数据更新通知,所述数据更新通知携带上述待更新数据的特征值key1,所述数据更新通知用于告知其它所有客户端将所述待更新数据置为失效状态从而不对外提供读服务。
步骤S404,第一客户端接收写入点客户端广播的所述数据更新通知,根据所述数据更新通知中携带的特征值key1查看本节点的读缓存中是否缓存有该待更新数据,若有则将所述待更新数据设置为失效状态(但保存住该待更新数据的热点信息)后执行步骤S405;若无则添加所述待更新数据为失效 的记录再执行步骤S405。
步骤S405,第一客户端向写入点客户端返回处理成功的响应,该处理成功的响应用于指示第一客户端已针对所述待更新数据对其读缓存作了更新处理。
步骤S406,第二客户端接收写入点客户端广播的所述数据更新通知,根据所述数据更新通知中携带的特征值key1查看本节点的读缓存中是否缓存有该待更新数据,若有则将所述待更新数据设置为失效状态(但保存住该待更新数据的热点信息)后执行步骤S407;若无则添加所述待更新数据为失效的记录再执行步骤S407。
步骤S407,第二客户端向写入点客户端返回处理成功的响应,该处理成功的响应用于指示第二客户端已针对所述待更新数据对其读缓存作了更新处理。
步骤S408,第三客户端接收写入点客户端广播的所述数据更新通知,根据所述数据更新通知中携带的特征值key1查看本节点的读缓存中是否缓存有该待更新数据,若有则将所述待更新数据设置为失效状态(但保存住该待更新数据的热点信息)后执行步骤S409;若无则添加所述待更新数据为失效的记录再执行步骤S409。
步骤S409,第三客户端向写入点客户端返回处理成功的响应,该处理成功的响应用于指示第三客户端已针对所述待更新数据对其读缓存作了更新处理。
步骤S410,若写入点客户端接收到其他所有客户端发来的处理成功响应,则对发送待更新数据的应用返回处理成功。需要注意的是,在具体实现过程中,若写入点客户端没有接收到某个客户端发来的处理成功的响应则写入点客户端处于等待状态,直到写入点客户端等到该客户端发来处理成功的响应或者接收到数据分布视图管理节点发来的告知该客户端出现故障的通知后才能对应用返回处理成功。
本发明实施例中,写入点客户端在接收待更新数据的写处理流程中,通过向在同一个集群内的其他所有客户端广播数据更新通知,使得缓存有该数据的客户端能够知道并记录该数据已经失效,从而避免了一旦某客户端有数据被更新而其他客户端不知情导致使用旧数据的情形,确保了服务器集群下各个机头侧读cache的一致性。
如图5所示,为依据本发明一实施例的分布式服务器集群系统中读缓存一致性的方法500的示范性流程图。该方法500可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而得到的其他分布式服务器集群系统中。需要注意的是,虽然下文描述的方法500的流程包括以特定顺序出现的多个操作,但是应该清楚了解,这些操作也可以包括更多或更少的操作,这些操作可以顺序执行或并行执行(例如使用并行处理器或多线程环境);同时为了方便说明,本方案中仅作为示例列举了三个客户端,本领域技术人员应该知道客户端的数量不作为对本发明保护范围的限制,系统可以根据自身的需要灵活安排。如图5所示,方法500包括以下步骤:
步骤S501,写入点客户端接收应用发送的数据更新请求,所述数据更新请求用于请求更新待更新数据,仅作为示例非限制,所述待更新数据携带特征值key(为方便说明这里假设该特征值为key1),该特征值key为所述待更新数据的逻辑地址(如key值通过待更新数据的逻辑单元号LUN和逻辑区块地址LBA构造而来)。
步骤S502,可选的,写入点客户端根据所述待更新数据的特征值key1检查写入点本地的读缓存中是否缓存有该待更新数据。若检测到读缓存中有所述待更新数据则将该待更新数据设置为失效状态或直接更新所述待更新数据。
步骤S503,写入点客户端根据所述待更新数据的特征值key1查找热点信息目录表,确定哪些客户端的读缓存中缓存有该待更新数据(如图5所示,假设第一客户端和第二客户端的读缓存中缓存有所述待更新数据(key1),第三客户端则没有缓存该待更新数据。则写入点客户端通过查询热点信息目录表后确认数据更新通知仅需要发送给第一和第二客户端)。
所述热点信息目录表是在集群规模很大的情况下,为了减少广播数据更新通知的数量而采取的方法。仅作为示例而非限制,该热点信息目录表具体生成过程如下:系统中任一客户端在认为某份数据是热点数据并准备迁移到该客户端的读缓存空间之前,将该热点数据准备迁移到该客户端的读缓存空间中这一迁移动作信息广播给其他所有的与该客户端在同一集群内的客户端节点,其他客户端节点收到此迁移动作的广播消息后,在各自本地的热点信 息目录表中增加某个客户端的读缓存中含有某个数据的这样一条记录。
步骤S504,写入点客户端向步骤S503中确认的在读缓存中存有所述待更新数据的热点客户端(第一和第二客户端)广播写入(Entry)消息,所述写入消息携带上述待更新数据的特征值key1,所述写入(Entry)消息用于告知其它所有客户端将所述待更新数据置为失效状态从而不对外提供读服务。
步骤S505,第一客户端接收写入点客户端广播的所述写入消息,根据所述写入消息中携带的特征值key1将其读缓存中缓存的所述待更新数据设置为失效状态(但保存住该待更新数据的热点信息)。
步骤S506,第一客户端向写入点客户端返回处理成功的响应,该处理成功的响应用于指示第一客户端已针对所述待更新数据对其读缓存作了更新处理。
步骤S507,第二客户端接收写入点客户端广播的所述写入消息,根据所述写入消息中携带的特征值key1将其读缓存中缓存的所述待更新数据设置为失效状态(但保存住该待更新数据的热点信息)。
步骤S508,第二客户端向写入点客户端返回处理成功的响应,该处理成功的响应用于指示第二客户端已针对所述待更新数据对其读缓存作了更新处理。
步骤S509,写入点客户端若接收到步骤S503中所确认的所有热点客户端发来的处理成功响应,则对应用返回处理成功。
本发明实施例中,写入点客户端在接收待更新数据的处理流程中,通过查找热点信息目录表,确定哪些客户端的读缓存中缓存有该待更新数据,再向读缓存中缓存有该待更新数据的热点客户端发送数据更新通知,使得缓存有该待更新数据的客户端能够知道并记录该数据已经失效,从而在确保服务器集群下各个机头侧读cache的一致性的同时,减少了数据更新通知的数量,避免了网络堵塞提升了系统IO性能。
图6为依据本发明一实施例的服务器集群中的写入点客户端600的逻辑结构示意图,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有写缓存,所述写缓存用于缓存写入所述每个客户端的数据。该写入点客户600可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而 得到的其他分布式服务器集群系统中。需要注意的是,本发明实施例中提到了多个模块或单元,本领域技术人员应该知道上述多个模块或单元的功能可以拆分到更多的子模块或子单元中,也可以组合成更少的模块或单元中实现同样的技术效果,因此都应落入本发明实施例的保护范围。
如图6所示,该写入点客户端600包括接收模块610、确定模块620、发送模块630、通知模块640。
接收模块610,用于接收数据写入消息,所述数据写入消息请求写入待写入数据;
确定模块620,用于根据所述待写入数据的特征值确定用于保存所述待写入数据的主写入客户端和至少一个备写入客户端,所述主写入客户端与每个备写入客户端分别归属于不同的服务器;
发送模块630,用于分别将所述待写入的数据发送到所述主写入客户端和所述每个备写入客户端各自的写缓存中;
通知模块640,用于当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,向所述主写入客户端和所述每个备写入客户端发送第一通知消息,所述第一通知消息用于告知所述主写入客户端及所述每个备写入客户端将各自记录的所述待写入数据的同步状态从未同步改为已同步。
可选的,当确定所述待写入的数据在所述主写入客户端或所述至少一个备写入客户端中发生写入失败时,则所述通知模块640还用于发送第二通知消息,所述第二通知消息用于告知所述主写入客户端及所述至少一个备写入客户端中写成功的客户端将自身记录的所述待写入数据的同步状态记为与写失败的客户端未同步。
可选的,所述写入点客户端600还包括数据分布视图(未示出),所述数据分布视图用于指示每个分区partition对应的主写入客户端和备写入客户端,则所述确定模块620具体用于:根据所述待写入数据的特征值,应用一致性哈希算法计算所述待写入数据的特征值对应的哈希值,根据所述哈希值确定所述待写入数据所属的分区partition;根据所述数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端。
可选的,所述确定模块620具体用于:根据所述数据分布视图确定所述 待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端;
判断所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端中是否存在故障;
将所述待写入数据所属的分区partition对应的不存在故障的主写入客户端和不存在故障的备写入客户端确定为所述主写入客户端和所述至少一个备写入客户端。
本发明实施例中,写入点客户端600的接收模块610接收待写入数据,并由确定模块620根据所述待写入数据的特征值确定存储所述待写入数据的主写入客户端以及至少一个备写入客户端,再由发送模块630发送该待写入数据至所述主写入客户端及所述至少一个备写入客户端各自的写缓存中;当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,由通知模块640通知所述主写入客户端和所述每个备写入客户端该待写入数据已同步。确保了待写入数据在服务器集群下的分布式Cache中的写缓存的一致性。
图7为依据本发明一实施例的服务器集群中的写入点客户端700的逻辑结构示意图,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据。该写入点客户700可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而得到的其他分布式服务器集群系统中。需要注意的是,本发明实施例中提到了多个模块或单元,本领域技术人员应该知道上述多个模块或单元的功能可以拆分到更多的子模块或子单元中,也可以组合成更少的模块或单元中实现同样的技术效果,因此都应落入本发明实施例的保护范围。
如图7所示,该写入点客户端700包括接收模块710、处理模块720、通知模块730。
接收模块710,用于接收数据更新请求,所述数据更新请求用于请求更新待更新数据;
处理模块720,用于根据所述待更新数据的特征值生成数据更新通知,所述数据更新通知携带指示所述待更新数据的所述特征值;
通知模块730,用于向所述服务器集群中的所述数据更新请求对应的读客户端发送所述数据更新通知,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
当接收到所述读客户端发送的所述待更新数据处理成功的响应消息,则所述通知模块730还用于发送所述待更新数据更新成功的响应消息,所述待更新数据更新成功的响应消息用于指示所述读客户端已针对所述待更新数据对所述读客户端各自的读缓存作了更新处理。
可选的,所述写入点客户端700还包括热点信息目录表(未示出),所述热点信息目录表用于指示所有在读缓存中缓存有所述待更新数据的客户端,所述处理模块720还用于查找所述热点信息目录表,根据所述待更新数据的所述特征值确定所述读客户端。
可选的,所述接收模块710还用于接收来自所述读客户端广播的热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据,所述处理模块还用于将所述读客户端记录进所述热点信息目录表中。
本发明实施例中,写入点客户端700通过接收模块710接收数据更新请求,再由处理模块720根据待更新数据的特征值生成数据更新通知,并由通知模块730向读客户端(所述读客户端包括所述服务器集群中除所述写入点客户端700之外的其它所有客户端,或者保存了所述待更新数据的客户端)发送所述数据更新通知,从而确保了服务器集群下各个客户端的分布式cache中读缓存的一致性。
图8为依据本发明一实施例的服务器集群中的读客户端800的逻辑结构示意图,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据。该读客户800可以但不限于具体应用在上述图1所示的分布式服务器集群系统100或基于该系统100进行灵活变形而得到的其他分布式服务器集群系统中。需要注意的是,本发明实施例中提到了多个模块或单元,本领域技术人员应该知道上述多个模块或单元的功能可以拆分到更多的子模块或子单元中,也可以组合成更少的模块或单元中实现同样的技术效果,因此都应落入本发明实施例的保护范围。
如图8所示,该读客户端800包括接收模块810、处理模块820、发送模 块830。
接收模块810,用于接收写入点客户端发送的数据更新通知,所述数据更新通知携带指示待更新数据的特征值;
处理模块820,用于根据所述数据更新通知对所述读客户端的读缓存进行更新处理;
发送模块830,用于对所述写入点客户端发送待更新数据处理成功的响应消息。
可选的,所述处理模块820具体用于根据所述特征值确认所述读客户端的读缓存中是否缓存有所述待更新数据,若确认没有缓存所述待更新数据,则添加所述待更新数据为失效的记录。
可选的,所述读客户端800在将所述待更新数据缓存进所述读客户端的读缓存中时,所述发送模块还用于向所述服务器集群中除所述读客户端之外的其他所有客户端广播热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
本发明实施例中,读客户端800通过接收模块810接收写入点客户端发送的数据更新通知,再由处理模块820根据所述数据更新通知对所述读客户端的读缓存进行更新处理,并由发送模块830向写入点客户端发送待更新数据处理成功的响应消息,从而确保了服务器集群下各个客户端的分布式cache中读缓存的一致性。
如图9,为本发明实施例的计算机900的逻辑结构组成示意图。本发明实施例的计算机可包括:
处理器901、存储器902、系统总线904和通信接口905。处理器901、存储器902和通信接口905之间通过系统总线904连接并完成相互间的通信。
处理器901可能为单核或多核中央处理单元,或者为特定集成电路,或者为被配置成实施本发明实施例的一个或多个集成电路。
存储器902可以为高速RAM存储器,也可以为非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。
存储器902用于计算机执行指令903。具体的,计算机执行指令903中可以包括程序代码。
当计算机运行时,处理器901运行计算机执行指令903,可以执行图2、 3、4或图5任意之一所述的方法流程。
本领域普通技术人员将会理解,本发明的各个方面、或各个方面的可能实现方式可以被具体实施为系统、方法或者计算机程序产品。因此,本发明的各方面、或各个方面的可能实现方式可以采用完全硬件实施例、完全软件实施例(包括固件、驻留软件等等),或者组合软件和硬件方面的实施例的形式,在这里都统称为“电路”、“模块”或者“系统”。此外,本发明的各方面、或各个方面的可能实现方式可以采用计算机程序产品的形式,计算机程序产品是指存储在计算机可读介质中的计算机可读程序代码。
计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质包含但不限于电子、磁性、光学、电磁、红外或半导体系统、设备或者装置,或者前述的任意适当组合,如随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或者快闪存储器)、光纤、便携式只读存储器(CD-ROM)。
计算机中的处理器读取存储在计算机可读介质中的计算机可读程序代码,使得处理器能够执行在流程图中每个步骤、或各步骤的组合中规定的功能动作;生成实施在框图的每一块、或各块的组合中规定的功能动作的装置。
计算机可读程序代码可以完全在用户的计算机上执行、部分在用户的计算机上执行、作为单独的软件包、部分在用户的计算机上并且部分在远程计算机上,或者完全在远程计算机或者服务器上执行。也应该注意,在某些替代实施方案中,在流程图中各步骤、或框图中各块所注明的功能可能不按图中注明的顺序发生。例如,依赖于所涉及的功能,接连示出的两个步骤、或两个块实际上可能被大致同时执行,或者这些块有时候可能被以相反顺序执行。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
以上所述，仅为本发明的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应以所述权利要求的保护范围为准。

Claims (21)

  1. 一种服务器集群中写缓存一致性的方法,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有写缓存,所述写缓存用于缓存写入所述每个客户端的数据,所述方法包括:
    写入点客户端接收数据写入消息,所述数据写入消息请求写入待写入数据,根据所述待写入数据的特征值确定用于保存所述待写入数据的主写入客户端和至少一个备写入客户端,所述主写入客户端与每个备写入客户端分别归属于不同的服务器;
    分别将所述待写入的数据发送到所述主写入客户端和所述每个备写入客户端各自的写缓存中;
    当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,向所述主写入客户端和所述每个备写入客户端发送第一通知消息,所述第一通知消息用于告知所述主写入客户端及所述每个备写入客户端将各自记录的所述待写入数据的同步状态从未同步改为已同步。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当确定所述待写入的数据在所述主写入客户端或所述至少一个备写入客户端中发生写入失败时,向所述主写入客户端和所述至少一个备写入客户端中写成功的客户端发送第二通知消息,所述第二通知消息用于告知所述主写入客户端及所述至少一个备写入客户端中的写成功的客户端将自身记录的所述待写入数据的同步状态记为与写失败的客户端未同步。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述待写入数据的特征值确定主写入客户端和至少一个备写入客户端,具体包括:
    根据所述待写入数据的特征值,应用一致性哈希算法计算所述特征值对应的哈希值,根据所述哈希值确定所述待写入数据所属的分区partition;
    根据数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端，所述数据分布视图用于指示每个分区partition各自对应的主写入客户端和备写入客户端。
  4. 根据权利要求3所述的方法,其特征在于,所述根据数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端包括:
    根据所述数据分布视图确定所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端;
    判断所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端中是否存在故障;
    将所述待写入数据所属的分区partition对应的不存在故障的主写入客户端和不存在故障的备写入客户端确定为所述主写入客户端和所述至少一个备写入客户端。
  5. 一种服务器集群中读缓存一致性的方法,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述方法包括:
    写入点客户端接收数据更新请求,所述数据更新请求用于请求更新待更新数据,根据所述待更新数据的特征值生成数据更新通知,所述数据更新通知携带指示所述待更新数据的所述特征值;
    向所述服务器集群中的所述数据更新请求对应的读客户端发送所述数据更新通知,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
    当接收到所述读客户端发送的所述待更新数据处理成功的响应消息,则发送所述待更新数据更新成功的响应消息,所述待更新数据更新成功的响应消息用于指示所述读客户端已针对所述待更新数据对所述读客户端各自的读缓存作了更新处理。
  6. 根据权利要求5所述的方法,其特征在于,当所述读客户端为保存了所述待更新数据的客户端时,所述方法还包括:所述写入点客户端查找热点信息目录表,根据所述待更新数据的所述特征值确定所述读客户端,所述热点信息目录表用于指示所有在读缓存中缓存有所述待更新数据的客户端。
  7. 根据权利要求6所述的方法,其特征在于,在所述写入点客户端查找热点信息目录表之前,所述方法还包括:
    所述写入点客户端接收来自所述读客户端广播的热点信息,将所述读客户端记录进所述热点信息目录表中,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
  8. 一种服务器集群中读缓存一致性的方法,其特征在于,所述服务器 集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述方法包括:
    读客户端接收写入点客户端发送的数据更新通知,所述数据更新通知携带指示待更新数据的特征值,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
    根据所述数据更新通知对所述读客户端各自的读缓存进行更新处理,并对所述写入点客户端发送待更新数据处理成功的响应消息。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述数据更新通知对所述读客户端各自的读缓存进行更新处理,具体包括:所述读客户端根据所述特征值确认所述读客户端各自的读缓存中是否缓存有所述待更新数据,若确认没有缓存所述待更新数据,则添加所述待更新数据为失效的记录。
  10. 根据权利要求8或9所述的方法,其特征在于,所述方法还包括:
    所述读客户端在将所述待更新数据缓存进所述读客户端各自的读缓存中时,向所述服务器集群中除所述读客户端之外的其他所有客户端广播热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
  11. 一种服务器集群中的写入点客户端,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有写缓存,所述写缓存用于缓存写入所述每个客户端的数据,所述写入点客户端包括:
    接收模块,用于接收数据写入消息,所述数据写入消息请求写入待写入数据;
    确定模块,用于根据所述待写入数据的特征值确定用于保存所述待写入数据的主写入客户端和至少一个备写入客户端,所述主写入客户端与每个备写入客户端分别归属于不同的服务器;
    发送模块,用于分别将所述待写入的数据发送到所述主写入客户端和所述每个备写入客户端各自的写缓存中;
    通知模块,用于当确定所述待写入数据在所述主写入客户端和所述每个备写入客户端都保存成功时,向所述主写入客户端和所述每个备写入客户端发送第一通知消息,所述第一通知消息用于告知所述主写入客户端及所述每 个备写入客户端将各自记录的所述待写入数据的同步状态从未同步改为已同步。
  12. 根据权利要求11所述的写入点客户端,其特征在于,当确定所述待写入的数据在所述主写入客户端或所述至少一个备写入客户端中发生写入失败时,则所述通知模块还用于发送第二通知消息,所述第二通知消息用于告知所述主写入客户端及所述至少一个备写入客户端中写成功的客户端将自身记录的所述待写入数据的同步状态记为与写失败的客户端未同步。
  13. 根据权利要求11或12所述的写入点客户端,其特征在于,所述写入点客户端还包括数据分布视图,所述数据分布视图用于指示每个分区partition对应的主写入客户端和备写入客户端,则所述确定模块具体用于:根据所述待写入数据的特征值,应用一致性哈希算法计算所述待写入数据的特征值对应的哈希值,根据所述哈希值确定所述待写入数据所属的分区partition;根据所述数据分布视图确定所述待写入数据所属的分区partition对应的所述主写入客户端和所述至少一个备写入客户端。
  14. 根据权利要求13所述的写入点客户端,其特征在于,所述确定模块具体用于:
    根据所述数据分布视图确定所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端;
    判断所述待写入数据所属的分区partition对应的全部主写入客户端和全部备写入客户端中是否存在故障;
    将所述待写入数据所属的分区partition对应的不存在故障的主写入客户端和不存在故障的备写入客户端确定为所述主写入客户端和所述至少一个备写入客户端。
  15. 一种服务器集群中的写入点客户端,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述写入点客户端包括:
    接收模块,用于接收数据更新请求,所述数据更新请求用于请求更新待更新数据;
    处理模块,用于根据所述待更新数据的特征值生成数据更新通知,所述数据更新通知携带指示所述待更新数据的所述特征值;
    通知模块,用于向所述服务器集群中的所述数据更新请求对应的读客户端发送所述数据更新通知,所述读客户端包括所述服务器集群中除所述写入点客户端之外的其它所有客户端,或者保存了所述待更新数据的客户端;
    当接收到所述读客户端发送的所述待更新数据处理成功的响应消息,则所述通知模块还用于发送所述待更新数据更新成功的响应消息,所述待更新数据更新成功的响应消息用于指示所述读客户端已针对所述待更新数据对所述读客户端各自的读缓存作了更新处理。
  16. 根据权利要求15所述的写入点客户端,其特征在于,所述写入点客户端还包括热点信息目录表,所述热点信息目录表用于指示所有在读缓存中缓存有所述待更新数据的客户端,所述处理模块还用于查找所述热点信息目录表,根据所述待更新数据的所述特征值确定所述读客户端。
  17. 根据权利要求16所述的写入点客户端,其特征在于,所述接收模块还用于接收来自所述读客户端广播的热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据,所述处理模块还用于将所述读客户端记录进所述热点信息目录表中。
  18. 一种服务器集群中的读客户端,其特征在于,所述服务器集群包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个客户端,每个客户端配置有读缓存,所述读缓存用于缓存每个客户端中被应用频繁访问的热点数据,所述读客户端包括:
    接收模块,用于接收写入点客户端发送的数据更新通知,所述数据更新通知携带指示待更新数据的特征值;
    处理模块,用于根据所述数据更新通知对所述读客户端的读缓存进行更新处理;
    发送模块,用于对所述写入点客户端发送待更新数据处理成功的响应消息。
  19. 根据权利要求18所述的读客户端,其特征在于,所述处理模块具体用于根据所述特征值确认所述读客户端的读缓存中是否缓存有所述待更新数据,若确认没有缓存所述待更新数据,则添加所述待更新数据为失效的记录。
  20. 根据权利要求18或19所述的读客户端,其特征在于,所述读客户端在将所述待更新数据缓存进所述读客户端的读缓存中时,所述发送模块还 用于向所述服务器集群中除所述读客户端之外的其他所有客户端广播热点信息,所述热点信息用于指示所述读客户端已经缓存有所述待更新数据。
  21. 一种服务器集群系统,其特征在于,所述服务器集群系统包括n个服务器,n为≥2的自然数,每个服务器配置有至少一个如权利要求11至17任一项所述的写入点客户端,以及如权利要求18至20任一项所述的读客户端。
PCT/CN2016/077385 2015-06-10 2016-03-25 一种服务器集群系统中的缓存方法、写入点客户端和读客户端 WO2016197666A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510317612.8A CN104935654B (zh) 2015-06-10 2015-06-10 一种服务器集群系统中的缓存方法、写入点客户端和读客户端
CN201510317612.8 2015-06-10

Publications (1)

Publication Number Publication Date
WO2016197666A1 true WO2016197666A1 (zh) 2016-12-15

Family

ID=54122622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/077385 WO2016197666A1 (zh) 2015-06-10 2016-03-25 一种服务器集群系统中的缓存方法、写入点客户端和读客户端

Country Status (2)

Country Link
CN (2) CN104935654B (zh)
WO (1) WO2016197666A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582730A (zh) * 2018-10-11 2019-04-05 阿里巴巴集团控股有限公司 缓存同步方法、装置、电子设备及计算机可读存储介质
CN110471939A (zh) * 2019-07-11 2019-11-19 平安普惠企业管理有限公司 数据访问方法、装置、计算机设备及存储介质
CN112416973A (zh) * 2020-11-02 2021-02-26 网宿科技股份有限公司 分布式数据库读写分离的方法、服务器和系统
CN113242285A (zh) * 2021-04-30 2021-08-10 北京京东拓先科技有限公司 一种热点数据处理方法、装置和系统
CN113485772A (zh) * 2021-07-28 2021-10-08 江苏创源电子有限公司 一种应用程序的配置更新方法、装置、设备及介质
CN113746641A (zh) * 2021-11-05 2021-12-03 深圳市杉岩数据技术有限公司 一种基于分布式存储的odx协议处理方法
CN114676166A (zh) * 2022-05-26 2022-06-28 阿里巴巴(中国)有限公司 数据处理方法及装置

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105897859B (zh) * 2016-03-25 2021-07-30 北京书生云科技有限公司 一种存储系统
CN104935654B (zh) * 2015-06-10 2018-08-21 华为技术有限公司 一种服务器集群系统中的缓存方法、写入点客户端和读客户端
CN106855869B (zh) * 2015-12-09 2020-06-12 中国移动通信集团公司 一种实现数据库高可用的方法、装置和系统
CN105549905B (zh) * 2015-12-09 2018-06-01 上海理工大学 一种多虚拟机访问分布式对象存储系统的方法
CN105868038B (zh) * 2016-03-28 2020-03-24 联想(北京)有限公司 内存错误处理方法及电子设备
CN106776798A (zh) * 2016-11-23 2017-05-31 深圳市中博睿存科技有限公司 一种集群文件系统基于客户端的可传播缓存方法
CN106708636B (zh) * 2016-12-29 2020-10-16 北京奇虎科技有限公司 基于集群的数据缓存方法及装置
CN110196680B (zh) * 2018-03-27 2021-10-26 腾讯科技(深圳)有限公司 数据处理方法、装置及存储介质
CN109165321B (zh) * 2018-07-28 2020-06-02 华中科技大学 一种基于非易失内存的一致性哈希表构建方法和系统
CN110955382A (zh) * 2018-09-26 2020-04-03 华为技术有限公司 一种在分布式系统中写入数据的方法和装置
CN109561151B (zh) * 2018-12-12 2021-09-17 北京达佳互联信息技术有限公司 数据存储方法、装置、服务器和存储介质
CN110781373B (zh) * 2019-10-29 2022-09-06 北京字节跳动网络技术有限公司 榜单更新方法、装置、可读介质和电子设备
CN111309262B (zh) * 2020-02-16 2021-01-29 西安奥卡云数据科技有限公司 一种分布式存储缓存读取和写入方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006330A (zh) * 2010-12-01 2011-04-06 北京瑞信在线系统技术有限公司 分布式缓存系统、数据的缓存方法及缓存数据的查询方法
CN103268318A (zh) * 2013-04-16 2013-08-28 华中科技大学 一种强一致性的分布式键值数据库系统及其读写方法
CN104142896A (zh) * 2013-05-10 2014-11-12 阿里巴巴集团控股有限公司 一种缓存控制方法和系统
CN104935654A (zh) * 2015-06-10 2015-09-23 华为技术有限公司 一种服务器集群系统中的缓存方法、写入点客户端和读客户端

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1770954A1 (en) * 2005-10-03 2007-04-04 Amadeus S.A.S. System and method to maintain coherence of cache contents in a multi-tier software system aimed at interfacing large databases
CN101668046B (zh) * 2009-10-13 2012-12-19 成都市华为赛门铁克科技有限公司 资源缓存方法及其装置、系统
CN102780763B (zh) * 2012-06-29 2015-03-04 华中科技大学 一种分布式hss数据存储方法和分布式hss数据提取方法
CN103049574B (zh) * 2013-01-04 2015-12-09 中国科学院高能物理研究所 实现文件动态副本的键值文件系统及方法
CN104156361B (zh) * 2013-05-13 2017-11-17 阿里巴巴集团控股有限公司 一种实现数据同步的方法及系统
CN104239310A (zh) * 2013-06-08 2014-12-24 中国移动通信集团公司 分布式数据库数据同步方法和装置
CN103747073A (zh) * 2013-12-30 2014-04-23 乐视网信息技术(北京)股份有限公司 一种分布式缓存的方法和系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006330A (zh) * 2010-12-01 2011-04-06 北京瑞信在线系统技术有限公司 分布式缓存系统、数据的缓存方法及缓存数据的查询方法
CN103268318A (zh) * 2013-04-16 2013-08-28 华中科技大学 一种强一致性的分布式键值数据库系统及其读写方法
CN104142896A (zh) * 2013-05-10 2014-11-12 阿里巴巴集团控股有限公司 一种缓存控制方法和系统
CN104935654A (zh) * 2015-06-10 2015-09-23 华为技术有限公司 一种服务器集群系统中的缓存方法、写入点客户端和读客户端

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582730A (zh) * 2018-10-11 2019-04-05 阿里巴巴集团控股有限公司 缓存同步方法、装置、电子设备及计算机可读存储介质
CN110471939A (zh) * 2019-07-11 2019-11-19 平安普惠企业管理有限公司 数据访问方法、装置、计算机设备及存储介质
CN112416973A (zh) * 2020-11-02 2021-02-26 网宿科技股份有限公司 分布式数据库读写分离的方法、服务器和系统
CN113242285A (zh) * 2021-04-30 2021-08-10 北京京东拓先科技有限公司 一种热点数据处理方法、装置和系统
CN113485772A (zh) * 2021-07-28 2021-10-08 江苏创源电子有限公司 一种应用程序的配置更新方法、装置、设备及介质
CN113746641A (zh) * 2021-11-05 2021-12-03 深圳市杉岩数据技术有限公司 一种基于分布式存储的odx协议处理方法
CN113746641B (zh) * 2021-11-05 2022-02-18 深圳市杉岩数据技术有限公司 一种基于分布式存储的odx协议处理方法
CN114676166A (zh) * 2022-05-26 2022-06-28 阿里巴巴(中国)有限公司 数据处理方法及装置
CN114676166B (zh) * 2022-05-26 2022-10-11 阿里巴巴(中国)有限公司 数据处理方法及装置

Also Published As

Publication number Publication date
CN104935654A (zh) 2015-09-23
CN108418900B (zh) 2021-05-04
CN108418900A (zh) 2018-08-17
CN104935654B (zh) 2018-08-21

Similar Documents

Publication Publication Date Title
WO2016197666A1 (zh) 一种服务器集群系统中的缓存方法、写入点客户端和读客户端
US20210004355A1 (en) Distributed storage system, distributed storage system control method, and storage medium
US11320991B2 (en) Identifying sub-health object storage devices in a data storage system
KR102457611B1 (ko) 터넌트-어웨어 스토리지 쉐어링 플랫폼을 위한 방법 및 장치
US11487787B2 (en) System and method for near-synchronous replication for object store
WO2019127916A1 (zh) 基于分布式一致性协议实现的数据读写方法及装置
CN107885758B (zh) 一种虚拟节点的数据迁移方法和虚拟节点
US8904117B1 (en) Non-shared write-back caches in a cluster environment
US9152501B2 (en) Write performance in fault-tolerant clustered storage systems
WO2017097059A1 (zh) 分布式数据库系统及其自适应方法
US9830088B2 (en) Optimized read access to shared data via monitoring of mirroring operations
WO2017113276A1 (zh) 分布式存储系统中的数据重建的方法、装置和系统
US9262323B1 (en) Replication in distributed caching cluster
CN111274310A (zh) 一种分布式数据缓存方法及系统
US20150058570A1 (en) Method of constructing share-f state in local domain of multi-level cache coherency domain system
US20140115251A1 (en) Reducing Memory Overhead of Highly Available, Distributed, In-Memory Key-Value Caches
WO2014169649A1 (zh) 一种数据处理方法、装置及计算机系统
US20150189039A1 (en) Memory Data Access Method and Apparatus, and System
CN107018185B (zh) 云存储系统的同步方法和装置
JP5686034B2 (ja) クラスタシステム、同期制御方法、サーバ装置および同期制御プログラム
US11151005B2 (en) System and method for storage node data synchronization
CN113010549A (zh) 基于异地多活系统的数据处理方法、相关设备及存储介质
US11288237B2 (en) Distributed file system with thin arbiter node
US20140244936A1 (en) Maintaining cache coherency between storage controllers
US9672150B2 (en) Migrating write information in a write cache of a storage system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16806564

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16806564

Country of ref document: EP

Kind code of ref document: A1