CN115858181B - Distributed storage skewed workload balancing method based on programmable switch

Distributed storage skewed workload balancing method based on programmable switch

Info

Publication number
CN115858181B
CN115858181B
Authority
CN
China
Prior art keywords
storage server
key
programmable switch
ver
request
Prior art date
Legal status
Active
Application number
CN202310170363.9A
Other languages
Chinese (zh)
Other versions
CN115858181A (en)
Inventor
胡增
江大白
汪刚
Current Assignee
China Applied Technology Co Ltd
Original Assignee
China Applied Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Applied Technology Co Ltd
Priority to CN202310170363.9A
Publication of CN115858181A
Application granted
Publication of CN115858181B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of distributed storage and discloses a distributed storage skewed workload balancing method based on a programmable switch. The method uses the programmable switch to process the on-path workload and guarantees data consistency with an in-network coherence directory; compared with existing load balancing methods, it can increase throughput by 10 times or reduce the number of storage servers required by 90 percent. The invention responds quickly when hotkeys change, and by replicating a small number of hot objects it provides good load balancing, so that hot objects are quickly identified and tracked whether the hot set remains stable or changes rapidly.

Description

Distributed storage skewed workload balancing method based on programmable switch
Technical Field
The invention relates to the field of distributed storage, and in particular to a distributed storage skewed workload balancing method based on a programmable switch.
Background
Real storage system workloads typically exhibit a highly skewed object access pattern: a small fraction of hot objects receives far more requests than the remaining objects. Many such workloads can be modeled with a Zipfian access distribution, and some real workloads exhibit very high skew (e.g., a Zipf distribution with skew parameter greater than 1). Furthermore, the set of hot objects may change dynamically; in some cases, a hot object loses its popularity within 10 minutes on average.
Distributed storage systems typically spread objects across multiple storage servers to achieve scalability and load distribution. High workload skew means the load across the storage servers is also uneven: the few servers storing the hottest objects receive disproportionately more traffic than the others. Excessive access skew can push the load on a single storage server beyond its processing capacity and overload it. To avoid the resulting performance penalty, the system must be over-provisioned, which greatly increases overall cost.
Skewed workloads come in many varieties, such as read-heavy workloads (more than 95% of requests are reads), write-heavy workloads, and mixed workloads. Object (value) sizes also vary greatly: within a single application, the system may store small values (a few bytes), large values (kilobytes to megabytes), or a combination of both. A better skewed-workload balancing approach is therefore needed to address the above.
The prior art includes the following approaches:
Caching: caching has long been the standard method for accelerating database-backed web applications, and its effectiveness has been established both in theory and in practice. However, the caching approach has two limitations. First, a cache is effective only if it can be built to handle requests orders of magnitude faster than the storage servers behind it. This is easily achieved when the storage servers themselves are not built on fast in-memory hardware; with current hardware, however, building such a cache in front of in-memory storage servers becomes a difficult challenge. Second, caching only benefits read-only workloads, because the cached copy must be invalidated before a storage server can process a write operation.
Selective replication: selective replication is another common way to achieve load balancing. By selectively replicating hot objects onto multiple storage servers, requests for these hot objects can be sent to any storage server holding a replica, effectively spreading the load. However, existing selective replication methods face two challenges. First, clients must be able to identify each hot object and the locations of its replicas, but both may change as object hotness changes. This can be achieved with a centralized directory service or by copying the directory to the clients, but both options have scalability limits: a centralized directory service easily becomes a bottleneck, and synchronizing the directory across hundreds or thousands of clients is difficult. Second, a selective replication scheme must provide consistency for hot objects, which the prior art fails to address.
Disclosure of Invention
To solve the above technical problems, the invention provides a distributed storage skewed workload balancing method based on a programmable switch.
In order to solve the technical problems, the invention adopts the following technical scheme:
a distributed storage tilting workload balancing method based on a programmable switch is used for balancing workload between a client and a distributed storage server, wherein the client sends a request containing an application layer message to the storage server, and the request comprises a write request for storing a value corresponding to a key to the storage server and a read request for reading the value corresponding to the key from the storage server; the storage server returns a response containing the application layer message to the client; the workload comprises a client request and a storage server response, and the workload passes through the programmable switch; the storage server comprises a main storage server and a plurality of copy storage servers; the application layer message comprises information of a key and an object version number of the key;
the workload balancing method specifically comprises the following steps:
P1: the programmable switch counts the hottest O(nlogn) hotkeys k while clients interact with the storage servers, where n represents the number of all distinct keys and O(nlogn) is a complexity bound. The programmable switch stores the value v corresponding to each hotkey k on a replica storage server, and records the hotkey k, the highest object version number ver_completed of hotkey k, and the list rset of storage servers holding the value v.
P2: when a client sends a write request to a storage server:
the programmable switch assigns a monotonically increasing object version number to each write request; if the key in the write request has a record in the programmable switch, the key is a hotkey, and the programmable switch selects one or more storage servers from all storage servers to forward the write request to; if the key in the write request has no record in the programmable switch, the write request is forwarded to the primary storage server;
when a client sends a read request to a storage server:
if the key in the read request has a record in the programmable switch, the key is a hotkey, and the switch finds a storage server s holding the value of hotkey k and forwards the read request to it; if the key in the read request has no record in the programmable switch, the read request is forwarded to the primary storage server;
P3: the primary storage server records the object version numbers and values of all keys, and each replica storage server records the object version numbers and values of the hotkeys; when processing a write request, a storage server compares the object version number ver of the key in the write request with the key's locally stored object version number, and only when the version in the write request is higher does the storage server update the key's value and object version number;
P4: when any storage server s returns a response to the client and the key in the response has a record in the programmable switch, then: the object version number ver of the hotkey k in the response is compared with the highest object version number ver_completed of hotkey k recorded by the programmable switch; if ver > ver_completed, the value of ver_completed is updated to ver, the storage server list rset recorded by the programmable switch for hotkey k is emptied, and storage server s is then added to the list rset; if ver = ver_completed, storage server s is added directly to the list rset.
Specifically, the application layer header includes an OP field, a KEYHASH field, a VER field, and a SERVERID field. The OP field indicates the workload type; the KEYHASH field holds a hash value of the key generated in the programmable switch; the VER field holds the object version number ver assigned by the programmable switch for the workload; the SERVERID field holds the storage server identifier filled in by the storage server when responding. The contents of the OP field include READ, WRITE, READ-REPLY and WRITE-REPLY, where READ indicates a read request, WRITE a write request, READ-REPLY a reply to a read request, and WRITE-REPLY a reply to a write request.
Specifically, the programmable switch contains an in-network coherence directory, which records through a hash table the hotkey k, the object version number of hotkey k, and the list rset of storage servers holding the value v.
Specifically, when a key k loses its heat, i.e., key k is no longer among the hottest O(nlogn) hotkeys, the programmable switch marks key k, and upon receiving a response containing key k, the programmable switch deletes its record of key k.
Specifically, when the value of a hotkey k needs to be copied from the primary storage server to a replica storage server, the programmable switch issues a virtual write command that raises the object version number of hotkey k to the highest object version number ver_completed of hotkey k recorded by the programmable switch, and sends the hotkey k, its corresponding value, and its object version number to the replica storage server for storage.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention utilizes the programmable switch and can increase throughput by 10 times or reduce the number of storage servers by 90% compared with the existing load balancing method.
The invention responds quickly when the set of hotkeys changes, and by replicating a small number of hot objects it provides good load balancing, so that hot objects are quickly identified and tracked whether they remain stable or change rapidly.
The invention is applicable to a variety of workloads, such as read-heavy and write-heavy workloads, and also to workloads with objects of different sizes and different skew levels.
Drawings
FIG. 1 is a system model diagram of the present invention;
FIG. 2 is a diagram of the application layer header data format of the workload of the present invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
The invention is implemented on a programmable switch with a programmable data plane, such as one built on a Barefoot Tofino, Cavium XPliant, or Broadcom Trident 3 series chip. These chips share the following characteristics: (1) programmable parsing specific to application layer headers; (2) a flexible packet processing pipeline, typically consisting of 10-20 pipeline stages, each capable of performing a match lookup and one or more ALU operations; (3) general purpose memory on the order of 10 MB. All of these features reside on the data plane of the programmable switch, which means they are available while processing packets at full line rate. The skewed workload balancing method of the invention provides load balancing at the rack level, i.e., 32 to 256 storage servers connected through one programmable switch, but it does not provide fault tolerance guarantees inside the rack.
The invention uses the top-of-rack (ToR) programmable switch as the central point of the system. The ToR programmable switch sits on the path of every client request and every storage server response, so the abstraction of an in-network coherence directory can be realized inside it. Through this directory, the ToR programmable switch can track the location of every hot object (value) in the system and forward each request to a storage server with available capacity, even altering the number or placement of value copies by choosing where write requests are sent. On top of the in-network coherence directory, the invention designs a version-based coherence protocol that guarantees linearizability and handles value updates very efficiently. The invention therefore provides good load balancing even for write-intensive workloads.
As shown in FIG. 1, in this embodiment an in-network coherence directory is implemented in a rack-level storage system. The storage system includes a plurality of storage servers, all located in one rack. The storage servers comprise a primary storage server and replica storage servers: the primary storage server is the server where the original of a value resides, and a replica storage server is a server that stores a copy of the value after the value becomes a hot object. The system used in the invention includes a programmable switch and a controller for the programmable switch.
Each key corresponds to a value stored on a storage server. Clients issue requests; storage servers store data and respond to client requests. For a read request, the storage server finds the stored value according to the key in the request and returns it to the client; for a write request, the storage server stores the value carried in the request.
1. Programmable switch
Taking the ToR programmable switch as an example, the in-network coherence directory maintained by the invention is as follows: it stores a set of hotkeys, the highest object version number of each hotkey, and the storage server list rset where the value corresponding to each hotkey is located. To reduce programmable switch resource overhead and to support keys of arbitrary size, the in-network coherence directory stores keys in a fixed-size hash table.
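For illustration only, the following minimal Python sketch models such a directory; the names (CoherenceDirectory, DirectoryEntry, DIRECTORY_SLOTS) and the capacity are assumptions of this sketch rather than terms fixed by the patent.

# Minimal sketch of the in-network coherence directory. A hardware switch
# would hold this state in register arrays; a dict bounded by DIRECTORY_SLOTS
# stands in for the fixed-size hash table. All names are illustrative.
from dataclasses import dataclass, field

DIRECTORY_SLOTS = 4096  # assumed capacity of the fixed-size hash table

@dataclass
class DirectoryEntry:
    ver_completed: int = 0                  # highest completed object version number
    rset: set = field(default_factory=set)  # servers holding the current value

class CoherenceDirectory:
    def __init__(self):
        self.slots = {}  # keyhash -> DirectoryEntry

    def contains(self, keyhash: int) -> bool:
        return keyhash in self.slots

    def insert_hotkey(self, keyhash: int, ver_completed: int, primary: int) -> bool:
        if keyhash not in self.slots and len(self.slots) >= DIRECTORY_SLOTS:
            return False  # table full: the key simply stays non-hot
        self.slots[keyhash] = DirectoryEntry(ver_completed, {primary})
        return True

    def remove_hotkey(self, keyhash: int) -> None:
        self.slots.pop(keyhash, None)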
The invention defines an application layer header embedded in layer-4 (L4) packets, as shown in FIG. 2, and reserves a dedicated UDP port so the programmable switch can match the system's packets. A packet consists of the ETH, IP and UDP headers followed by the application layer header with its OP, KEYHASH, VER and SERVERID fields. The OP field may be READ, WRITE, READ-REPLY or WRITE-REPLY, where READ indicates a read request, WRITE a write request, READ-REPLY a reply to a read request, and WRITE-REPLY a reply to a write request; the KEYHASH field holds a fixed-size key hash generated in the programmable switch; the VER field holds the object version number assigned by the programmable switch for the workload; the SERVERID field holds the unique identifier of the storage server, filled in when it replies.
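To make the header concrete, the sketch below packs the OP, KEYHASH, VER and SERVERID fields that follow the UDP header; the reserved port number, the OP encodings, and the field widths (1, 4, 8 and 2 bytes) are illustrative assumptions, since the patent does not fix them.

import struct

APP_UDP_PORT = 38888  # assumed reserved UDP port used to match system packets
READ, WRITE, READ_REPLY, WRITE_REPLY = 0, 1, 2, 3  # assumed OP encodings

# Assumed layout: 1-byte OP, 4-byte KEYHASH, 8-byte VER, 2-byte SERVERID,
# all in network byte order.
HEADER_FMT = "!BIQH"

def pack_header(op: int, keyhash: int, ver: int, serverid: int = 0) -> bytes:
    return struct.pack(HEADER_FMT, op, keyhash, ver, serverid)

def unpack_header(payload: bytes) -> dict:
    op, keyhash, ver, serverid = struct.unpack_from(HEADER_FMT, payload)
    return {"op": op, "keyhash": keyhash, "ver": ver, "serverid": serverid}

# Example: a read request for a key whose switch-computed hash is 0x1A2B3C4D
pkt = pack_header(READ, 0x1A2B3C4D, 0)
assert unpack_header(pkt)["keyhash"] == 0x1A2B3C4D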
Message forwarding for traffic that does not belong to the system of the invention uses standard layer-2 (L2) or layer-3 (L3) routing, keeping the programmable switch fully compatible with existing network protocols.
2. Controller
The programmable switch controller of the invention decides which keys are hotkeys and is responsible for updating the in-network coherence directory with the hottest O(nlogn) keys, where n represents the number of all distinct keys. To this end, the invention designs a request statistics engine in the programmable switch that tracks the access rate of each key using the switch's data plane and CPU. The controller may run on the programmable switch CPU or on a remote storage server; it reads each key's access rate from the request statistics engine and takes the most frequently accessed keys as hotkeys. The controller keeps only soft state and can be replaced immediately on failure. The controller copies the values corresponding to hotkeys from the primary storage server to replica storage servers.
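As a sketch of the controller's selection step, assume the request statistics engine exposes per-key access counts; heapq.nlargest then yields the most frequently accessed keys, with the cutoff k standing in for the patent's O(nlogn) bound.

import heapq
from collections import Counter

def select_hotkeys(access_counts: Counter, k: int) -> list:
    """Return the key hashes of the k most frequently accessed keys."""
    return [kh for kh, _ in heapq.nlargest(k, access_counts.items(),
                                           key=lambda item: item[1])]

# Illustrative counts as the statistics engine might report them
counts = Counter({0xA1: 9000, 0xB2: 40, 0xC3: 8500, 0xD4: 7})
assert select_hotkeys(counts, 2) == [0xA1, 0xC3]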
3. Request processing
Request processing is divided into client request processing and storage server response processing; both client requests and storage server responses belong to the workload.
Processing algorithm of client request (algorithm one):
1: if pkt1.op = WRITE then
2:     pkt1.ver ← ver_next++;
3: end if;
4: if rkeys.contains(pkt1.keyhash) then
5:     if pkt1.op = READ then
6:         pkt1.dst ← select replica from rset[pkt1.keyhash];
7:     else if pkt1.op = WRITE then
8:         pkt1.dst ← select from all servers;
9:     end if;
10: end if;
11: Forward packet;
where pkt1 denotes a client request; pkt1.op denotes the OP field content of pkt1; pkt1.ver denotes the VER field content of pkt1, i.e., the object version number of the key; ver_next denotes the next object version number; pkt1.keyhash denotes the KEYHASH field content of pkt1; pkt1.dst denotes the target storage server to which the request is forwarded; rset[pkt1.keyhash] denotes the storage server list rset corresponding to the hotkey in pkt1; Forward packet denotes forwarding the request packet; rkeys denotes the in-network coherence directory maintained by the programmable switch.
For client request processing, lines 1 to 3 of algorithm one: the programmable switch assigns an object version number to each write request (WRITE), writing ver_next into the VER field of the request and then incrementing ver_next by 1. Lines 4 to 10 of algorithm one: the switch decides how to forward the request by matching the request's key hash against the in-network coherence directory. If the key is not a hotkey, the request is forwarded to its original destination, i.e., the key's primary storage server. For a read request on a hotkey, a storage server is selected from the hotkey's storage server list rset to serve the request. For a write request on a hotkey, one or more storage servers are selected from all storage servers according to the storage server selection policy.
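A runnable Python rendering of algorithm one, reusing the CoherenceDirectory sketch above; the uniform random choices stand in for the storage server selection policy of section 5, and every name here is an assumption of the sketch.

import random

READ, WRITE = 0, 1  # assumed OP encodings, as in the header sketch

def process_client_request(pkt: dict, rkeys, ver_state: dict, all_servers: list) -> dict:
    """Switch-side client request handling (algorithm one)."""
    if pkt["op"] == WRITE:
        pkt["ver"] = ver_state["ver_next"]  # lines 1-3: assign version, then increment
        ver_state["ver_next"] += 1
    if rkeys.contains(pkt["keyhash"]):      # lines 4-10: hotkey handling
        entry = rkeys.slots[pkt["keyhash"]]
        if pkt["op"] == READ:
            pkt["dst"] = random.choice(sorted(entry.rset))  # any replica in rset
        elif pkt["op"] == WRITE:
            pkt["dst"] = random.choice(all_servers)         # policy: any server
    # a non-hot key keeps its original destination: the key's primary server
    return pkt                              # line 11: forward the packet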
Each storage server maintains an object version number and a value for every key it stores. When processing a write request, the storage server compares the object version number in the application layer header's VER field with the key's locally stored object version number; only when the version in the VER field is higher does the server update the key's value and object version number.
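A server-side sketch of this version check; the store layout (keyhash mapped to a (version, value) pair) is an assumption of the sketch.

WRITE_REPLY = 3  # assumed OP encoding, as in the header sketch

def server_handle_write(store: dict, pkt: dict, my_id: int) -> dict:
    """Apply the write only if the request's object version number is higher
    than the locally stored one, then acknowledge with a WRITE-REPLY."""
    ver_local = store.get(pkt["keyhash"], (0, None))[0]
    if pkt["ver"] > ver_local:
        store[pkt["keyhash"]] = (pkt["ver"], pkt["value"])
    return {"op": WRITE_REPLY, "keyhash": pkt["keyhash"],
            "ver": pkt["ver"], "serverid": my_id}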
Processing algorithm of storage server response (algorithm two):
1: if rkeys.contains(pkt2.keyhash) then
2:     if pkt2.ver > ver_completed[pkt2.keyhash] then
3:         ver_completed[pkt2.keyhash] ← pkt2.ver;
4:         rset[pkt2.keyhash] ← set(pkt2.serverid);
5:     else if pkt2.ver = ver_completed[pkt2.keyhash] then
6:         rset[pkt2.keyhash].add(pkt2.serverid);
7:     end if;
8: end if;
9: Forward packet;
where pkt2 denotes a storage server response; pkt2.keyhash denotes the KEYHASH field content of pkt2; ver_completed denotes the highest object version number of a key stored in the programmable switch; pkt2.serverid denotes the SERVERID field content of pkt2, i.e., the identifier of the responding storage server.
Algorithm two, lines 1 to 7: for storage server response processing, when the programmable switch receives a READ-REPLY or WRITE-REPLY, it looks up the response's key in the in-network coherence directory. If the directory contains the key, the key is a hotkey, and the switch compares the object version number in the application layer header's VER field with the hotkey's latest object version number ver_completed stored in the switch:
if the hotkey in the response carries a higher object version number, the programmable switch updates ver_completed, resets the hotkey's storage server list rset, and then records the responding server in rset; if the two version numbers are equal, the switch directly adds the responding server to the hotkey's rset.
The combined effect of algorithms one and two is as follows: after a write request for a key is sent to one or more storage servers, the programmable switch records those servers as they complete and acknowledge the write, and sends all future read requests for the key only to those servers; this is sufficient to ensure linearizability.
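A matching Python rendering of algorithm two, again reusing the CoherenceDirectory sketch above and assuming pkt is a READ-REPLY or WRITE-REPLY.

def process_server_response(pkt: dict, rkeys) -> dict:
    """Switch-side response handling (algorithm two)."""
    if rkeys.contains(pkt["keyhash"]):
        entry = rkeys.slots[pkt["keyhash"]]
        if pkt["ver"] > entry.ver_completed:    # a newer version has completed
            entry.ver_completed = pkt["ver"]
            entry.rset = {pkt["serverid"]}      # reset rset to this server only
        elif pkt["ver"] == entry.ver_completed:
            entry.rset.add(pkt["serverid"])     # another up-to-date replica
    return pkt                                  # forward the response to the client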
4. Adding and deleting hotkeys
The hotness of keys changes constantly, so the programmable switch controller continuously monitors key access frequencies and keeps the in-network coherence directory updated with the hottest O(nlogn) keys. When a new key becomes a hotkey, a directory entry is created for it: the controller initializes the entry's object version number ver_completed from the key's version on its primary storage server and then adds the hotkey to the in-network coherence directory. Note that after a key becomes a hotkey, its value is not immediately moved or copied to other storage servers; instead, a subsequent write request for the hotkey is sent to a new storage server, which creates a copy of the hotkey's value.
When a key is no longer a hot key, the controller need only mark the key in the in-network coherence directory; the next write request for the key is sent to its primary storage server and the key is deleted from the intra-network coherence directory once the programmable switch receives a reply to the write request for the key.
The above scheme moves or copies the value of a key to a new storage server only on the key's next write request. While this simplifies the design, it does not work for read-only values or values that are modified infrequently. The invention solves this by performing a write operation that does not change the value whenever a key's value needs to be moved or copied. More precisely, the controller can force a copy or move of the key's value by issuing a virtual write command to the key's primary storage server, instructing the server to raise the key's stored object version number to the highest object version number ver_completed stored in the programmable switch and to forward the key's value to other storage servers; the responses of those servers then add them to the key's storage server list rset, after which they help serve reads.
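The forced copy can be sketched as below, with the per-server store layout assumed as in the earlier sketches; in the real protocol the replicas' replies travel through the switch, which adds them to the key's rset via algorithm two.

def force_copy_value(rkeys, stores: dict, keyhash: int, primary: int, targets: list) -> None:
    """Controller sketch of the virtual write: raise the primary's stored
    version of the key to ver_completed and push (version, value) copies to
    the target servers."""
    entry = rkeys.slots[keyhash]
    _, value = stores[primary][keyhash]
    stores[primary][keyhash] = (entry.ver_completed, value)  # value unchanged
    for sid in targets:
        stores[sid][keyhash] = (entry.ver_completed, value)  # replicate the value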
5. Storage server selection policy
The invention currently supports the following two storage server selection policies. The first policy selects a storage server uniformly at random and relies on statistics for load balancing. The second policy uses weighted polling: based on collected storage server load statistics, the controller assigns each storage server a weight and instructs the programmable switch to select servers with frequency proportional to their weights.
A read request is sent to only one storage server, while a write request may be sent to one or more storage servers. A larger storage server list rset provides more options for future read requests and improves load balancing, but it also increases the cost of write operations. For write-heavy workloads, the increased write cost can easily negate any load balancing benefit.
When choosing the number of value copies (the replication factor) for a hotkey, the programmable switch tracks how many times each hotkey is read over a time window and selects a replication factor proportional to that read count, which bounds the replication cost.
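Both selection policies and the read-proportional replication factor can be sketched as follows; the constants are assumptions of the sketch, not values fixed by the patent.

import random

def pick_server(servers: list, weights: list = None):
    """First policy: uniform random choice. Second policy: weighted polling,
    choosing a server with probability proportional to its assigned weight."""
    if weights is None:
        return random.choice(servers)
    return random.choices(servers, weights=weights, k=1)[0]

def replication_factor(read_count: int, reads_per_replica: int = 10000,
                       max_replicas: int = 8) -> int:
    """Number of value copies proportional to the hotkey's read count over
    the tracking window, capped to bound the write amplification."""
    return max(1, min(max_replicas, read_count // reads_per_replica + 1))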
6. Hash collision
The in-network coherence directory of the invention holds only hotkeys, not all keys. If a non-hot key's hash collides with a hotkey's hash, requests for the non-hot key may be sent to the wrong storage server. To address this, every storage server keeps track of all current hotkeys (kept up to date by the controller). With this information, a storage server can forward a misdirected request to the correct storage server. Since the number of hotkeys is small, collisions are rare and few requests are misdirected, so this request chaining has negligible impact on performance. In the rare case that two hotkeys collide, the invention replicates the value of only one of them to ensure correctness.
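A sketch of this request chaining; it assumes the request payload carries the full key (the header only carries its hash) and that primary_of maps a full key to its primary server's id.

def route_on_arrival(pkt: dict, hot_keys: set, my_id: int, primary_of) -> tuple:
    """If a request for a non-hot key arrives here because its hash collided
    with a hotkey's hash, chain it to the key's correct primary server."""
    if pkt["key"] not in hot_keys:
        owner = primary_of(pkt["key"])
        if owner != my_id:
            return ("forward", owner)  # misdirected by a hash collision
    return ("serve", my_id)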
7. Object version number
Object version numbers must increase monotonically. The invention uses a 64-bit object version number; even at the programmable switch's full line rate it would take more than 100 years to exhaust, so version number overflow is not a practical concern.
8. Garbage collection
When processing a write request for a hotkey, the invention does not explicitly invalidate or delete old versions. The in-network coherence directory ensures that all requests are forwarded to the latest version, so correctness is unaffected, but keeping outdated value copies forever wastes space on the storage servers. The invention addresses this with garbage collection: the controller informs the storage servers which keys are hotkeys and periodically reports the highest object version number of those keys. A storage server may then safely delete a key if its local version is stale, or if the key is no longer a hotkey and the server is not the key's primary storage server.
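The deletion rule can be sketched as below, assuming the controller supplies each server with the current hot-key set and the reported highest version numbers.

def gc_pass(store: dict, my_id: int, hot_keys: set,
            ver_completed: dict, primary_of) -> None:
    """Delete a key if the local copy is stale, or if the key is no longer
    hot and this server is not its primary. Container shapes are assumptions."""
    for key in list(store):
        ver_local = store[key][0]
        stale = key in hot_keys and ver_local < ver_completed.get(key, 0)
        demoted = key not in hot_keys and primary_of(key) != my_id
        if stale or demoted:
            del store[key]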
The invention provides a distributed storage skewed workload balancing method based on a programmable switch with the following characteristics: 1) it provides good load balancing for highly skewed, dynamic workloads; 2) it works together with fast in-memory storage systems; 3) it can handle objects of arbitrary size; 4) it guarantees linearizability; 5) it is equally effective for read-heavy, write-heavy, and mixed read-write workloads.
In addition, by using the in-network coherence directory, the invention guarantees data consistency, specifically linearizability, without introducing other performance losses, and achieves higher performance through in-memory storage.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; the specification is written this way merely for clarity, and the embodiments may be combined appropriately to form other implementations that those skilled in the art will understand.

Claims (5)

1. A distributed storage skewed workload balancing method based on a programmable switch, characterized by being used for balancing the workload between clients and distributed storage servers, wherein a client sends a request containing an application layer message to a storage server, the request comprising a write request that stores the value corresponding to a key on the storage server and a read request that reads the value corresponding to a key from the storage server; the storage server returns a response containing an application layer message to the client; the workload comprises client requests and storage server responses, and the workload passes through the programmable switch; the storage servers comprise a primary storage server and a plurality of replica storage servers; and the application layer message carries the key and the object version number of the key;
the workload balancing method specifically comprises the following steps:
P1: the programmable switch counts the hottest O(nlogn) hotkeys k while clients interact with the storage servers, where n represents the number of all distinct keys and O(nlogn) is a complexity bound; the programmable switch stores the value v corresponding to each hotkey k on a replica storage server, and records the hotkey k, the highest object version number ver_completed of hotkey k, and the list rset of storage servers holding the value v;
P2: when a client sends a write request to a storage server:
the programmable switch assigns a monotonically increasing object version number to each write request; if the key in the write request has a record in the programmable switch, the key is a hotkey, and the programmable switch selects one or more storage servers from all storage servers to forward the write request to; if the key in the write request has no record in the programmable switch, the write request is forwarded to the primary storage server;
when a client sends a read request to a storage server:
if the key in the read request has a record in the programmable switch, the key is a hotkey, and the switch finds a storage server s holding the value of hotkey k and forwards the read request to it; if the key in the read request has no record in the programmable switch, the read request is forwarded to the primary storage server;
P3: the primary storage server records the object version numbers and values of all keys, and each replica storage server records the object version numbers and values of the hotkeys; when processing a write request, a storage server compares the object version number ver of the key in the write request with the key's locally stored object version number, and only when the version in the write request is higher does the storage server update the key's value and object version number;
P4: when any storage server s returns a response to the client and the key in the response has a record in the programmable switch, then: the object version number ver of the hotkey k in the response is compared with the highest object version number ver_completed of hotkey k recorded by the programmable switch; if ver > ver_completed, the value of ver_completed is updated to ver, the storage server list rset recorded by the programmable switch for hotkey k is emptied, and storage server s is then added to the list rset; if ver = ver_completed, storage server s is added directly to the list rset.
2. The programmable switch-based distributed storage skewed workload balancing method according to claim 1, wherein: the application layer header includes an OP field, a KEYHASH field, a VER field, and a SERVERID field; the OP field indicates the workload type; the KEYHASH field holds a hash value of the key generated in the programmable switch; the VER field holds the object version number ver assigned by the programmable switch for the workload; the SERVERID field holds the storage server identifier filled in by the storage server when responding; the contents of the OP field include READ, WRITE, READ-REPLY and WRITE-REPLY, where READ indicates a read request, WRITE a write request, READ-REPLY a reply to a read request, and WRITE-REPLY a reply to a write request.
3. The method of claim 2, wherein the programmable switch contains an in-network coherence directory, and the in-network coherence directory records through a hash table the hotkey k, the object version number of hotkey k, and the list rset of storage servers holding the value v.
4. The programmable switch-based distributed storage skewed workload balancing method according to claim 1, wherein when a key k loses its heat, i.e., key k is no longer among the hottest O(nlogn) hotkeys, the programmable switch marks key k, and upon receiving a response containing key k, the programmable switch deletes its record of key k.
5. The programmable switch-based distributed storage skewed workload balancing method according to claim 1, wherein when the value of a hotkey k needs to be copied from the primary storage server to a replica storage server, the programmable switch issues a virtual write command that raises the object version number of hotkey k to the highest object version number ver_completed of hotkey k recorded by the programmable switch, and sends the hotkey k, its corresponding value, and its object version number to the replica storage server for storage.
CN202310170363.9A 2023-02-27 2023-02-27 Distributed storage skewed workload balancing method based on programmable switch Active CN115858181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310170363.9A CN115858181B (en) Distributed storage skewed workload balancing method based on programmable switch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310170363.9A CN115858181B (en) Distributed storage skewed workload balancing method based on programmable switch

Publications (2)

Publication Number Publication Date
CN115858181A CN115858181A (en) 2023-03-28
CN115858181B true CN115858181B (en) 2023-06-06

Family

ID=85659136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310170363.9A Active CN115858181B (en) Distributed storage skewed workload balancing method based on programmable switch

Country Status (1)

Country Link
CN (1) CN115858181B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117614956B (en) * 2024-01-24 2024-03-29 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Intra-network caching method and system for distributed storage and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207841A (en) * 2013-03-06 2013-07-17 青岛海信传媒网络技术有限公司 Method and device for data reading and writing on basis of key-value buffer
CN107948233A (en) * 2016-10-13 2018-04-20 华为技术有限公司 The method of processing write requests or read request, interchanger, control node
CN113315744A (en) * 2020-07-21 2021-08-27 阿里巴巴集团控股有限公司 Programmable switch, flow statistic method, defense method and message processing method
CN114844846A (en) * 2022-04-14 2022-08-02 南京大学 Multi-level cache distributed key value storage system based on programmable switch
CN115113893A (en) * 2021-03-22 2022-09-27 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and computer equipment
CN115277145A (en) * 2022-07-20 2022-11-01 北京志凌海纳科技有限公司 Distributed storage access authorization management method, system, device and readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027862A1 (en) * 2003-07-18 2005-02-03 Nguyen Tien Le System and methods of cooperatively load-balancing clustered servers
US8869157B2 (en) * 2012-06-21 2014-10-21 Breakingpoint Systems, Inc. Systems and methods for distributing tasks and/or processing recources in a system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207841A (en) * 2013-03-06 2013-07-17 青岛海信传媒网络技术有限公司 Method and device for data reading and writing on basis of key-value buffer
CN107948233A (en) * 2016-10-13 2018-04-20 华为技术有限公司 The method of processing write requests or read request, interchanger, control node
CN113315744A (en) * 2020-07-21 2021-08-27 阿里巴巴集团控股有限公司 Programmable switch, flow statistic method, defense method and message processing method
CN115113893A (en) * 2021-03-22 2022-09-27 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and computer equipment
CN114844846A (en) * 2022-04-14 2022-08-02 南京大学 Multi-level cache distributed key value storage system based on programmable switch
CN115277145A (en) * 2022-07-20 2022-11-01 北京志凌海纳科技有限公司 Distributed storage access authorization management method, system, device and readable medium

Also Published As

Publication number Publication date
CN115858181A (en) 2023-03-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant