WO2024041140A1 - Data processing method, accelerator, and computing device - Google Patents

Data processing method, accelerator, and computing device

Info

Publication number
WO2024041140A1
Authority
WO
WIPO (PCT)
Prior art keywords
accelerator
block
hash
operation request
memory
Prior art date
Application number
PCT/CN2023/101332
Other languages
French (fr)
Chinese (zh)
Inventor
毛修斌 (Mao Xiubin)
何泽耀 (He Zeyao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2024041140A1 publication Critical patent/WO2024041140A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 Address translation
    • G06F 12/1072 Decentralised address translation, e.g. in distributed shared memory systems
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method, an accelerator and a computing device.
  • the KV storage process is often implemented based on the central processing unit (CPU).
  • the CPU needs to calculate and determine the KV data structure, which occupies the CPU's computing resources and network bandwidth.
  • the KV service that relies on the CPU can provide limited throughput and cannot meet the performance requirements of high-concurrency KV operations.
  • This application provides a data processing method, which is completed by offloading KV operation requests to the accelerator.
  • Data plane processing within the offloading capability range completely bypasses the CPU, which not only improves the throughput of the system but also reduces the occupation of the CPU, meeting the performance requirements of high-concurrency KV operations.
  • This application also provides corresponding data processing devices, accelerators, computing equipment, computer-readable storage media, and computer program products.
  • this application provides a data processing method.
  • the method is applied to computing devices supporting key-value KV services.
  • the computing device includes an accelerator and a processor.
  • the accelerator can obtain the KV operation request, and then determine the execution mode based on the KV operation request.
  • the execution mode includes an offloading mode and a non-offloading mode.
  • the offloading mode is used to instruct the accelerator to perform the operation requested by the KV operation
  • the non-offloading mode is used to instruct the processor to perform the operation requested by the KV operation. Then the accelerator performs the processing of the KV operation request according to the execution mode.
  • This method offloads the basic operations of distributed KV data to the accelerator by leveraging the accelerator's programmability and on-path processing capabilities. Data plane processing within the offloading capability range completely bypasses the CPU, which can improve the throughput of the system and reduce CPU usage. Moreover, this method retains all the functions of KV on the CPU side, and a small number of KV operations that exceed the offload specifications are still forwarded to the CPU for processing, forming a hierarchical KV service that takes into account both performance and versatility.
  • the accelerator can obtain the operation metadata in the KV operation request and use the operation metadata to determine how to execute the KV operation request. Specifically, when the operation metadata satisfies the preset conditions, the execution mode is determined to be the offloading mode; otherwise, the execution mode is determined to be the non-offloading mode.
  • This method filters KV operation requests through operation metadata: the execution mode of KV operation requests within the range of the accelerator's offloading capability is determined to be the offloading mode, and the execution mode of KV operation requests outside that range is determined to be the non-offloading mode. This forms a hierarchical KV service, which not only improves the performance of the KV service through the accelerator and reduces CPU usage, but also lets the CPU process complex operations, ensuring versatility.
  • operation metadata includes operation type and key length.
  • The operation metadata satisfying the preset conditions may be that the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query or batch modify, and that the key length is less than the preset length.
  • This method sets the filtering conditions based on the operation types of KV operations that the accelerator itself can handle and the maximum key length of the KV data those operations can act on, so that KV operation requests within the accelerator's offloading capability can be accurately filtered out and offloaded, avoiding the extra resources and time needed to forward requests to the CPU side because of low filtering precision.
  • the accelerator when the execution mode determined according to the KV operation request is the offloading mode, the accelerator can perform the target KV operation on the memory of the computing device according to the KV operation request.
  • the accelerator can perform add operations, delete operations, modify operations, query operations, or batch add operations, batch delete operations, batch modify operations, and batch query operations on the memory of the computing device.
  • the accelerator can effectively reduce pressure on the CPU side, reduce CPU usage, and improve KV operation performance by performing the above basic operations on the memory.
  • the memory can use KV blocks to store KV data.
  • A KV block includes a key content field and a value content field. Based on this, when the accelerator performs a target KV operation, it can write the target KV block to the memory of the computing device or query the target KV block from the memory of the computing device according to the KV operation request.
  • The accelerator can write the target KV block to the memory or query the target KV block from the memory according to the relevant information in the KV operation request to complete the KV operation. Since the KV operation request is offloaded to the accelerator, the performance of the KV operation is improved.
  • the memory uses a hash table to store the metadata of the KV data.
  • The hash table includes multiple hash buckets, each hash bucket in the multiple hash buckets includes multiple hash slots, and multiple hash slots belonging to the same hash bucket are used to store the fingerprints, key lengths and corresponding block addresses of multiple keys with the same hash value. In this way, it is helpful to quickly search for the target KV block or quickly write the target KV block, which improves the efficiency of KV operations.
  • the KV operation request is a KV modification request
  • The accelerator can determine the hash value and the fingerprint based on the key content of the target KV block, for example, by performing hash operations on the key content of the target KV block separately according to different hash algorithms. It then determines the hash bucket corresponding to the target KV block based on the hash value, and updates the hash slot in that hash bucket whose fingerprint matches the fingerprint determined by the key content of the target KV block.
  • This method further improves KV operation performance by designing a data structure suitable for accelerator processing to store KV data and its metadata.
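  • As a minimal sketch of this update path (the concrete hash algorithms, bucket count and slot count are not specified in this application and are assumed here), the lookup and refresh of a matching hash slot could look roughly as follows:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed slot layout: fingerprint, key length and block address of one key. */
struct hash_slot {
    uint16_t fingerprint;   /* result of hash algorithm 2 */
    uint16_t klen;          /* key length */
    uint64_t block_addr;    /* address of the KV block holding key and value */
};

#define SLOTS_PER_BUCKET 8      /* assumed value */
#define NUM_BUCKETS      1024   /* assumed value */

struct hash_bucket { struct hash_slot slots[SLOTS_PER_BUCKET]; };

/* Two independently seeded FNV-1a variants stand in for "hash algorithm 1/2";
 * the application does not name the algorithms, so this is only illustrative. */
static uint64_t fnv1a(const uint8_t *p, size_t n, uint64_t h) {
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

/* KV modify: locate the bucket via hash algorithm 1, then refresh the slot
 * whose fingerprint (hash algorithm 2) matches the target KV block's key. */
int kv_update_slot(struct hash_bucket *table, const uint8_t *key, size_t klen,
                   uint64_t new_block_addr) {
    uint64_t hv = fnv1a(key, klen, 14695981039346656037ULL);          /* hash 1 */
    uint16_t fp = (uint16_t)fnv1a(key, klen, 0x9e3779b97f4a7c15ULL);  /* hash 2 */
    struct hash_bucket *b = &table[hv % NUM_BUCKETS];

    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (b->slots[i].block_addr != 0 &&
            b->slots[i].fingerprint == fp && b->slots[i].klen == klen) {
            b->slots[i].block_addr = new_block_addr;   /* refresh matching slot */
            return 0;
        }
    }
    return -1;   /* no match: handled by the insert/conflict path, not shown */
}
```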
  • the KV block also includes a lower-level KV block identification field and a lower-level KV block address field.
  • KV operation request is a KV increase request.
  • the accelerator may determine the hash value and fingerprint based on the key content of the target KV block, and then determine the hash bucket corresponding to the target KV block based on the hash value.
  • When the fingerprint stored in the target hash slot in the hash bucket matches the fingerprint determined by the key content of the target KV block, the accelerator reads the lower-level KV block identification field and the lower-level KV block address field in the target hash slot, determines the KV block at the end of the linked list based on the field values of those two fields, writes the block address of the target KV block into the lower-level KV block address field of the KV block at the end of the linked list, and marks the field value of the lower-level KV block identification field of that KV block as valid.
  • This method manages KV data with the same fingerprint by setting up a linked list, and solves the conflict problem of adding (inserting) operations.
  • the KV operation request is a KV increase request.
  • The accelerator can determine the hash value and the fingerprint based on the key content of the target KV block, then determine the hash bucket corresponding to the target KV block based on the hash value, and then write into an empty hash slot in that hash bucket the block address of the target KV block and the fingerprint determined by the key content of the target KV block.
  • This method writes the block address of the target KV block and the fingerprint determined by the key content in the target KV block into the empty hash slot in the hash bucket, so that subsequent data query and modification can be performed based on the above block address and fingerprint.
  • the KV operation request is a KV query request.
  • the KV operation request includes the block address and key content of the target KV block to be queried.
  • The accelerator can determine the hash value according to the key content of the target KV block, determine the corresponding hash bucket according to the hash value, perform address translation according to the block address of the target KV block to obtain the physical address, and read the hash bucket according to the physical address. The accelerator can then determine the fingerprint based on the key content of the target KV block, query the hash bucket based on that fingerprint, and obtain the value content corresponding to the key content.
  • The accelerator can quickly query the target KV block based on the data structure designed for the accelerator, which improves operating performance. Moreover, this data structure fully considers conflict situations: even if there is a conflict, the target KV block being looked for can be accurately found, meeting business needs.
  • this application provides a data processing device.
  • the data processing device includes various units for executing the data processing method in the first aspect or any possible implementation of the first aspect.
  • this application provides an accelerator.
  • the accelerator includes a processing module and a communication interface.
  • the communication interface is used to provide network communication for the processing module, and the accelerator is used to execute the data processing method in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computing device including an accelerator.
  • computing devices include accelerators and processors.
  • the processor can be a central processing unit, and the central processing unit can also provide KV services.
  • the accelerator is used to execute the data processing method in the first aspect or any possible implementation manner of the first aspect to accelerate the KV service.
  • the present application provides a computer-readable storage medium in which instructions are stored, and the instructions instruct the computing device to execute the data processing method of the above-mentioned first aspect or any implementation of the first aspect.
  • the present application provides a computer program product containing instructions that, when run on a computing device, cause the computing device to execute the data processing method described in the above first aspect or any implementation of the first aspect.
  • Figure 1 is a schematic architectural diagram of a data processing system provided by this application.
  • Figure 2 is a flow chart of a data processing method provided by this application.
  • FIG. 3 is a schematic structural diagram of a KV server node provided by this application.
  • Figure 4 is a schematic flow chart of a data processing method provided by this application.
  • FIG. 5 is a schematic structural diagram of a data processing system provided by this application.
  • FIG. 6 is a schematic structural diagram of KV data provided by this application.
  • Figure 7 is a schematic flow chart of a data processing method provided by this application.
  • Figure 8 is a schematic flow chart of a data processing method provided by this application.
  • Figure 9 is a schematic flow chart of a data processing method provided by this application.
  • Figure 10 is a schematic structural diagram of a data processing device provided by this application.
  • Figure 11 is a schematic structural diagram of a computing device provided by this application.
  • Distributed storage refers to the distributed storage of data on multiple independent devices (such as storage servers and other storage devices).
  • Distributed storage system refers to a storage system that uses distributed storage for data storage.
  • Distributed storage systems usually have a scalable system structure, which can use multiple storage servers to share the storage load and use location servers to locate storage information. In this way, it not only improves the reliability, availability and access efficiency of the storage system, but also makes it easy to expand.
  • Data or metadata in distributed storage can be organized using a tree structure or key value (KV).
  • Local key data, such as hotspot data, can be organized using key-value data structures to obtain lower query latency and higher concurrency.
  • Key value (KV) specifically describes a mapping relationship between elements that are related to each other: each pair of elements contains a key and a value, and the corresponding value can be retrieved by combining the specific key with the data structure used.
  • this application provides a data processing method.
  • This method can be applied to computing devices that support KV services, where supporting KV services includes supporting addition, deletion, query or modification of KV data, or batch addition, batch deletion, batch query or batch modification of KV data.
  • Computing devices include accelerators and processors.
  • the processor can be a CPU, and the CPU can also support KV services.
  • An accelerator refers to a device that works with a processor to accelerate services.
  • the accelerator can be a data processing unit (DPU) or an infrastructure processing unit (IPU).
  • the accelerator is used as an example of DPU in the following description.
  • A DPU is specifically a system on chip that focuses on on-path (associated) processing and calculation of data.
  • Associated processing refers to associated-signaling processing. Associated signaling means that the various kinds of signaling required for call connection are transmitted over the trunk circuit occupied by that connection.
  • DPU usually has certain programmability capabilities and can perform customized offload acceleration according to application scenarios.
  • the DPU can deploy an operating system (also called a small system) independently and provide KV services.
  • the computing device where the DPU is located includes two operating systems, namely a general operating system running in the CPU and a small system running in the DPU.
  • The DPU can also exist as an external device to the CPU and form a heterogeneous system together with processors such as a graphics processing unit (GPU).
  • the DPU can obtain the KV operation request, and then the DPU can determine the execution mode according to the KV operation request.
  • the execution mode includes an offloading mode and a non-offloading mode.
  • the offloading mode is used to instruct an accelerator such as the DPU to perform the operation requested by the KV operation.
  • the non-offloading mode is used to instruct the processor, such as the CPU, to perform the operation requested by the KV operation.
  • the DPU can perform the processing of the KV operation request according to the execution mode.
  • This method uses the programmability and on-path processing capabilities of data processors such as the DPU to offload the basic operations of distributed KV data to the DPU.
  • Data plane processing within the offloading capability completely bypasses the CPU, which can improve the throughput of the system and reduce CPU usage.
  • this method retains all the functions of KV on the CPU side, and a small number of KV operations that exceed the offload specifications are still forwarded to the CPU for processing, forming a hierarchical KV service that takes into account both performance and versatility.
  • the data processing system 10 includes a KV server node 100 and a KV client node 200.
  • the KV server node 100 is a computing device that supports KV services, such as a server that supports KV services.
  • the KV client node 200 is a device that supports access to the KV server node 100.
  • the KV client node 200 may be a lightweight device, including but not limited to a laptop, a tablet, or a smartphone.
  • the KV server node 100 and the KV client node 200 may be interconnected through a network, for example, through a high-performance network. It should be noted that according to the network scale of different business scenarios, the data processing system 10 may include one or more KV server nodes 100. Similarly, the data processing system 10 may include one or more KV client nodes 200.
  • the KV server node 100 includes DPU102 and CPU104.
  • the CPU 104 is placed in the host, and the DPU 102 is used as an external device of the host.
  • the KV server node 100 also includes a memory 106, which is used to store KV data to speed up the access efficiency of KV data.
  • the memory 106 can be externally connected to the host. Based on this, the memory 106 can also be called host memory. Each host can be connected to multiple external memories 106, and the multiple memories 106 can be used to form a memory pool.
  • the KV client node 200 is deployed with applications. Application processes can be spawned when the application is running. The application process can call the KV service interface to initiate a KV operation.
  • the KV client for example, the KV client process
  • the KV client process on the KV client node 200 can generate a KV operation request based on the KV operation.
  • the KV operation request can be in remote direct memory access (RDMA) message format.
  • the KV client node 200 sends the KV operation request to the KV server node 100.
  • the DPU 102 of the KV server node 100 is responsible for receiving and processing various KV operation requests.
  • the DPU 102 may determine the execution mode according to the KV operation request, and then perform the processing of the KV operation request according to the execution mode. For example, the DPU 102 can obtain the operation metadata in the KV operation request, including the operation type and key length (Klen).
  • When the operation metadata meets the preset conditions, for example, the operation type is one or more of addition, deletion, query, modification, batch addition, batch deletion, batch query or batch modification, and the key length is less than the preset length, the execution mode is determined to be the offloading mode, and the DPU 102 can perform the target KV operation on the memory 106 according to the KV operation request.
  • Otherwise, the execution mode is determined to be the non-offloading mode, the DPU 102 forwards the KV operation request to the CPU 104, and the CPU 104 performs the target KV operation on the memory 106 according to the KV operation request. That is, KV operation requests within the processing capability of the DPU 102 are processed by the DPU 102, and KV operation requests beyond the processing capability of the DPU 102 are forwarded to the CPU 104 for processing.
  • an embodiment of the present application also provides a data processing method.
  • the data processing method of the embodiment of the present application is introduced below with reference to the accompanying drawings.
  • the method includes:
  • S202 The DPU 102 receives the KV operation request sent by the KV client node 200.
  • KV operation requests include operation metadata.
  • the operation metadata can include operation type and key length.
  • Operation types may include basic operations, such as one or more of add (create), delete (delete), query (read), and modify (update), collectively referred to as CRUD.
  • multiple KV operations may be encapsulated in the KV operation request.
  • the operation type may include batch basic operations, such as one or more of batch addition, batch deletion, batch query, or batch modification.
  • the key length refers to the length of the key, that is, the length of the key content.
  • S204 DPU102 obtains the operation metadata in the KV operation request. When the operation metadata meets the preset conditions, S206 is executed. When the operation metadata does not satisfy the preset conditions, S208 is executed.
  • S206 DPU 102 determines that the execution mode is the offloading mode. Then S210 is executed.
  • S208 DPU 102 determines that the execution mode is the non-offloading mode. Then S214 is executed.
  • the DPU 102 can parse the KV operation request and obtain the operation metadata in the KV operation request.
  • the operation metadata includes the operation type and key length.
  • DPU 102 has the ability to process basic operations, while CPU 104 has the ability to process complex operations; DPU 102 is limited by its hardware capabilities and is usually used to process KV data whose key length is within the preset length (key length less than the preset length).
  • the DPU 102 can determine whether the DPU 102 has the ability to process the KV operation in the KV operation request according to the operation type and key length, thereby determining the execution method of the KV operation.
  • The DPU 102 can compare the operation type in the KV operation request with the operation types of the basic operations (such as add, delete, query, modify, batch add, batch delete, batch query, batch modify), and compare the key length in the KV operation request with the preset length.
  • the operation type matches the operation type of the basic operation and the key length is less than the preset length, it indicates that the DPU 102 has the ability to process the KV operation in the KV operation request, and the execution mode can be determined to be the offloading mode.
  • the offloading mode is used to instruct the DPU 102 to perform the operation requested by the KV operation.
  • When the operation type does not match the operation type of a basic operation, and/or the key length is not less than the preset length, it means that the DPU 102 does not have the ability to process the KV operation in the KV operation request, and the execution mode can be determined to be the non-offloading mode.
  • the non-offloading mode is used to instruct the CPU 104 to execute the operation requested by the KV operation.
  • The above preset length can be set according to the hardware type of the DPU 102; depending on the hardware type, the preset length can differ. For example, the preset length can be set to 128 bytes (B) or to 1 kilobyte (KB).
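  • A minimal sketch of this filtering step is given below; the operation-type encoding and the choice of 128 B as the preset length are assumptions for illustration, not part of this application:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the operation types the DPU can offload. */
enum kv_op_type {
    KV_OP_CREATE, KV_OP_DELETE, KV_OP_READ, KV_OP_UPDATE,
    KV_OP_BATCH_CREATE, KV_OP_BATCH_DELETE, KV_OP_BATCH_READ, KV_OP_BATCH_UPDATE,
    KV_OP_OTHER                        /* complex operations handled by the CPU */
};

struct kv_op_metadata {
    enum kv_op_type type;
    uint32_t klen;                     /* key length in bytes */
};

#define KLEN_PRESET_LIMIT 128u         /* preset length; depends on DPU hardware */

/* Returns true when the request is within the DPU's offload capability
 * (offloading mode); false means it is forwarded to the host CPU. */
bool kv_use_offloading_mode(const struct kv_op_metadata *md) {
    bool basic_op = md->type >= KV_OP_CREATE && md->type <= KV_OP_BATCH_UPDATE;
    return basic_op && md->klen < KLEN_PRESET_LIMIT;
}
```

  • In this sketch, a false return corresponds to the non-offloading mode described above, i.e. the request would be handed to the host CPU.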
  • The above-mentioned S204 to S208 are a specific implementation manner in which the DPU 102 determines the execution mode according to the KV operation request in the embodiment of the present application.
  • The DPU may not perform the above steps, or may use other implementation methods.
  • the DPU 102 can directly try to perform the target KV operation. When the execution is successful, the result is returned. When the execution is unsuccessful, the KV operation request is forwarded to the CPU for processing by the CPU.
  • the DPU 102 performs the target KV operation on the memory 106 according to the KV operation request.
  • the memory 106 uses KV blocks to store KV data. Based on this, the DPU 102 can write the target KV block to the memory 106 or query the target KV block from the memory 106 according to the KV operation request.
  • the operation type in the KV operation request is add, modify, or batch add or batch modify
  • the DPU 102 performs the operation of writing the target KV block into the memory 106 .
  • the operation type in the KV operation request is query or batch query
  • the DPU 102 performs the operation of querying the target KV block from the memory 106 .
  • the operation type in the KV operation request is delete or batch delete
  • the DPU 102 may perform an operation of deleting the target KV block from the memory 106 .
  • the KV operation request may also include the value content in the KV data to be added or modified.
  • the KV operation request may include the key content "name” and the value content "Zhang San” to request the addition of the KV data "name, Zhang San”.
  • the KV operation results may include different information.
  • the KV operation result can include operation success or operation failure.
  • the KV operation result may include the queried KV data.
  • the DPU 102 can encapsulate the queried KV data and the operation success field value in the same response message, and then return the response message to the client node 200 .
  • the DPU 102 can also encapsulate the queried KV data and the operation success field value in different response messages, and then return them to the client node 200 respectively.
  • the CPU 104 performs the target KV operation on the memory 106 according to the KV operation request and returns the KV operation result to the KV client node 200.
  • For the CPU 104 performing the target KV operation and returning the KV operation result, refer to the related description of the DPU 102 performing the target KV operation and returning the KV operation result, which will not be repeated here.
  • The above-mentioned S210 to S212 and S214 to S218 are specific implementation manners of the process in which the DPU 102 executes the KV operation request according to the execution mode. In other possible implementations of the embodiments of this application, the processing of the KV operation request may also be performed in other ways.
  • This method uses the programmability and on-path processing capabilities of DPU102 to offload the basic operations of distributed KV data to DPU102.
  • Data plane processing within the offloading capability completely bypasses the host CPU 104, which can improve the throughput of the system and reduce the occupation of the host CPU 104.
  • this method retains all functions of KV on the CPU 104 side, and a small number of KV operations that exceed the offloading specifications are still forwarded to the host CPU 104 for processing, forming a hierarchical KV service that takes into account both performance and versatility.
  • the DPU 102 can be logically divided into a data plane and a control plane.
  • the embodiment shown in Figure 2 mainly introduces the data processing method from the perspective of the data plane. The method of the embodiment of the present application will be described in detail from the perspective of the control plane and the data plane.
  • the DPU 102 is logically divided into two parts: the data plane and the control plane.
  • the data plane is responsible for network IO communication and on-path processing of KV transactions.
  • the control plane is responsible for managing the context information of communication and transaction processing and managing the state of the transaction.
  • CPU104 runs the KV server and generates a KV server process.
  • The KV server process can be responsible for connection management with the KV client process and for KV resource management and scheduling in the memory 106, and has complete KV operation processing capabilities. A small number of KV operation requests that cannot be accelerated by the DPU 102 can be processed by the KV server process.
  • The KV data structure resides in the memory 106.
  • the KV data structure is used to describe the organizational form of KV data.
  • KV data can be organized in the form of KV blocks.
  • a hash table also resides in the memory 106 .
  • the embodiment of the present application designs the corresponding KV data structure in a manner suitable for acceleration by the DPU 102 so that it can be processed more efficiently by the DPU 102 .
  • Step 1 The KV operation request from the KV client node 200 passes through the switching network in the form of an RDMA message and reaches the network port of the DPU102 in the KV server node 100.
  • The associated processing unit located in the data plane of the DPU 102 parses the message to obtain the operation metadata.
  • the switching network can also be called the connection network. Specifically, it is a network that establishes a communication channel between the source and destination of communication to realize information transmission.
  • a switched network is usually implemented by switching equipment, which can include switches, routers and other equipment that implements information exchange.
  • Operation metadata may include one or more of operation types or key lengths. Further, the KV operation request may also include key content. For add, modify, batch add, and batch modify operations, the KV operation request can also include value content. Considering that multiple versions of data can exist, operational metadata can also include version numbers. Similarly, operational metadata can also include sequence numbers.
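  • For illustration only, a request header carrying this operation metadata might be laid out as in the following sketch; the actual field order, widths and on-wire RDMA encoding are not specified in this application and are assumed here:

```c
#include <stdint.h>

/* Assumed layout of the metadata carried in a KV operation request.
 * Key content (klen bytes) and, for add/modify operations, value content
 * (vlen bytes) would follow this header in the message payload. */
struct kv_request_header {
    uint8_t  op_type;      /* add / delete / query / modify / batch variants */
    uint8_t  op_count;     /* greater than 1 for batch operations */
    uint16_t klen;         /* key length */
    uint32_t vlen;         /* value length (0 for delete/query) */
    uint32_t version;      /* optional version number for multi-version data */
    uint32_t sequence;     /* optional sequence number */
};
```

  • With the earlier "name"/"Zhang San" example, a KV add request under these assumptions would carry the add operation type, klen = 4 for the key "name", the corresponding vlen, and the key and value bytes in the payload.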
  • Step 2 The associated processing unit checks whether the current KV operation can be offloaded and accelerated on the DPU102. If it exceeds the offloading capability of the DPU102, it is forwarded to the host CPU 104 for processing by the KV server process.
  • Step 3 The path-associated processing unit writes the context to the control plane to record the network connection status and the status of the current KV operation.
  • the context includes the processing information necessary to perform the current KV operation and the status of the current KV operation.
  • the processing information may include key content, and further, the processing information may also include value content.
  • the status of the current KV operation includes the execution node and execution result of the current KV operation.
  • Step 4 According to the data requirements of the KV operation, the associated processing unit pulls the necessary operation data from the memory 106 over the high-speed bus through the IO processing module for processing.
  • the high-speed bus can be a standard bus such as Peripheral Component Interconnect Express (PCIe) or Compute Express Link (CXL), or it can be a private bus type.
  • The operation of pulling data can use direct memory access (DMA) or other memory access semantics, such as load/store.
  • the associated processing unit may need to complete data extraction and processing several times.
  • Step 5 The path-associated processing unit writes the intermediate results and final results of processing into the context, and updates the operation status.
  • the KV operation is an add operation, a query operation, a batch add operation, or a batch query operation
  • There may be conflicts: for example, if the characteristic value of the key content of the KV data added by the add operation is the same as the characteristic value of the key content of other KV data in the hash slot, or the characteristic value of the key content of the KV data to be queried by the query operation is the same as the characteristic value of the key content of other KV data in the hash slot, this represents a conflict.
  • the associated processing unit can perform conflict resolution operations, and the results generated during this operation can be intermediate results.
  • the intermediate results may include a linked list recording conflict information.
  • the KV operation is an add operation, a delete operation, etc.
  • the intermediate results may not be included, and the associated processing unit may write the final results into the context.
  • Step 6 The associated processing unit encapsulates the final result of the KV operation into an RDMA message and sends it back to the KV client node 200 to complete a KV operation.
  • the processing of the KV operation can be completed by the associated processing unit on the DPU 102, and the data interaction with the memory 106 is completed through the high-speed bus pass-through, without the participation of the computing power of the CPU 104.
  • Compared with Figure 2, Figure 4 not only describes the flow of the data processing method from the data plane, but also introduces the data processing method in detail from the control plane. It should also be noted that Figure 4 illustrates the interaction between one KV client node 200 and one KV server node 100. In actual application, multiple KV server nodes 100 can store KV data in a distributed manner and provide services for one or more KV client nodes 200.
  • KV client nodes 200 can be interconnected with several KV server nodes 100 through the RDMA network.
  • the entire KV data can be divided into different domain segments and stored in the memory 106 of different KV server nodes 100.
  • the KV data can be divided into different domain segments according to the key range, or divided into different domain segments in other ways, and distributed and stored in different KV server nodes 100 to achieve load balancing.
  • the KV server process in the KV server node 100 may include multiple execution threads to support certain concurrency.
  • Each KV server node 100 establishes at least one RDMA connection with each KV client node 200 to complete message transmission.
  • the KV client process can initiate a KV operation request on the corresponding RDMA queue pair (QP) according to the division of data domain segments.
  • QP is a virtual interface between hardware and software.
  • QP is a queue structure which stores, in order, the tasks issued by software to hardware, that is, work queue elements (WQEs). A WQE contains information such as where the data is to be taken from, how long it is, and to which destination it is to be sent.
  • Each QP is independent and isolated from each other by a Protection Domain (PD). Therefore, a QP can be regarded as a resource exclusive to a user, and a user can also use multiple QPs at the same time.
  • QP has many service types, including reliable connection (RC), reliable datagram (RD), unreliable connection (UC) and unreliable datagram (UD). Data interaction is possible only when the source QP and destination QP are of the same type.
  • The DPU 102 on the KV server node 100 side receives the KV operation request and checks whether the operation type, key length and other information of the KV operation are within the acceleration range supported by the DPU 102. If supported, the DPU 102 completes the corresponding request processing; otherwise the request is forwarded to the host CPU 104 for processing.
  • the KV data structure resident in the memory 106 can be seen in Figure 6 .
  • The hash table includes multiple hash buckets, such as the multiple columns of the hash table in Figure 6, denoted as Entry 0...Entry M. Each hash bucket in the multiple hash buckets includes multiple hash slots, corresponding to multiple rows in a column, denoted as Slot 0...Slot N. Multiple hash slots belonging to the same hash bucket are used to store the fingerprints (Fingerprint), key lengths (Klen) and corresponding block addresses (KV Block address) of multiple keys with the same hash value.
  • the result calculated by hash algorithm 1 (i.e., the hash value described above) can be used to index the hash bucket entry, and each hash bucket contains For multiple slots, keys with the same calculation result of hash algorithm 1 will be placed in different slots in the same hash bucket.
  • the header of each Slot includes a Fingerprint field, which stores the result of key calculation through hash algorithm 2 (i.e., the fingerprint mentioned above).
  • DPU102 or CPU104 can distinguish and search multiple keys stored in the same hash bucket through the Fingerprint field.
  • the Slot also stores the virtual address and Key length (Klen) information of the KV Block, as well as the verification key required for address translation. The corresponding KV Block can be read and written through the virtual address.
  • the header field segment in KV Block includes the complete Key content and Value length information (Vlen). Further, the header field segment of the KV Block may also include a subordinate KV block identification field, such as the flags field in Figure 6. In the case of serious conflicts, the Fingerprint calculated by different keys in the same hash bucket may be the same. At this time, these keys share the same Slot, and their corresponding KV Blocks are managed in the form of a linked list.
  • the Flags field identifies whether a next-level KV Block exists, and stores the virtual address and verification key of the next-level KV Block in the previous-level KV Block (for example, the Next field segment of the KV Block). If no next-level KV Block exists, the corresponding address and verification key fields are invalid, but the corresponding space can be reserved. For example, the field value can be set to rsvd for possible subsequent insertion operations.
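  • Putting the Figure 6 description together, a plain-C rendering of these resident structures could look like the sketch below; the field widths, the number of slots per bucket and the verification-key size are assumptions, and only the set of fields follows the text:

```c
#include <stdint.h>

/* One hash slot: fingerprint (hash algorithm 2), key length, and the
 * virtual address plus verification key used to reach the KV Block. */
struct kv_slot {
    uint32_t fingerprint;
    uint32_t klen;
    uint64_t block_vaddr;     /* virtual address of the KV Block */
    uint32_t verify_key;      /* verification key for address translation */
};

#define SLOTS_PER_BUCKET 8    /* Slot 0 .. Slot N in Figure 6 (N assumed) */

struct kv_bucket {            /* Entry 0 .. Entry M in Figure 6 */
    struct kv_slot slots[SLOTS_PER_BUCKET];
};

/* KV Block: a header (flags, value length, complete key content), a Next
 * segment that chains blocks whose keys share the same fingerprint, and
 * then the value content itself. */
struct kv_block_header {
    uint32_t flags;           /* identifies whether a next-level KV Block exists */
    uint32_t vlen;            /* value length */
    uint32_t klen;
    uint8_t  key[];           /* complete key content, klen bytes */
};

struct kv_block_next {
    uint64_t next_vaddr;      /* virtual address of next-level KV Block, or rsvd */
    uint32_t next_verify_key; /* verification key of the next-level KV Block */
};
```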
  • The following describes the processes of adding (also called inserting), modifying (also called updating), querying, and batch operations.
  • This method mainly describes the insertion or update process from the perspective of interaction between the KV client node 200 and the KV server node 100, specifically including the following steps:
  • Step 1 The KV client node 200 sends a KV operation request to the KV server node 100.
  • the KV client node 200 may send a KV insert request or a KV update request to the KV server node 100 through an RDMA write operation.
  • the KV insertion request or KV update request carries the necessary information to generate the KV Block and the corresponding write address (virtual address).
  • the necessary information to generate a KV Block may include the key content and value content of the target KV block to be added or modified, and the write address may be the block address of the target KV block.
  • the KV insertion request or KV update request may also include a verification key.
  • When the request exceeds the maximum transmission unit (MTU), the KV client node 200 can complete the sending through multiple messages.
  • Step 2 DPU 102 receives the KV operation request, writes the value content to the memory 106 according to the KV operation request, and determines the hash value and fingerprint according to the key content and the hash algorithm.
  • The DPU 102 directly writes the value content (Value part) of the target KV block into the memory 106 through DMA according to the block address of the target KV block in the KV insertion request or KV update request, then determines the hash value according to the key content (Key part) and hash algorithm 1, determines the hash bucket corresponding to the target KV block based on the hash value, and then determines the fingerprint of the target KV block based on hash algorithm 2.
  • Step 3 DPU 102 reads the hash bucket corresponding to the target KV block and sequentially compares the fingerprint in each hash slot (Slot) of the hash bucket with the fingerprint of the target KV block calculated in step 2. If the current operation is an update, the Slot whose fingerprint matches successfully is refreshed; if the current operation is an insert, DPU 102 can try to find a blank Slot to write the block address and fingerprint of the KV Block.
  • In the case of a conflict, DPU 102 can read the KV Block header information in the conflicting Slot and check whether the next-level KV Block address is valid through the field value of the lower-level KV block identification field. If it is valid, DPU 102 reads the lower-level KV block address field to obtain the block address of the next-level KV Block, and repeats this until the KV Block at the end of the linked list is found. The address and verification key in the KV insertion request are then written into the next-level KV Block address and verification key fields of that KV Block, and the Flags field is modified to indicate that the next-hop address of the current KV Block is valid.
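  • A sketch of this conflict path is shown below; the flag encoding is assumed, and reading a KV Block through its address is reduced to a plain pointer dereference, whereas on the DPU 102 it would be a DMA read:

```c
#include <stdint.h>

#define KV_FLAG_NEXT_VALID 0x1u   /* assumed bit meaning "next-level block exists" */

struct kv_block {                 /* simplified in-memory view of one KV Block */
    uint32_t flags;
    uint64_t next_vaddr;          /* address of the next-level KV Block */
    uint32_t next_verify_key;
};

/* Stand-in for DMA-reading a KV Block through its virtual address; in this
 * sketch the "address" is simply a host pointer. */
static struct kv_block *read_block(uint64_t vaddr) {
    return (struct kv_block *)(uintptr_t)vaddr;
}

/* Walk the linked list that starts at the conflicting slot's block and append
 * the newly inserted block at the tail. */
void kv_append_on_conflict(uint64_t head_vaddr,
                           uint64_t new_block_vaddr, uint32_t new_verify_key) {
    struct kv_block *blk = read_block(head_vaddr);
    while (blk->flags & KV_FLAG_NEXT_VALID)        /* follow valid next pointers */
        blk = read_block(blk->next_vaddr);
    blk->next_vaddr = new_block_vaddr;             /* write Next of the tail block */
    blk->next_verify_key = new_verify_key;
    blk->flags |= KV_FLAG_NEXT_VALID;              /* mark the next hop as valid */
}
```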
  • Step 4 The KV server node 100 returns the KV operation result to the KV client node 200.
  • the KV server node 100 can notify the KV client node 200 that the current KV insert/update operation is completed through RDMA Send.
  • the DPU 102 can return a response message ACK to the KV client node 200 after receiving the KV operation request.
  • After the KV server node 100 (for example, the DPU 102) notifies completion through RDMA Send, the KV client node 200 can also return a response message ACK to the KV server node 100.
  • The KV write operation involves more than three read and write operations on the memory 106 (more when conflicts occur) and corresponding logical processing operations.
  • In this application, data reads and writes can be completed directly through DMA on the DPU 102, and the KV logic processing can also be completed by the cores on the DPU.
  • The path for DMA data between the memory 106 and the DPU 102 in this application is shorter and more efficient, and can significantly shorten operation latency.
  • This application is based on the on-path processing of the DPU 102, which completes the KV operation while processing RDMA protocol forwarding. It can not only bypass and release host CPU resources, but also obtain better throughput and latency performance.
  • this method mainly describes the query process from the perspective of interaction between the KV client node 200 and the KV server node 100, specifically including the following steps:
  • Step 1 The KV client node 200 sends a KV operation request to the KV server node 100.
  • the KV client node 200 may send a KV query request to the KV server node 100 through an RDMA write operation.
  • the KV query request carries the key content (specifically, the key part in the KV data) to request the query for the value content corresponding to the key content (specifically, the value part in the KV data).
  • the KV client node 200 carries the key content and the operation type in the payload of the RDMA write operation, thereby sending the KV query request through the RDMA write operation.
  • Step 2 The DPU 102 in the KV server node 100 receives the KV operation request and loads the hash bucket according to the KV operation request.
  • the operation type in the KV operation request is query, which is used to request the value content corresponding to the query key content.
  • DPU 102 can calculate the corresponding hash bucket based on the key content in the KV query request and hash algorithm 1, complete the translation from the virtual address to the physical address based on the virtual address and verification key in the KV query request, and then load the hash bucket from the memory 106 according to the physical address.
  • Step 3 When the hash bucket returns, DPU 102 can calculate the fingerprint based on the key content and hash algorithm 2, and search the hash bucket for the Slot whose fingerprint is the same as the calculated fingerprint to obtain the virtual address and verification key of the KV Block in that Slot.
  • Step 4 DPU102 can complete the translation from the virtual address of the corresponding KV Block to the physical address based on the virtual address and verification key, and read the Header field and Next field.
  • Step 5 After the Header field and Next field of the KV Block are returned, DPU102 can compare whether the key content in the Header and the key content requested by the KV query are the same. If they are the same, go to step 6; if they are not the same, get the virtual address and verification key of the next KV Block in the linked list from the Next field, and return to step 4.
  • Step 6 DPU102 can load value content, generate KV operation results, and return KV operation results to KV client node 200.
  • DPU 102 can assemble the RDMA message according to the value content, send the value content back to the KV client node 200 through the RDMA Send operation, and complete the KV query operation.
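  • The query path of steps 2 to 6 could be sketched as follows; the address translation checked by the verification key is reduced to a stub, and the structure layouts repeat the assumptions of the earlier sketches rather than anything specified in this application:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

struct slot  { uint32_t fp, klen; uint64_t block_vaddr; uint32_t verify_key; };
struct block { uint32_t flags, vlen, klen; uint64_t next_vaddr;
               const uint8_t *key, *value; };

#define SLOTS      8
#define NEXT_VALID 0x1u

/* Stub for the pieces the DPU provides: virtual-to-physical translation checked
 * by the verification key, plus the DMA read; here it is a plain pointer cast. */
static struct block *load_block(uint64_t vaddr, uint32_t verify_key) {
    (void)verify_key;                      /* translation/verification omitted */
    return (struct block *)(uintptr_t)vaddr;
}

/* Steps 3-6: match the fingerprint in the bucket, then compare full keys along
 * the linked list until the requested key is found. */
const uint8_t *kv_query(const struct slot bucket[SLOTS], uint32_t fp,
                        const uint8_t *key, uint32_t klen, uint32_t *vlen_out) {
    for (int i = 0; i < SLOTS; i++) {
        if (bucket[i].block_vaddr == 0 || bucket[i].fp != fp) continue;
        struct block *b = load_block(bucket[i].block_vaddr, bucket[i].verify_key);
        for (;;) {
            if (b->klen == klen && memcmp(b->key, key, klen) == 0) {
                *vlen_out = b->vlen;
                return b->value;           /* step 6: load and return the value */
            }
            if (!(b->flags & NEXT_VALID)) break;
            b = load_block(b->next_vaddr, 0);  /* step 5: follow the Next field */
        }
    }
    return NULL;                           /* key not present */
}
```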
  • the DPU 102 can return a response message ACK to the KV client node 200 after receiving the KV operation request.
  • After the DPU 102 of the KV server node 100 notifies the completion of the operation through RDMA Send, the KV client node 200 can also return a response message ACK to the KV server node 100.
  • the KV read operation involves more than two (more times when conflicts occur) read and write operations to the memory 106 and corresponding logical processing operations. Similar to the KV write operation, data reading and writing in this application can be completed directly through DMA on the DPU102, and KV logic processing can also be completed through the core on the DPU102.
  • this application can shorten the data transmission from: server memory -> client memory to: server memory -> server DPU cache, greatly shortening the data transmission delay.
  • this application can achieve CPU bypass, save CPU resources, and improve the overall throughput and latency performance of the system through the multi-core and associated path processing capabilities of DPU102.
  • This application also proposes a design that supports transaction batch processing, allowing the KV client node 200 to encapsulate multiple KV operations in a single KV operation request, for example by carrying the operation metadata of multiple operations. The DPU 102 generates multiple transactions according to the KV operation request, and each transaction in the multiple transactions is used to perform one of the multiple KV operations.
  • DPU102 can execute multiple transactions in parallel through multiple cores, thereby improving the overall efficiency.
  • this method mainly describes the batch operation process from the perspective of interaction between the KV client node 200 and the KV server node 100, and specifically includes the following steps:
  • Step 1 After the KV operation request is received by DPU102, a thread of DPU102 is responsible for processing.
  • KV operation requests can be handled by Thread 0. Specifically, Thread 0 obtains the operation type and number of operations of the current Batch operation by parsing the message header.
  • Step 2 DPU 102 parses each operation domain segment in turn through the distribution and status management logic on Thread 0, and initializes the status information of each operation in the status table. It then generates a new operation transaction and puts it in the transaction queue, and generates an operation notification, which is enqueued into the notification queue.
  • the notification queue can be implemented in the form of a doorbell queue in the figure.
  • Step 3 The scheduler on the DPU102 can schedule the generated operation notifications to different threads, and the different threads can perform corresponding KV operations respectively.
  • the scheduler on DPU102 schedules the doorbell to Thread 1 to Thread N.
  • Thread 1 to Thread N read the information needed to complete the operation from the transaction queue by parsing the doorbell, and then perform the corresponding operation, for example, a KV insertion operation.
  • Step 4 After the operation is completed, each thread writes back the corresponding status table.
  • the queue or order-preserving resources inside the DPU102 ensure the atomicity of the status update.
  • the last completed thread generates a Doorbell that replies to the request and merges it into the Doorbell queue.
  • Step 5 The scheduler on the DPU 102 schedules the Doorbell that responds to the request to a certain thread for execution.
  • the thread collects status information.
  • a completion message is generated and sent back to the KV client node 200.
  • The requesting end can encapsulate multiple KV operations into one message, and the DPU 102 distributes the multiple transactions in the message to different processing cores for parallel processing, implementing batch processing operations through process combination and orchestration, thus improving overall transaction processing efficiency.
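  • A compact sketch of this batch dispatch structure is given below; on the DPU 102 the doorbells would be scheduled onto multiple cores in parallel, whereas this sketch iterates the queues sequentially, so it illustrates only the control flow and data structures, and all names are assumptions:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_OPS 16

enum op_state { OP_PENDING, OP_DONE, OP_FAILED };

struct kv_txn   { uint8_t op_type; uint32_t klen; /* plus key/value offsets */ };
struct doorbell { int txn_index; };

struct batch_ctx {
    struct kv_txn   txn_queue[MAX_OPS];      /* transaction queue */
    struct doorbell db_queue[MAX_OPS];       /* notification (doorbell) queue */
    enum op_state   status[MAX_OPS];         /* per-operation status table */
    int             op_count;
};

/* Thread 0: parse the batch header, initialize the status table, and enqueue
 * one transaction plus one doorbell per operation domain segment. */
static void distribute(struct batch_ctx *c, const struct kv_txn *ops, int n) {
    c->op_count = n;
    for (int i = 0; i < n; i++) {
        c->txn_queue[i] = ops[i];
        c->status[i] = OP_PENDING;
        c->db_queue[i].txn_index = i;
    }
}

/* Worker threads: consume doorbells, perform the KV operation, write back the
 * status table; the last one to finish would trigger the completion reply. */
static void process(struct batch_ctx *c) {
    for (int i = 0; i < c->op_count; i++) {
        struct kv_txn *t = &c->txn_queue[c->db_queue[i].txn_index];
        (void)t;                             /* perform insert/query/... here */
        c->status[i] = OP_DONE;
    }
    printf("batch of %d operations complete\n", c->op_count);  /* completion msg */
}
```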
  • the data processing device 1000 can be a software device or a hardware device in an accelerator.
  • the device 1000 includes:
  • Determining unit 1004 configured to determine an execution mode according to the KV operation request.
  • the execution mode includes an offloading mode and a non-offloading mode.
  • the offloading mode is used to instruct the accelerator to perform the operation requested by the KV operation.
  • the non-offloading mode is used to instruct the processor to perform the operation requested by the KV operation;
  • the execution unit 1006 is configured to execute the processing of the KV operation request according to the execution mode.
  • The device 1000 in the embodiments of the present application can be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), a system on chip (SoC), or any combination thereof.
  • the determining unit 1004 is specifically used to:
  • when the operation metadata in the KV operation request satisfies the preset conditions, determine the execution mode to be the offloading mode; otherwise, determine the execution mode to be the non-offloading mode.
  • the operation metadata includes operation type and key length
  • the operation metadata satisfies the preset conditions, including: the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query or batch modify, and the key length is less than the preset length.
  • the execution unit 1006 when the execution mode determined according to the KV operation request is the offloading mode, the execution unit 1006 is specifically configured to:
  • a target KV operation is performed on the memory of the computing device.
  • the memory uses KV blocks to store KV data, and the KV blocks include key content fields and value content fields;
  • the execution unit 1006 is specifically used for:
  • a target KV block is written to or queried from the memory of the computing device.
  • the memory uses a hash table to store the metadata of the KV data.
  • the hash table includes multiple hash buckets, each hash bucket in the multiple hash buckets includes multiple hash slots, and multiple hash slots belonging to the same hash bucket are used to store the fingerprints, key lengths and corresponding block addresses of multiple keys with the same hash value.
  • the KV operation request is a KV modification request
  • the execution unit 1006 is also used to:
  • the KV block also includes a lower-level KV block identification field and a lower-level KV block address field
  • the KV operation request is a KV increase request
  • the execution unit 1006 is also used to:
  • when the fingerprint stored in the target hash slot in the hash bucket matches the fingerprint determined by the key content of the target KV block, read the lower-level KV block identification field and the lower-level KV block address field in the target hash slot, determine the KV block at the end of the linked list based on the field values of the lower-level KV block identification field and the lower-level KV block address field, write the block address of the target KV block into the lower-level KV block address field of the KV block at the end of the linked list, and mark the field value of the lower-level KV block identification field of the KV block at the end of the linked list as valid.
  • the KV operation request is a KV increase request
  • the execution unit 1006 is also used to:
  • the KV operation request is a KV query request
  • the KV operation request includes the block address and key content of the target KV block to be queried
  • the execution unit 1006 is specifically used to:
  • Figure 11 is a hardware structure diagram of a computing device 1100 provided by this application.
  • the computing device 1100 can be the aforementioned KV server node 100, used to implement the functions of the data processing device 1000 in the embodiment shown in Figure 10.
  • the computing device 1100 includes a processor 1101 , an accelerator 1102 and a memory 1103 .
  • the processor 1101, the accelerator 1102 and the memory 1103 can communicate through the bus 1104, or can also communicate through other means such as wireless transmission.
  • the computing device 1100 also includes a communication interface 1105, which is used for communicating with external devices, such as the KV client node 200 and other devices.
  • computing device 1100 may also include memory 1106 .
  • the processor 1101 may be a central processing unit (CPU), and the accelerator 1102 may be a data processing unit (DPU) or an infrastructure processing unit (IPU). The accelerator 1102 is used to offload the workload of the processor 1101 to realize the acceleration function. It should be noted that the accelerator 1102 can independently deploy an operating system (that is, a small system) to provide the KV service. In this case, the computing device 1100 includes two operating systems, namely a general operating system running on the processor 1101 and a small system running on the accelerator 1102. In addition, the accelerator 1102 may also exist as an external device of the processor 1101, and the processor 1101 and the accelerator 1102 may constitute a heterogeneous system.
  • Memory 1103 refers to the internal memory that directly exchanges data with the processor 1101 or the accelerator 1102. It can read and write data at any time and very quickly, and serves as a temporary data storage for the operating system or other running programs.
  • the memory 1103 may include at least two types of storage;
  • for example, the memory 1103 can be either random access memory (RAM) or read-only memory (ROM).
  • the random access memory can be dynamic random access memory (Dynamic RAM, DRAM), static random access memory (Static RAM, SRAM), or storage class memory (Storage Class Memory, SCM), etc.
  • the read-only memory can be a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable Programmable ROM, EPROM), etc.
  • the memory 1103 can also be a dual in-line memory module (Dual In-line Memory Module, DIMM), that is, a module composed of DRAM, or a solid state drive (Solid State Disk, SSD).
  • the memory 1103 can be configured to have a power-protection function.
  • the power-protection function means that the data stored in the memory 1103 will not be lost when the system is powered off and then powered on again.
  • memory with a power-protection function is called non-volatile memory.
  • bus 1104 may also include a power bus, a control bus, a status signal bus, etc.
  • the various buses are labeled bus 1104 in the figure.
  • the communication interface 1105 is used to communicate with external devices such as the KV client node 200.
  • the communication interface 1105 may be a network card, such as the RDMA network card mentioned above.
  • the communication interface 1105 may be used to receive a KV operation request sent by the KV client node 200 through RDMA Write.
  • Memory 1106, also known as external memory or external storage, is typically used to persistently store data or instructions.
  • the memory 1106 may include a magnetic disk or other type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive.
  • when the computing device 1100 also includes the memory 1106, the memory 1103 is also used to temporarily store data exchanged with the memory 1106.
  • the memory 1103 is used to store instructions, which can be instructions built into the memory 1103 or instructions swapped in from the memory 1106.
  • the accelerator 1102 is used to execute the instructions stored in the memory 1103 to perform the following operations:
  • the execution mode is determined according to the KV operation request.
  • the execution mode includes an offloading mode and a non-offloading mode.
  • the offloading mode is used to instruct the accelerator to perform the operation requested by the KV operation.
  • the non-offloading mode is used to instruct the processor to perform the operation requested by the KV operation;
  • the KV operation request is processed according to the execution mode.
  • the accelerator 1102 is also used to execute instructions stored in the memory 1103 to execute other steps of the data processing method in the embodiment of the present application.
  • the computing device 1100 may correspond to the data processing device 1000 in the embodiment of the present application, and may correspond to the KV server node 100 executing the method shown in Figure 2 according to the embodiment of the present application, and the above and other operations and/or functions implemented by the computing device 1100 are respectively for implementing the corresponding processes of the method in Figure 2; for the sake of brevity, they are not described again here.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that can be stored by a computing device, or a data storage device, such as a data center, containing one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct the computing device to perform the above data processing method applied to the data processing apparatus 1000.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when the computer program product is run on at least one computing device, the at least one computing device is caused to execute the above data processing method.

Abstract

A data processing method and an accelerator (1102), applied to a computing device (1100) supporting a key-value (KV) service. The computing device comprises the accelerator (1102) and a processor (1101). The method comprises: the accelerator (1102) acquiring a KV operation request, and determining an execution mode according to the KV operation request, the execution mode comprising an offloading mode and a non-offloading mode; and then processing the KV operation request according to the execution mode. In this way, the KV operation request is offloaded to the accelerator (1102) for completion, and data-plane processing within the offloading capability range completely bypasses the CPU, so that the throughput of the system can be improved and the occupation of the CPU can be reduced.

Description

Data processing method, accelerator, and computing device
This application claims priority to the Chinese patent application submitted to the State Intellectual Property Office of China on August 23, 2022, with application number 202211017087.4 and the invention title "Data processing method, accelerator and computing device", the entire content of which is incorporated by reference in this application.
Technical Field
The present application relates to the field of computer technology, and in particular, to a data processing method, an accelerator, and a computing device.
Background
With the continuous development of distributed storage technology, scenarios such as databases, big data, high performance computing (HPC), and artificial intelligence (AI) have begun to widely use distributed storage technology to store data, so as to support better scalability and improve resource utilization.
In distributed storage, data is distributed and stored on multiple nodes and is accessed across nodes through a high-performance network. For the management and indexing of data/metadata, in addition to the more general tree structures, local key data such as hotspot data can be organized using a key-value (KV) data structure to obtain lower query latency and higher concurrency. The use of non-volatile storage media as memory makes it possible to build larger memory pools. On this basis, storing all data/metadata in the memory pool in a KV data structure can effectively improve the efficiency of management or indexing.
However, in distributed storage systems, the KV storage process is often implemented on the central processing unit (CPU). The CPU needs to compute and determine the KV data structures, which occupies the CPU's computing resources and network bandwidth. When a large amount of data needs to be stored concurrently, a KV service that relies on the CPU can provide only limited throughput and cannot meet the performance requirements of highly concurrent KV operations.
Summary
This application provides a data processing method. In the method, a KV operation request is offloaded to an accelerator for completion; data-plane processing within the offloading capability range completely bypasses the CPU, which can both improve the throughput of the system and reduce the occupation of the CPU, meeting the performance requirements of highly concurrent KV operations. This application also provides a corresponding data processing apparatus, accelerator, computing device, computer-readable storage medium, and computer program product.
In a first aspect, this application provides a data processing method. The method is applied to a computing device supporting a key-value (KV) service. The computing device includes an accelerator and a processor. Specifically, the accelerator may obtain a KV operation request and then determine an execution mode according to the KV operation request. The execution mode includes an offloading mode and a non-offloading mode. The offloading mode is used to instruct the accelerator to perform the operation requested by the KV operation request, and the non-offloading mode is used to instruct the processor to perform the operation requested by the KV operation request. The accelerator then processes the KV operation request according to the execution mode.
By virtue of the programmability and on-path processing capability of the accelerator, the method offloads basic operations on distributed KV data to the accelerator; data-plane processing within the offloading capability range completely bypasses the CPU, which can both improve the throughput of the system and reduce the occupation of the CPU. Moreover, the method retains the full KV functionality on the CPU side, and the small number of KV operations that exceed the offloading specification are still forwarded to the CPU for processing, forming a hierarchical KV service that takes into account both performance and generality.
In some possible implementations, the accelerator may obtain operation metadata in the KV operation request and use the operation metadata to determine the execution mode for the KV operation request. Specifically, when the operation metadata satisfies a preset condition, the execution mode is determined to be the offloading mode; otherwise, the execution mode is determined to be the non-offloading mode.
The method filters KV operation requests based on the operation metadata: the execution mode for KV operation requests within the accelerator's offloading capability is determined to be the offloading mode, and the execution mode for KV operation requests beyond the accelerator's offloading capability is determined to be the non-offloading mode, forming a hierarchical KV service. This not only improves the performance of the KV service through the accelerator, but also reduces the occupation of the CPU so that the CPU can handle complex operations, ensuring generality.
In some possible implementations, the operation metadata includes an operation type and a key length. The operation metadata satisfying the preset condition may be that the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query, or batch modify, and the key length is less than a preset length.
Starting from the operation types of KV operations that the accelerator itself can handle and the maximum key length of the KV data those operations can act on, the method sets the conditions for filtering KV operation requests within the accelerator's offloading capability. Qualifying KV operation requests can therefore be accurately selected for offloading, avoiding the extra resources and time needed to forward requests to the CPU side that would result from imprecise filtering.
In some possible implementations, when the execution mode determined according to the KV operation request is the offloading mode, the accelerator may perform a target KV operation on the memory of the computing device according to the KV operation request. For example, the accelerator may perform an add, delete, modify, or query operation, or a batch add, batch delete, batch modify, or batch query operation on the memory of the computing device.
In this method, by performing the above basic operations on the memory, the accelerator can effectively reduce the pressure on the CPU side, reduce the occupation of the CPU, and improve KV operation performance.
In some possible implementations, the memory may use KV blocks to store KV data. A KV block includes a key content field and a value content field. On this basis, when performing the target KV operation, the accelerator may write a target KV block to the memory of the computing device or query a target KV block from the memory of the computing device according to the KV operation request.
In this method, the accelerator can write the target KV block to the memory or query the target KV block from the memory according to the relevant information in the KV operation request to complete the KV operation. Since the KV operation request is offloaded to the accelerator, the performance of the KV operation is improved.
In some possible implementations, the memory uses a hash table to store the metadata of the KV data. The hash table includes multiple hash buckets, each of the multiple hash buckets includes multiple hash slots, and the multiple hash slots belonging to the same hash bucket are used to store the fingerprints, key lengths, and corresponding block addresses of multiple keys with the same hash value. In this way, the target KV block can be quickly found or quickly written, improving the efficiency of KV operations.
In some possible implementations, the KV operation request is a KV modification request. The accelerator may determine a hash value and a fingerprint according to the key content of the target KV block, for example, by hashing the key content of the target KV block with different hash algorithms, then determine the hash bucket corresponding to the target KV block according to the hash value, and update the hash slot in the hash bucket whose fingerprint matches the fingerprint determined from the key content of the target KV block.
By designing data structures suitable for accelerator processing to store the KV data and its metadata, this method further improves KV operation performance.
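As a rough illustration of the two-hash design described above, the following C sketch computes two independent results over the key content: one selects the hash bucket, the other serves as the fingerprint compared against the slots in that bucket. The FNV-style and multiplicative hash functions and the field widths are illustrative assumptions, not the algorithms specified by this application.

```c
#include <stdint.h>
#include <stddef.h>

/* Hash algorithm 1: the result is used to index the hash bucket (assumed FNV-1a). */
static uint32_t hash_bucket_index(const uint8_t *key, size_t klen, uint32_t num_buckets) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < klen; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h % num_buckets;            /* selects one of Entry 0 .. Entry M */
}

/* Hash algorithm 2: the result is stored in the slot header as the fingerprint. */
static uint16_t key_fingerprint(const uint8_t *key, size_t klen) {
    uint32_t h = 0;
    for (size_t i = 0; i < klen; i++)
        h = h * 131u + key[i];         /* a different, independent hash function */
    return (uint16_t)(h & 0xFFFF);
}
```

For a KV modification request, the accelerator would locate the bucket with the first result and then update the slot whose stored fingerprint equals the second result.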
In some possible implementations, the KV block further includes a lower-level KV block identification field and a lower-level KV block address field, and the KV operation request is a KV add request. The accelerator may determine a hash value and a fingerprint according to the key content of the target KV block, and then determine the hash bucket corresponding to the target KV block according to the hash value. When the fingerprint stored in a target hash slot in the hash bucket matches the fingerprint determined from the key content of the target KV block, the accelerator reads the lower-level KV block identification field and the lower-level KV block address field in the target hash slot, determines the KV block at the tail of the linked list according to the field values of the lower-level KV block identification field and the lower-level KV block address field, writes the block address of the target KV block into the lower-level KV block address field of the KV block at the tail of the linked list, and marks the field value of the lower-level KV block identification field of the KV block at the tail of the linked list as valid.
By managing KV data with the same fingerprint through a linked list, the method solves the conflict problem of add (insert) operations.
In some possible implementations, the KV operation request is a KV add request. Specifically, the accelerator may determine a hash value and a fingerprint according to the key content of the target KV block, then determine the hash bucket corresponding to the target KV block according to the hash value, and then write the block address of the target KV block and the fingerprint determined from the key content of the target KV block into an empty hash slot in the hash bucket.
By writing the block address of the target KV block and the fingerprint determined from the key content of the target KV block into an empty hash slot in the hash bucket, the method enables subsequent data queries and modifications to be performed based on the above block address and fingerprint.
In some possible implementations, the KV operation request is a KV query request. The KV operation request includes the block address and key content of the target KV block to be queried. Correspondingly, the accelerator may determine a hash value according to the key content of the target KV block, determine the corresponding hash bucket according to the hash value, perform address translation according to the block address of the target KV block to obtain a physical address, and read the hash bucket according to the physical address. The accelerator may then determine a fingerprint according to the key content of the target KV block, query the hash bucket according to the fingerprint determined from the key content of the target KV block, and obtain the value content corresponding to the key content.
In this method, the accelerator can quickly find the target KV block based on the data structure designed for the accelerator, which improves operation performance. Moreover, the data structure designed for the accelerator fully considers conflict situations; even if a conflict exists, the target KV block to be found can be accurately located, meeting service requirements.
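A minimal sketch of the query path just described, assuming simplified slot and KV-block layouts (the layouts in Figure 6 are richer) and reusing the two hash helpers from the earlier sketch. The `read_block` helper, the use of a zero block address to mark an empty slot, and the omission of address translation and of the collision chain are all assumptions made for brevity.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SLOTS_PER_BUCKET 8                       /* assumed bucket width */

struct slot     { uint16_t fp; uint16_t klen; uint64_t block_addr; };   /* simplified */
struct bucket   { struct slot slots[SLOTS_PER_BUCKET]; };
struct kv_block { uint16_t klen; uint32_t vlen; uint8_t payload[]; };   /* key then value */

/* Assumed helpers: bucket index and fingerprint (see the earlier sketch) and a
 * DMA-style read of a KV block given its block address. */
uint32_t hash_bucket_index(const uint8_t *key, size_t klen, uint32_t num_buckets);
uint16_t key_fingerprint(const uint8_t *key, size_t klen);
struct kv_block *read_block(uint64_t block_addr);

/* Returns a pointer to the value content, or NULL if the key is not present. */
const uint8_t *kv_query(struct bucket *table, uint32_t num_buckets,
                        const uint8_t *key, uint16_t klen, uint32_t *vlen_out) {
    struct bucket *b = &table[hash_bucket_index(key, klen, num_buckets)];
    uint16_t fp = key_fingerprint(key, klen);
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        struct slot *s = &b->slots[i];
        if (s->block_addr == 0 || s->fp != fp || s->klen != klen)
            continue;                            /* fingerprint and key-length filter */
        struct kv_block *blk = read_block(s->block_addr);
        if (blk && blk->klen == klen && memcmp(blk->payload, key, klen) == 0) {
            *vlen_out = blk->vlen;
            return blk->payload + klen;          /* value content follows the key content */
        }
    }
    return NULL;
}
```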
In a second aspect, this application provides a data processing apparatus. The data processing apparatus includes units for performing the data processing method in the first aspect or any possible implementation of the first aspect.
In a third aspect, this application provides an accelerator. The accelerator includes a processing module and a communication interface. The communication interface is used to provide network communication for the processing module, and the accelerator is used to perform the data processing method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, this application provides a computing device including an accelerator. The computing device includes an accelerator and a processor. The processor may be a central processing unit, and the central processing unit can also provide the KV service. The accelerator is used to perform the data processing method in the first aspect or any possible implementation of the first aspect, so as to accelerate the KV service.
In a fifth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computing device to perform the data processing method described in the first aspect or any implementation of the first aspect.
In a sixth aspect, this application provides a computer program product containing instructions that, when run on a computing device, cause the computing device to perform the data processing method described in the first aspect or any implementation of the first aspect.
On the basis of the implementations provided in the above aspects, this application can be further combined to provide more implementations.
Description of Drawings
Figure 1 is a schematic architectural diagram of a data processing system provided by this application;
Figure 2 is a flow chart of a data processing method provided by this application;
Figure 3 is a schematic structural diagram of a KV server node provided by this application;
Figure 4 is a schematic flow chart of a data processing method provided by this application;
Figure 5 is a schematic structural diagram of a data processing system provided by this application;
Figure 6 is a schematic structural diagram of KV data provided by this application;
Figure 7 is a schematic flow chart of a data processing method provided by this application;
Figure 8 is a schematic flow chart of a data processing method provided by this application;
Figure 9 is a schematic flow chart of a data processing method provided by this application;
Figure 10 is a schematic structural diagram of a data processing device provided by this application;
Figure 11 is a schematic structural diagram of a computing device provided by this application.
Detailed Description
To facilitate understanding, some technical terms involved in the embodiments of this application are first introduced.
Distributed storage refers to storing data in a distributed manner on multiple independent devices (for example, storage devices such as storage servers). A distributed storage system is a storage system that stores data in a distributed manner. A distributed storage system usually has a scalable system structure; it can use multiple storage servers to share the storage load and use location servers to locate stored information. This not only improves the reliability, availability, and access efficiency of the storage system, but also makes it easy to expand.
Data or metadata in distributed storage can be organized using a tree structure or a key-value (KV) structure. Considering query performance and query cost, local key data (for example, hotspot data) can usually be organized using a key-value data structure to obtain lower query latency and higher concurrency.
A key-value (KV) pair is a way of describing the mapping relationship between elements that are related to each other. Each pair of elements contains a key and a value, and the corresponding value can be retrieved through a specific key combined with the data structure used.
To meet the performance requirements of highly concurrent KV operations, this application provides a data processing method. The method can be applied to a computing device supporting a KV service, where supporting a KV service includes supporting adding, deleting, querying, or modifying KV data, or batch adding, batch deleting, batch querying, or batch modifying KV data. The computing device includes an accelerator and a processor. The processor may be a CPU, and the CPU can also support the KV service. An accelerator is a device that cooperates with the processor to accelerate services. The accelerator may be a data processing unit (DPU) or an infrastructure processing unit (IPU). For ease of description, the accelerator is described below by taking a DPU as an example.
Specifically, a DPU is a system on chip that focuses on on-path processing and computation of data, where on-path processing refers to channel-associated signaling processing: channel-associated signaling is the signaling required for call connection that is transmitted over the trunk circuit occupied by that connection. In addition to its ability to accelerate scenarios such as network forwarding, virtualization, and storage, a DPU usually has a certain degree of programmability and can perform customized offload acceleration according to the application scenario. A DPU can independently deploy an operating system (which may also be called a small system) to provide the KV service. In this case, the computing device where the DPU is located includes two operating systems, namely a general operating system running on the CPU and a small system running on the DPU. In addition, the DPU can also exist as an external device of the CPU and, together with processors such as a graphics processing unit (GPU), form a heterogeneous system.
Specifically, the DPU may obtain a KV operation request, and then the DPU may determine an execution mode according to the KV operation request. The execution mode includes an offloading mode and a non-offloading mode. The offloading mode is used to instruct an accelerator such as the DPU to perform the operation requested by the KV operation request, and the non-offloading mode is used to instruct a processor such as the CPU to perform the operation requested by the KV operation request. The DPU may then process the KV operation request according to the execution mode.
Through the programmability and on-path processing capability of a data processor such as a DPU, the method offloads basic operations on distributed KV data to the DPU; data-plane processing within the offloading capability range completely bypasses the CPU, which can both improve the throughput of the system and reduce the occupation of the CPU. Moreover, the method retains the full KV functionality on the CPU side, and the small number of KV operations that exceed the offloading specification are still forwarded to the CPU for processing, forming a hierarchical KV service that takes into account both performance and generality.
Next, the system architecture of the embodiments of this application is introduced with reference to the accompanying drawings.
Referring to the schematic architectural diagram of the data processing system shown in Figure 1, the data processing system 10 includes a KV server node 100 and a KV client node 200. The KV server node 100 is a computing device supporting the KV service, for example, a server supporting the KV service. The KV client node 200 is a device that supports accessing the KV server node 100; the KV client node 200 may be a lightweight device, including but not limited to a laptop, a tablet, or a smartphone. The KV server node 100 and the KV client node 200 may be interconnected through a network, for example, a high-performance network. It should be noted that, depending on the networking scale of different business scenarios, the data processing system 10 may include one or more KV server nodes 100; similarly, the data processing system 10 may include one or more KV client nodes 200.
The KV server node 100 includes a DPU 102 and a CPU 104. In the example of Figure 1, the CPU 104 is placed in the host, and the DPU 102 serves as an external device of the host. The KV server node 100 further includes a memory 106, which is used to store KV data to speed up the access efficiency of KV data. The memory 106 may be attached to the host; on this basis, the memory 106 may also be called host memory. Each host may be attached to multiple memories 106, and the multiple memories 106 may be used to form a memory pool.
An application is deployed on the KV client node 200. An application process may be spawned when the application runs. The application process may call the KV service interface to initiate a KV operation. Correspondingly, the KV client on the KV client node 200 (for example, a KV client process) may generate a KV operation request based on the KV operation. The KV operation request may use the remote direct memory access (RDMA) message format. The KV client node 200 then sends the KV operation request to the KV server node 100. The DPU 102 of the KV server node 100 is responsible for receiving and processing various KV operation requests.
Specifically, the DPU 102 may determine an execution mode according to the KV operation request, and then process the KV operation request according to the execution mode. For example, the DPU 102 may obtain the operation metadata in the KV operation request, including the operation type and the key length (Klen). When the operation metadata satisfies a preset condition, for example, the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query, or batch modify, and the key length is less than a preset length, the execution mode is determined to be the offloading mode, and the DPU 102 may perform a target KV operation on the memory 106 according to the KV operation request. When the operation metadata does not satisfy the preset condition, the execution mode is determined to be the non-offloading mode; the DPU 102 forwards the KV operation request to the CPU 104, and the CPU 104 performs the target KV operation on the memory 106 according to the KV operation request. In other words, KV operation requests within the processing capability of the DPU 102 are processed by the DPU 102, and KV operation requests beyond the processing capability of the DPU 102 may be forwarded to the CPU 104 for processing.
Based on the data processing system 10 shown in Figure 1, an embodiment of this application further provides a data processing method. The data processing method of the embodiment of this application is introduced below with reference to the accompanying drawings.
Referring to the flow chart of the data processing method shown in Figure 2, the method includes:
S202: The DPU 102 receives a KV operation request sent by the KV client node 200.
The KV operation request includes operation metadata. The operation metadata may include an operation type and a key length. The operation type may include basic operations, such as one or more of add (create), delete, query (read), and modify (update). Add, delete, query, and modify may be collectively referred to as CRUD. In some embodiments, multiple KV operations may be encapsulated in one KV operation request; on this basis, the operation type may include batch basic operations, such as one or more of batch add, batch delete, batch query, or batch modify. The key length refers to the length of the key, that is, the length of the key content.
S204: The DPU 102 obtains the operation metadata in the KV operation request. When the operation metadata satisfies the preset condition, S206 is performed; when the operation metadata does not satisfy the preset condition, S208 is performed.
S206: The DPU 102 determines that the execution mode is the offloading mode, and then performs S210.
S208: The DPU 102 determines that the execution mode is the non-offloading mode, and then performs S214.
Specifically, the DPU 102 may parse the KV operation request to obtain the operation metadata in the KV operation request. The operation metadata includes an operation type and a key length. Generally, the DPU 102 is capable of handling basic operations, while the CPU 104 is capable of handling complex operations; moreover, the DPU 102 is limited by its hardware capabilities and is usually used to process KV data whose key length is within a preset length (that is, the key length is less than the preset length). On this basis, the DPU 102 can determine, according to the operation type and the key length, whether the DPU 102 is capable of handling the KV operation in the KV operation request, thereby determining the execution mode of the KV operation.
For example, the DPU 102 may compare the operation type in the KV operation request with the operation types of basic operations (such as add, delete, query, modify, batch add, batch delete, batch query, or batch modify), and compare the key length in the KV operation request with the preset length.
When the operation type matches an operation type of a basic operation and the key length is less than the preset length, it indicates that the DPU 102 is capable of handling the KV operation in the KV operation request, and the execution mode can be determined to be the offloading mode. The offloading mode is used to instruct the DPU 102 to perform the operation requested by the KV operation request.
When the operation type does not match any operation type of the basic operations, and/or the key length is not less than the preset length, it indicates that the DPU 102 is not capable of handling the KV operation in the KV operation request, and the execution mode can be determined to be the non-offloading mode. The non-offloading mode is used to instruct the CPU 104 to perform the operation requested by the KV operation request.
The preset length can be set according to the hardware type of the DPU 102; different hardware types of the DPU 102 may correspond to different preset lengths. For example, the preset length may be set to 128 bytes (B) or to 1 kilobyte (KB).
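A sketch of this decision in C, assuming a 128-byte preset length and the operation-type encoding shown; both are illustrative choices, since the application only requires that the preset length match the DPU's hardware capability.

```c
#include <stdbool.h>
#include <stddef.h>

enum kv_op_type {                       /* assumed encoding of the operation type */
    KV_OP_CREATE, KV_OP_DELETE, KV_OP_READ, KV_OP_UPDATE,
    KV_OP_BATCH_CREATE, KV_OP_BATCH_DELETE, KV_OP_BATCH_READ, KV_OP_BATCH_UPDATE,
    KV_OP_OTHER                         /* any operation the DPU does not accelerate */
};

enum exec_mode { EXEC_OFFLOAD, EXEC_NON_OFFLOAD };

#define PRESET_KEY_LEN 128              /* e.g. 128 B; depends on the DPU hardware type */

/* Offload only (batch) basic operations whose key length is below the preset length. */
static enum exec_mode decide_exec_mode(enum kv_op_type type, size_t klen) {
    bool basic = (type >= KV_OP_CREATE && type <= KV_OP_BATCH_UPDATE);
    return (basic && klen < PRESET_KEY_LEN) ? EXEC_OFFLOAD : EXEC_NON_OFFLOAD;
}
```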
It should be noted that the above S204 to S208 are one specific implementation in which the DPU 102 determines the execution mode according to the KV operation request in this embodiment. In other possible implementations of the embodiments of this application, the DPU may skip the above steps or use other implementations. For example, the DPU 102 may directly attempt to perform the target KV operation; if the execution succeeds, the result is returned, and if the execution fails, the KV operation request is forwarded to the CPU for processing.
S210: The DPU 102 performs the target KV operation on the memory 106 according to the KV operation request.
The memory 106 uses KV blocks to store KV data. On this basis, the DPU 102 may write a target KV block to the memory 106 or query a target KV block from the memory 106 according to the KV operation request. When the operation type in the KV operation request is add, modify, batch add, or batch modify, the DPU 102 performs the operation of writing the target KV block to the memory 106. When the operation type in the KV operation request is query or batch query, the DPU 102 performs the operation of querying the target KV block from the memory 106. When the operation type in the KV operation request is delete or batch delete, the DPU 102 may perform the operation of deleting the target KV block from the memory 106.
It should be noted that, when the KV operation is add, modify, batch add, or batch modify, the KV operation request may also include the value content of the KV data to be added or modified. For example, the KV operation request may include the key content "name" and the value content "Zhang San", so as to request adding the KV data "name, Zhang San".
S212: The DPU 102 returns a KV operation result to the KV client node 200.
Depending on the operation type, the KV operation result may include different information. For example, when the operation type is add, modify, delete, batch add, batch modify, or batch delete, the KV operation result may include operation success or operation failure. For another example, when the operation type is query or batch query, the KV operation result may include the queried KV data.
To facilitate understanding, a specific example is given. For a query or batch query operation, the DPU 102 may encapsulate the queried KV data and the operation-success field value in the same response message and then return the response message to the client node 200. In some embodiments, the DPU 102 may also encapsulate the queried KV data and the operation-success field value in different response messages and return them to the client node 200 separately.
S214: The DPU 102 forwards the KV operation request to the CPU 104.
S216: The CPU 104 performs the target KV operation on the memory 106 according to the KV operation request.
S218: The CPU 104 returns a KV operation result to the KV client node 200.
For the specific implementation in which the CPU 104 performs the target KV operation on the memory 106 according to the KV operation request and returns the KV operation result to the KV client node 200, reference may be made to the description of the DPU 102 performing the target KV operation and returning the KV operation result, which is not repeated here.
The above S210 to S212 and S214 to S218 are some specific implementations in which the DPU 102 processes the KV operation request according to the execution mode. In other possible implementations of the embodiments of this application, the processing of the KV operation request may also be performed in other ways.
As a possible implementation, the DPU 102 can be logically divided into a data plane and a control plane. The embodiment shown in Figure 2 introduces the data processing method mainly from the perspective of the data plane. The method of the embodiments of this application is described in detail below from the perspectives of the control plane and the data plane.
First, refer to the schematic structural diagram of the KV server node 100 shown in Figure 3. As shown in Figure 3, the DPU 102 is logically divided into two parts, a data plane and a control plane. The data plane is responsible for network I/O communication and for the on-path processing of KV transactions, and the control plane is responsible for managing the context information of communication and transaction processing and for managing the state of transactions. The CPU 104 runs the KV server and spawns a KV server process. The KV server process may be responsible for connection management with KV client processes and for the management and scheduling of KV resources in the memory 106, and has complete KV operation processing capabilities; the small number of KV operation requests that cannot be accelerated by the DPU 102 can be processed by the KV server process.
KV data structures reside in the memory 106. The KV data structure describes the organizational form of KV data. KV data may be organized in the form of KV blocks. To improve lookup efficiency, a hash table also resides in the memory 106. In the embodiments of this application, the corresponding KV data structures are designed in a manner suitable for acceleration by the DPU 102, so that they can be processed more efficiently by the DPU 102.
Next, refer to the schematic flow chart of the data processing method shown in Figure 4, which includes the following steps:
Step 1: A KV operation request from the KV client node 200, in the form of an RDMA message, passes through the switching network and reaches the network port of the DPU 102 in the KV server node 100. The on-path processing unit located in the data processing plane of the DPU 102 parses the message to obtain the operation metadata.
The switching network, which may also be called the connection network, is a network that establishes a communication channel between the source and the destination of communication to realize information transmission. A switching network is usually implemented by switching devices, which may include switches, routers, and other devices that implement information exchange.
The operation metadata may include one or more of the operation type or the key length. Further, the KV operation request may also include the key content. For add, modify, batch add, and batch modify operations, the KV operation request may also include the value content. Considering that data may exist in multiple versions, the operation metadata may also include a version number. Similarly, the operation metadata may also include a sequence number.
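One possible layout for the fields just listed, carried in the RDMA payload of a KV operation request. The field widths, ordering, and names are assumptions for illustration; this application does not fix a wire format.

```c
#include <stdint.h>

/* Hypothetical header of a KV operation request carried in an RDMA message. */
struct kv_request_hdr {
    uint8_t  op_type;      /* add / delete / query / modify, or a batch variant */
    uint8_t  flags;        /* reserved */
    uint16_t klen;         /* key length */
    uint32_t vlen;         /* value length; 0 for delete and query operations */
    uint32_t version;      /* optional data version number */
    uint32_t sequence;     /* optional request sequence number */
    /* followed by klen bytes of key content and vlen bytes of value content */
};
```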
Step 2: The on-path processing unit checks whether the current KV operation can be offloaded and accelerated on the DPU 102. If it exceeds the offloading capability of the DPU 102, it is forwarded to the host CPU 104 to be processed by the KV server process.
Step 3: The on-path processing unit writes a context into the control plane to record the network connection state and the state of the current KV operation.
The context includes the processing information necessary to perform the current KV operation and the state of the current KV operation. The processing information may include the key content; further, the processing information may also include the value content. The state of the current KV operation includes the execution node and execution result of the current KV operation.
Step 4: According to the data requirements of the KV operation, the on-path processing unit fetches the necessary operation data from the memory 106 through the I/O processing module over the high-speed bus for processing.
The high-speed bus may be a standard bus such as Peripheral Component Interconnect Express (PCIe) or Compute Express Link (CXL), or a proprietary bus type. The operation of fetching data may use direct memory access (DMA) or other memory access semantics, such as load/store. Depending on the complexity of different KV operations, the on-path processing unit may need to complete several rounds of data fetching and processing.
Step 5: The on-path processing unit writes the intermediate results and final results of the processing into the context, and updates the operation state.
When the KV operation is an add operation, a query operation, a batch add operation, or a batch query operation, conflicts may occur. For example, if the characteristic value of the key content of the KV data to be added in an add operation is the same as that of the key content of other KV data in the hash slot, or the characteristic value of the key content of the KV data to be queried in a query operation is the same as that of the key content of other KV data in the hash slot, a conflict occurs. The on-path processing unit may perform a conflict resolution operation, and the results generated during this operation may be intermediate results. For example, the intermediate results may include a linked list recording conflict information.
It should be noted that when the KV operation is an add operation, a delete operation, or the like, there may be no intermediate results, and the on-path processing unit may write the final result into the context.
Step 6: The on-path processing unit encapsulates the final result of the KV operation into an RDMA message and sends it back to the KV client node 200, completing one KV operation.
Throughout the process, the processing of the KV operation can be completed by the on-path processing unit on the DPU 102, and the data interaction with the memory 106 is completed directly over the high-speed bus; neither requires the participation of the computing power of the CPU 104.
Compared with Figure 2, Figure 4 not only describes the flow of the data processing method from the data plane, but also introduces the data processing method in detail from the control plane. It should also be noted that Figure 4 illustrates the interaction between one KV client node 200 and one KV server node 100; in practical applications, multiple KV server nodes 100 may store KV data in a distributed manner and provide services for one or more KV client nodes 200.
Referring to the schematic structural diagram of the data processing system 10 shown in Figure 5, several KV client nodes 200 can be interconnected with several KV server nodes 100 through an RDMA network. The full set of KV data can be divided into different domain segments and stored in the memory 106 of different KV server nodes 100. The KV data can be divided into different domain segments according to key ranges, or divided into different domain segments in other ways, and stored across different KV server nodes 100 to achieve load balancing.
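A sketch of routing a key to its owning KV server node when the full key space is split into domain segments by key range. The segment table, its string-range representation, and the linear scan are illustrative assumptions; the application leaves the partitioning scheme open.

```c
#include <stddef.h>
#include <string.h>

/* One domain segment: keys in [lo, hi) are owned by server_id (illustrative layout). */
struct domain_segment { const char *lo; const char *hi; int server_id; };

/* Pick the KV server node that stores a given key; returns -1 if no segment matches. */
static int route_key(const struct domain_segment *segs, size_t nsegs, const char *key) {
    for (size_t i = 0; i < nsegs; i++) {
        if (strcmp(key, segs[i].lo) >= 0 && strcmp(key, segs[i].hi) < 0)
            return segs[i].server_id;   /* issue the request on the RDMA QP to this node */
    }
    return -1;
}
```

A KV client process could keep such a table locally and use the returned node identifier to choose the RDMA connection on which to send the KV operation request.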
KV服务端节点100中KV服务端进程可以包括多个执行线程,以支持一定的并发。每个KV服务端节点100至少和每个KV客户端节点200之间建立一条RDMA连接以完成消息传输。例如图5中有N个KV客户端节点200和M个KV服务端节点100,相应的,KV服务端节点和KV客户端节点200上分别至少有N条和M条RDMA连接。The KV server process in the KV server node 100 may include multiple execution threads to support certain concurrency. Each KV server node 100 establishes at least one RDMA connection with each KV client node 200 to complete message transmission. For example, there are N KV client nodes 200 and M KV server nodes 100 in Figure 5. Correspondingly, there are at least N and M RDMA connections on the KV server nodes and KV client nodes 200 respectively.
当应用触发KV操作时,KV客户端进程可以根据数据域段的划分,在对应的RDMA队列对(queue pair,QP)上发起KV操作请求。QP是硬件和软件之间的一个虚拟接口。QP是队列结构,按顺序存储着软件给硬件下发的任务,也即工作队列元素(Work Queue Ellement,WQE),WQE中包含从哪里取出多长的数据,并且发送给哪个目的地等等信息。每个QP间都是独立的,彼此通过保护域(Protection Domain,PD)隔离,因此一个QP可以被视为某个用户独占的一种资源,一个用户也可以同时使用多个QP。QP有很多种服务类型,包括可靠连接(reliable connection,RC)、可靠数据报(reliable datagram,RD)、不可靠连接(unreliable connection,UC)和不可靠数据报(unreliable datagram,UD)等,所有的源QP和目的QP为同一种类型时可以进行数据交互。When an application triggers a KV operation, the KV client process can initiate a KV operation request on the corresponding RDMA queue pair (QP) according to the division of data domain segments. QP is a virtual interface between hardware and software. QP is a queue structure, which stores tasks issued by software to hardware in order, that is, Work Queue Element (WQE). WQE contains information such as where the data is taken out and how long it is sent to, and to which destination it is sent. . Each QP is independent and isolated from each other by a Protection Domain (PD). Therefore, a QP can be regarded as a resource exclusive to a user, and a user can also use multiple QPs at the same time. QP has many service types, including reliable connection (RC), reliable datagram (RD), unreliable connection (UC) and unreliable datagram (UD), etc., all Data interaction is possible when the source QP and destination QP are of the same type.
In this example, on receiving a KV operation request, the DPU 102 on the KV server node 100 side checks whether the operation type, key length and other information of the KV operation fall within the acceleration range supported by the DPU 102. If they do, the DPU 102 completes the processing of the request; otherwise, the request is forwarded to the host CPU 104 for processing.
The KV data structure resident in the memory 106 can be seen in Figure 6. As shown in Figure 6, the hash table includes multiple hash buckets, for example the columns of the hash table in Figure 6, denoted Entry 0 ... Entry M. Each of the hash buckets includes multiple hash slots, corresponding to the rows within a column, denoted Slot 0 ... Slot N. The hash slots belonging to the same hash bucket are used to store the fingerprints (Fingerprint), key lengths (Klen) and corresponding block addresses (KV Block addresses) of multiple keys that share the same hash value.

In some possible implementations, for the key of a piece of KV data, the result computed by hash algorithm 1 (i.e., the hash value described above) can be used to index the hash bucket entry. Each hash bucket contains multiple slots, and keys that produce the same result under hash algorithm 1 are placed into different slots of the same hash bucket. The header of each slot includes a Fingerprint field, which stores the result computed from the key by hash algorithm 2 (i.e., the fingerprint described above). In a specific implementation, the DPU 102 or the CPU 104 can distinguish and look up multiple keys stored in the same hash bucket through the Fingerprint field. Further, a slot also stores the virtual address of the KV Block and the key length (Klen) information, as well as the check key needed for address translation; the corresponding KV Block can be read and written through the virtual address.
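For illustration only, a minimal C-language sketch of the hash bucket and hash slot layout described above is given below; the field widths and the bucket depth are assumptions of the sketch, not values specified by this application.

    #include <stdint.h>

    #define SLOTS_PER_BUCKET 8        /* assumed bucket depth */

    /* One hash slot: fingerprint, key length, check key and KV Block address. */
    struct hash_slot {
        uint16_t fingerprint;         /* key hashed with hash algorithm 2 */
        uint16_t klen;                /* key length */
        uint32_t check_key;           /* check key used for address translation */
        uint64_t kv_block_va;         /* virtual address of the KV Block */
    };

    /* One hash bucket, indexed by the key hashed with hash algorithm 1. */
    struct hash_bucket {
        struct hash_slot slots[SLOTS_PER_BUCKET];
    };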
The header field segment in a KV Block includes the complete key content and the length information of the value (Vlen). Further, the header field segment of the KV Block may also include a lower-level KV block identification field, such as the flags field in Figure 6. In the case of severe collisions, different keys in the same hash bucket may also yield the same fingerprint; these keys then share the same slot, and their corresponding KV Blocks are managed in the form of a linked list. Specifically, the flags field indicates whether a next-level KV Block exists, and the virtual address and check key of the next-level KV Block are stored in the previous-level KV Block (for example, in the Next field segment of the KV Block). If no next-level KV Block exists, the corresponding address and check key fields are invalid, but the corresponding space can be reserved (for example, the field values can be set to rsvd) for use in possible later insert operations.
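Continuing the sketch, the header and Next field segments of a KV Block described above can be pictured as the following C structures; the inline key buffer size, the flag bit and the exact field order are assumptions of the sketch.

    #include <stdint.h>

    #define KEY_MAX          256      /* assumed maximum key length */
    #define FLAG_NEXT_VALID  0x1      /* assumed bit meaning "next-level KV Block exists" */

    /* Header field segment: complete key content and value length. */
    struct kv_block_header {
        uint32_t flags;               /* lower-level KV block identification field */
        uint32_t vlen;                /* length of the value content */
        uint16_t klen;                /* length of the key content */
        uint8_t  key[KEY_MAX];        /* complete key content */
    };

    /* Next field segment: address and check key of the next-level KV Block. */
    struct kv_block_next {
        uint64_t next_va;             /* virtual address of the next-level KV Block */
        uint32_t next_check_key;      /* check key of the next-level KV Block */
    };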
With this data structure design, when the DPU 102 performs a KV Block lookup it only needs to load the header field and the Next field of Figure 6 from the memory 106, rather than the entire KV Block. Once the key check confirms that the lookup has succeeded, it pulls the value content directly from the memory 106 to assemble an RDMA message and sends the RDMA message to return the KV operation result. In most scenarios the key is relatively short, for example within 1 KB, and can be loaded into the DPU 102 for processing, whereas the value may reach the MB level, and loading the value into the DPU 102 would put great pressure on the on-chip buffer. This data structure design reduces the specification requirements on the DPU 102 and is better suited to offload acceleration.
To make the technical solution of the present application clearer and easier to understand, the following uses add (also called insert), modify (also called update), query and batch operations as examples.

First, refer to the flow diagram of the data processing method shown in Figure 7. This method mainly describes the insert or update flow from the perspective of the interaction between the KV client node 200 and the KV server node 100, and specifically includes the following steps:
Step 1: The KV client node 200 sends a KV operation request to the KV server node 100.

Specifically, the KV client node 200 can send a KV insert request or a KV update request to the KV server node 100 through an RDMA write operation. The KV insert request or KV update request carries the information necessary to generate the KV Block and the corresponding write address (a virtual address).

The information necessary to generate the KV Block may include the key content and value content of the target KV block to be added or modified, and the write address may be the block address of the target KV block. Further, the KV insert request or KV update request may also include a check key.

It should be noted that if the message length of the KV insert request or KV update request exceeds the size of one maximum transmission unit (MTU), the KV client node 200 can complete the transmission with multiple packets.
Step 2: The DPU 102 receives the KV operation request, writes the value content into the memory 106 according to the KV operation request, and determines the hash value and the fingerprint according to the key content and the hash algorithms.

Specifically, the DPU 102 directly writes the value content (the Value part) of the target KV block into the memory 106 by DMA according to the block address of the target KV block carried in the KV insert request or KV update request, then determines the hash value from the key content (the Key part) with hash algorithm 1, determines the hash bucket corresponding to the target KV block from the hash value, and then determines the fingerprint of the target KV block with hash algorithm 2.
Step 3: The DPU 102 reads the hash bucket corresponding to the target KV block and compares, slot by slot, whether the fingerprint in each hash slot of the bucket is the same as the fingerprint of the target KV block computed in step 2. If the current operation is an update, the slot whose fingerprint matches is refreshed; if the current operation is an insert, the DPU 102 can try to find an empty slot and write the block address and fingerprint of the KV Block into it.

If a fingerprint collision occurs, collision resolution can be performed. The DPU 102 can read the KV Block header information referenced by the conflicting slot, check through the field value of the lower-level KV block identification field whether the next-level KV Block address is valid, and if it is valid, read the lower-level KV block address field to obtain the block address of the next-level KV Block, repeating this until the KV Block at the tail of the linked list is found. It then writes the address and check key carried in the KV insert request into the next-level KV Block address and check key fields of that tail KV Block, and modifies the flags field to indicate that the next-hop address of the current KV Block is valid.
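For illustration only, the collision-resolution walk described above can be sketched as follows, reusing the hypothetical hash_slot, kv_block_header and kv_block_next layouts from the earlier sketches; the dma_read and dma_write helpers stand in for the DPU's DMA path and simply treat the virtual address as a host pointer, with address translation and the check key omitted.

    #include <stdint.h>
    #include <string.h>

    /* Placeholder DMA helpers: in the sketch the virtual address is treated as a
     * host pointer; a real accelerator would translate it and issue DMA. */
    static void dma_read(uint64_t va, uint32_t check_key, void *dst, size_t len) {
        (void)check_key;
        memcpy(dst, (const void *)(uintptr_t)va, len);
    }
    static void dma_write(uint64_t va, uint32_t check_key, const void *src, size_t len) {
        (void)check_key;
        memcpy((void *)(uintptr_t)va, src, len);
    }

    /* Append a new KV Block behind the tail of the linked list referenced by a
     * conflicting slot, then mark the tail's next-hop address as valid. */
    static void resolve_fingerprint_conflict(const struct hash_slot *slot,
                                             uint64_t new_block_va,
                                             uint32_t new_check_key)
    {
        uint64_t cur_va  = slot->kv_block_va;
        uint32_t cur_key = slot->check_key;
        struct kv_block_header hdr;
        struct kv_block_next   nxt;

        for (;;) {                                   /* walk to the tail of the list */
            dma_read(cur_va, cur_key, &hdr, sizeof(hdr));
            dma_read(cur_va + sizeof(hdr), cur_key, &nxt, sizeof(nxt));
            if (!(hdr.flags & FLAG_NEXT_VALID))
                break;
            cur_va  = nxt.next_va;
            cur_key = nxt.next_check_key;
        }

        nxt.next_va        = new_block_va;           /* link the new KV Block */
        nxt.next_check_key = new_check_key;
        hdr.flags         |= FLAG_NEXT_VALID;        /* next hop is now valid */
        dma_write(cur_va + sizeof(hdr), cur_key, &nxt, sizeof(nxt));
        dma_write(cur_va, cur_key, &hdr, sizeof(hdr));
    }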
Step 4: The KV server node 100 returns the KV operation result to the KV client node 200.

Specifically, the KV server node 100 can notify the KV client node 200 through an RDMA Send that the current KV insert/update operation has completed.

It should be noted that when the KV client node 200 sends the KV operation request through an RDMA Write, the DPU 102 can return an acknowledgement message ACK to the KV client node 200 upon receiving the KV operation request. Similarly, when the KV server node 100 (for example, the DPU 102) notifies the completion of the operation through an RDMA Send, the KV client node 200 can also return an acknowledgement message ACK to the KV server node 100.
In the above scenario, a KV write operation (insert or update) involves three or more read/write accesses to the memory 106 (more are needed when a collision occurs) and the corresponding logical processing. The data reads and writes can all be completed directly on the DPU 102 by DMA, and the KV logic processing can also be completed by the cores on the DPU 102. Compared with the one-sided RDMA approach, which reads data from the remote end to the KV client node 200 for processing through multiple RDMA operations, the path in this application, which moves data from the memory 106 to the DPU 102 by DMA for processing, is shorter and more efficient, and can greatly reduce operation latency. Compared with a CPU-based RPC solution, this application relies on the on-path processing of the DPU 102 and completes the KV operation while handling RDMA protocol forwarding, which both bypasses and frees the host CPU resources and achieves higher throughput and better latency.
Referring further to the flow diagram of the data processing method shown in Figure 8, this method mainly describes the query flow from the perspective of the interaction between the KV client node 200 and the KV server node 100, and specifically includes the following steps:

Step 1: The KV client node 200 sends a KV operation request to the KV server node 100.

Specifically, the KV client node 200 can send a KV query request to the KV server node 100 through an RDMA write operation. The KV query request carries the key content (specifically, the key part of the KV data) to request the value content corresponding to that key content (specifically, the value part of the KV data).

It should be noted that an RDMA read operation usually carries no payload. Therefore, the KV client node 200 carries the key content and the operation type in the payload of an RDMA write operation, thereby sending the KV query request through an RDMA write operation.
Step 2: The DPU 102 in the KV server node 100 receives the KV operation request and loads the hash bucket according to the KV operation request.

Specifically, the operation type in the KV operation request is query, which requests the value content corresponding to the key content. Based on this, the DPU 102 can compute the corresponding hash bucket from the key content in the KV query request using hash algorithm 1, complete the translation from virtual address to physical address using the virtual address and check key in the KV query request, and then load the hash bucket from the memory 106 according to the physical address.
Step 3: When the hash bucket is returned, the DPU 102 can compute the fingerprint from the key content using hash algorithm 2, search the hash bucket for a slot whose stored fingerprint equals the computed fingerprint, and obtain the virtual address and check key of the KV Block from that slot.

Step 4: Using the virtual address and check key, the DPU 102 can translate the virtual address of the corresponding KV Block into a physical address and read the Header field and the Next field.

Step 5: After the Header field and Next field of the KV Block are returned, the DPU 102 can compare whether the key content in the Header is the same as the key content in the KV query request. If they are the same, go to step 6; if not, obtain the virtual address and check key of the next KV Block in the linked list from the Next field and return to step 4.
Step 6: The DPU 102 can load the value content, generate the KV operation result, and return the KV operation result to the KV client node 200.

Specifically, the DPU 102 can assemble an RDMA message from the value content and send the value content back to the KV client node 200 through an RDMA Send operation, completing the KV query operation.
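For illustration only, steps 2 to 6 of the query flow can be sketched as the following C function, again reusing the hypothetical structures and dma_read helper above; hash1, hash2 and send_value are placeholder declarations standing in for the two hash algorithms and the RDMA Send path, and the assumption that the value content immediately follows the Next field is made only for the sketch.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Placeholder declarations for the two hash algorithms and the reply path. */
    uint64_t hash1(const uint8_t *key, uint16_t klen);
    uint16_t hash2(const uint8_t *key, uint16_t klen);
    void     send_value(uint64_t value_va, uint32_t check_key, uint32_t vlen);

    /* Look up a key: load the bucket, match the fingerprint, then walk the KV Block
     * chain loading only the header and Next fields until the key matches. */
    static bool kv_query(uint64_t table_va, uint32_t table_check_key,
                         uint64_t num_buckets, const uint8_t *key, uint16_t klen)
    {
        struct hash_bucket bucket;
        uint64_t idx = hash1(key, klen) % num_buckets;
        dma_read(table_va + idx * sizeof(bucket), table_check_key,
                 &bucket, sizeof(bucket));

        uint16_t fp = hash2(key, klen);
        for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
            const struct hash_slot *slot = &bucket.slots[i];
            if (slot->fingerprint != fp || slot->klen != klen)
                continue;

            uint64_t va = slot->kv_block_va;
            uint32_t ck = slot->check_key;
            struct kv_block_header hdr;
            struct kv_block_next   nxt;
            for (;;) {
                dma_read(va, ck, &hdr, sizeof(hdr));
                dma_read(va + sizeof(hdr), ck, &nxt, sizeof(nxt));
                if (hdr.klen == klen && memcmp(hdr.key, key, klen) == 0) {
                    /* Sketch assumption: value content follows the Next field. */
                    send_value(va + sizeof(hdr) + sizeof(nxt), ck, hdr.vlen);
                    return true;
                }
                if (!(hdr.flags & FLAG_NEXT_VALID))
                    return false;            /* end of the collision chain */
                va = nxt.next_va;
                ck = nxt.next_check_key;
            }
        }
        return false;
    }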
It should be noted that when the KV client node 200 sends a KV operation request for querying data through an RDMA Write, the DPU 102 can return an acknowledgement message ACK to the KV client node 200 upon receiving the KV operation request. Similarly, when the DPU 102 of the KV server node 100 notifies the completion of the operation through an RDMA Send, the KV client node 200 can also return an acknowledgement message ACK to the KV server node 100.
In the above scenario, a KV read operation involves two or more read/write accesses to the memory 106 (more are needed when a collision occurs) and the corresponding logical processing. Similar to the KV write operation, in this application the data reads and writes can all be completed directly on the DPU 102 by DMA, and the KV logic processing can also be completed by the cores on the DPU 102. Compared with a one-sided RDMA implementation, this application shortens the data transfer path from server memory -> client memory to server memory -> server DPU cache, greatly reducing data transfer latency. Compared with a CPU-based RPC solution, this application can bypass the CPU, saving CPU resources, and improve the overall throughput and latency of the system through the multiple cores and on-path processing capability of the DPU 102.
As another possible implementation, this application also proposes a design that supports transaction batching, which allows the KV client node 200 to encapsulate multiple KV operations in a single KV operation request, for example by carrying the operation metadata of multiple operations. The DPU 102 generates multiple transactions from the KV operation request, each of which is used to perform one of the multiple KV operations; correspondingly, the DPU 102 can execute the multiple transactions in parallel on multiple cores, thereby improving overall efficiency.

Referring to the flow diagram of the data processing method shown in Figure 9, this method mainly describes the batch operation flow from the perspective of the interaction between the KV client node 200 and the KV server node 100, and specifically includes the following steps:
Step 1: After the KV operation request is received by the DPU 102, one thread of the DPU 102 is responsible for processing it.

In Figure 9, the KV operation request can be handled by Thread 0. Specifically, Thread 0 obtains the operation type and the number of operations of the current batch operation by parsing the message header.

Step 2: Through the distribution and state-management logic on Thread 0, the DPU 102 parses each operation field segment in turn and initializes the state information of each operation in the state table, then generates a new operation transaction and enqueues it into the transaction queue, and generates an operation notification and enqueues it into the notification queue.

The notification queue can be implemented in the form of the doorbell queue shown in the figure.
Step 3: The scheduler on the DPU 102 can dispatch the generated operation notifications to different threads, and the different threads perform the corresponding KV operations respectively.

In Figure 9, the scheduler on the DPU 102 dispatches the doorbells to Thread 1 through Thread N. Correspondingly, Thread 1 through Thread N parse the doorbells, read the information needed to complete the operation from the transaction queue, and then perform the corresponding operation, for example a KV insert operation.
Step 4: After an operation completes, each thread writes back the corresponding state table.

While the threads write back the state table, the queues or order-preserving resources inside the DPU 102 ensure the atomicity of the state updates. The last thread to finish generates a doorbell for replying to the request and enqueues it into the doorbell queue.

Step 5: The scheduler on the DPU 102 dispatches the doorbell for the reply request to one thread for execution. That thread collects the state information and, when the state information indicates that the operations have completed, generates a completion message and sends it back to the KV client node 200.
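For illustration only, the state-table bookkeeping behind steps 1 to 5 can be sketched as follows; the structure layouts, the one-byte operation encoding and the callback parameters are assumptions of the sketch, and thread scheduling, the doorbell queues and the RDMA plumbing are omitted.

    #include <stdatomic.h>
    #include <stdint.h>

    #define MAX_BATCH 32               /* assumed maximum operations per request */

    struct kv_txn {                    /* one entry of the transaction queue */
        uint8_t  op_type;              /* insert, update, query, ... */
        uint32_t payload_off;          /* offset of this operation in the message */
    };

    struct batch_state {
        atomic_uint pending;           /* operations not yet completed */
        uint8_t     status[MAX_BATCH];
        struct kv_txn txns[MAX_BATCH];
    };

    /* Thread 0: parse the header, initialize the state table and enqueue one
     * transaction plus one doorbell per operation. */
    static uint32_t dispatch_batch(struct batch_state *st, const uint8_t *msg,
                                   uint32_t op_count,
                                   void (*ring_doorbell)(uint32_t txn_idx))
    {
        if (op_count > MAX_BATCH)
            op_count = MAX_BATCH;                  /* sketch guard */
        atomic_store(&st->pending, op_count);
        uint32_t off = 0;              /* assumed: operations packed after the header */
        for (uint32_t i = 0; i < op_count; i++) {
            st->status[i] = 0;                     /* not started */
            st->txns[i].op_type = msg[off];        /* assumed 1-byte op type */
            st->txns[i].payload_off = off;
            ring_doorbell(i);                      /* notify the worker threads */
            off += 1 /* + operation-specific length, omitted in the sketch */;
        }
        return op_count;
    }

    /* Worker threads: write back the state table after finishing a transaction;
     * the last thread to finish triggers the completion reply. */
    static void complete_txn(struct batch_state *st, uint32_t txn_idx,
                             void (*send_completion)(void))
    {
        st->status[txn_idx] = 1;                   /* done */
        if (atomic_fetch_sub(&st->pending, 1) == 1)
            send_completion();                     /* last completed transaction replies */
    }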
In this scenario, the requesting end can encapsulate multiple KV operations into one message, and the DPU 102 distributes the multiple transactions in the message to different processing cores for parallel processing, implementing the batch operation through flow combination and orchestration and thereby improving overall transaction processing efficiency.

It is worth noting that other reasonable combinations of steps that a person skilled in the art can derive from the above description also fall within the protection scope of this application. Furthermore, a person skilled in the art should also be aware that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by this application.
The data processing method provided by the embodiments of this application has been described above with reference to Figures 1 to 9. Next, the functions of the data processing apparatus provided by the embodiments of this application and the computing device that implements the data processing apparatus are described with reference to the accompanying drawings.

Referring to Figure 10, a schematic structural diagram of a data processing apparatus is shown. The data processing apparatus 1000 may be a software apparatus or a hardware apparatus in an accelerator, and the apparatus 1000 includes:
an obtaining unit 1002, configured to obtain a KV operation request;

a determining unit 1004, configured to determine an execution mode according to the KV operation request, where the execution mode includes an offload mode and a non-offload mode, the offload mode is used to indicate that the operation requested by the KV operation request is performed by the accelerator, and the non-offload mode is used to indicate that the operation requested by the KV operation request is performed by the processor; and

an execution unit 1006, configured to perform the processing of the KV operation request according to the execution mode.
It should be understood that the apparatus 1000 in the embodiments of this application may be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), a system on chip (SoC), or any combination thereof. When the data processing methods shown in Figures 2 to 9 are implemented in software, the apparatus 1000 and its modules may also be software modules.
In some possible implementations, the determining unit 1004 is specifically configured to:

obtain the operation metadata in the KV operation request; and

when the operation metadata satisfies a preset condition, determine that the execution mode is the offload mode, and otherwise, determine that the execution mode is the non-offload mode.
In some possible implementations, the operation metadata includes an operation type and a key length, and the operation metadata satisfying the preset condition includes: the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query or batch modify, and the key length is less than a preset length.
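For illustration only, the preset condition described above can be sketched as the following C predicate; the enumeration and the 1 KB threshold are assumptions of the sketch (drawn from the earlier remark that keys are typically within 1 KB), not values fixed by this application.

    #include <stdbool.h>
    #include <stdint.h>

    enum kv_op_type {
        KV_OP_ADD, KV_OP_DELETE, KV_OP_QUERY, KV_OP_MODIFY,
        KV_OP_BATCH_ADD, KV_OP_BATCH_DELETE, KV_OP_BATCH_QUERY, KV_OP_BATCH_MODIFY,
        KV_OP_OTHER
    };

    #define KEY_LEN_LIMIT 1024  /* assumed preset length */

    static bool use_offload_mode(enum kv_op_type op, uint32_t key_len)
    {
        bool supported_type = (op != KV_OP_OTHER);
        /* true  -> the accelerator handles the request (offload mode)
         * false -> forward to the host processor (non-offload mode) */
        return supported_type && key_len < KEY_LEN_LIMIT;
    }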
In some possible implementations, when the execution mode determined according to the KV operation request is the offload mode, the execution unit 1006 is specifically configured to:

perform a target KV operation on the memory of the computing device according to the KV operation request.

In some possible implementations, the memory stores KV data in KV blocks, and a KV block includes a key content field and a value content field;

the execution unit 1006 is specifically configured to:

write a target KV block into the memory of the computing device or query the target KV block from the memory of the computing device according to the KV operation request.

In some possible implementations, the memory stores the metadata of the KV data in a hash table. The hash table includes multiple hash buckets, each of the multiple hash buckets includes multiple hash slots, and multiple hash slots belonging to the same hash bucket are used to store the fingerprints, key lengths and corresponding block addresses of multiple keys that have the same hash value.
In some possible implementations, the KV operation request is a KV modify request, and the execution unit 1006 is further configured to:

determine a hash value and a fingerprint according to the key content of the target KV block;

determine the hash bucket corresponding to the target KV block according to the hash value; and

update the hash slot in the hash bucket whose fingerprint matches the fingerprint determined from the key content of the target KV block.
In some possible implementations, the KV block further includes a lower-level KV block identification field and a lower-level KV block address field, the KV operation request is a KV add request, and the execution unit 1006 is further configured to:

determine a hash value and a fingerprint according to the key content of the target KV block;

determine the hash bucket corresponding to the target KV block according to the hash value; and

when the fingerprint stored in a target hash slot in the hash bucket matches the fingerprint determined from the key content of the target KV block, read the lower-level KV block identification field and the lower-level KV block address field in the target hash slot, determine the KV block at the tail of the linked list according to the field values of the lower-level KV block identification field and the lower-level KV block address field, write the block address of the target KV block into the lower-level KV block address field of the KV block at the tail of the linked list, and mark the field value of the lower-level KV block identification field of the KV block at the tail of the linked list as valid.
In some possible implementations, the KV operation request is a KV add request, and the execution unit 1006 is further configured to:

determine a hash value and a fingerprint according to the key content of the target KV block;

determine the hash bucket corresponding to the target KV block according to the hash value; and

write the block address of the target KV block and the fingerprint determined from the key content of the target KV block into an empty hash slot in the hash bucket.
In some possible implementations, the KV operation request is a KV query request, the KV operation request includes the block address and key content of the target KV block to be queried, and the execution unit 1006 is specifically configured to:

determine a hash value according to the key content of the target KV block, and determine the corresponding hash bucket according to the hash value;

perform address translation according to the block address of the target KV block to obtain a physical address, and read the hash bucket according to the physical address; and

determine a fingerprint according to the key content of the target KV block, query the hash bucket according to the fingerprint determined from the key content of the target KV block, and obtain the value content corresponding to the key content.
Since the data processing apparatus 1000 shown in Figure 10 corresponds to the methods shown in Figures 2, 4, 7, 8 and 9, for the specific implementation of the data processing apparatus 1000 shown in Figure 10 and its technical effects, reference may be made to the relevant descriptions in the foregoing embodiments, which are not repeated here.
Figure 11 is a hardware structural diagram of a computing device 1100 provided by this application. The computing device 1100 may be the aforementioned KV server node 100 and is used to implement the functions of the data processing apparatus 1000 in the embodiment shown in Figure 10.

As shown in Figure 11, the computing device 1100 includes a processor 1101, an accelerator 1102 and a memory 1103. The processor 1101, the accelerator 1102 and the memory 1103 may communicate through a bus 1104, or through other means such as wireless transmission. The computing device 1100 further includes a communication interface 1105, which is used for communicating with the outside, for example with other devices such as the KV client node 200. In some possible implementations, the computing device 1100 may further include a storage 1106.
The processor 1101 may be a central processing unit (CPU), and the accelerator 1102 may be a data processing unit (DPU) or an infrastructure processing unit (IPU). The accelerator 1102 is used to offload the workload of the processor 1101 so as to provide acceleration. It should be noted that the accelerator 1102 can run its own operating system (i.e., a small system) to provide the KV service; in that case the computing device 1100 contains two operating systems, namely the general-purpose operating system running on the processor 1101 and the small system running on the accelerator 1102. In addition, the accelerator 1102 may also exist as an external device of the processor 1101, and the processor 1101 and the accelerator 1102 may constitute a heterogeneous system.
The memory 1103 refers to the internal memory that exchanges data directly with the processor 1101 or the accelerator 1102. It can read and write data at any time and very quickly, and serves as temporary data storage for the operating system or other running programs. The memory may include at least two types of memory; for example, it may be a random access memory (RAM) or a read-only memory (ROM). For example, the random access memory may be a dynamic random access memory (DRAM), a static random access memory (SRAM) or a storage class memory (SCM), and the read-only memory may be a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), and so on. In addition, the memory 1103 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD). Furthermore, the memory 1103 may be configured to have a power-protection function, which means that the data stored in the memory 1103 is not lost when the system is powered off and then powered on again. Memory with a power-protection function is called non-volatile memory.

In addition to a data bus, the bus 1104 may also include a power bus, a control bus, a status signal bus and so on. For clarity, however, the various buses are all labeled as the bus 1104 in the figure.
The communication interface 1105 is used to communicate with external devices such as the KV client node 200. Specifically, the communication interface 1105 may be a network card, such as the RDMA network card described above. The communication interface 1105 may be used to receive the KV operation request sent by the KV client node 200 through RDMA Write.

The storage 1106, also called external memory or external storage, is usually used to store data or instructions persistently. The storage 1106 may include a magnetic disk or another type of storage medium, such as a solid state disk or a shingled magnetic recording hard disk. When the computing device 1100 also includes the storage 1106, the memory 1103 is also used to temporarily hold data exchanged with the storage 1106.
The memory 1103 is used to store instructions, which may be instructions fixed in the memory 1103 or instructions swapped in from the storage 1106. The accelerator 1102 is used to execute the instructions stored in the memory 1103 to perform the following operations:

obtaining a KV operation request;

determining an execution mode according to the KV operation request, where the execution mode includes an offload mode and a non-offload mode, the offload mode is used to indicate that the operation requested by the KV operation request is performed by the accelerator, and the non-offload mode is used to indicate that the operation requested by the KV operation request is performed by the processor; and

performing the processing of the KV operation request according to the execution mode.

Optionally, the accelerator 1102 is further used to execute the instructions stored in the memory 1103 to perform the other steps of the data processing method in the embodiments of this application.
It should be understood that the computing device 1100 according to the embodiments of this application may correspond to the data processing apparatus 1000 in the embodiments of this application, and may correspond to the KV server node 100 that performs the method shown in Figure 2 according to the embodiments of this application. The above and other operations and/or functions implemented by the computing device 1100 are respectively intended to implement the corresponding flows of the method in Figure 2 and, for brevity, are not repeated here.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center containing one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk), and so on. The computer-readable storage medium includes instructions that instruct a computing device to perform the data processing method described above as applied to the data processing apparatus 1000.

An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any available medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the above data processing method.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention rather than to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features thereof may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A data processing method, characterized in that the method is applied to a computing device supporting a key-value (KV) service, the computing device comprises an accelerator and a processor, and the method is performed by the accelerator and comprises:
    obtaining a KV operation request;
    determining an execution mode according to the KV operation request, wherein the execution mode comprises an offload mode and a non-offload mode, the offload mode is used to indicate that the operation requested by the KV operation request is performed by the accelerator, and the non-offload mode is used to indicate that the operation requested by the KV operation request is performed by the processor; and
    performing the processing of the KV operation request according to the execution mode.
  2. The method according to claim 1, characterized in that the determining an execution mode according to the KV operation request comprises:
    obtaining operation metadata in the KV operation request; and
    when the operation metadata satisfies a preset condition, determining that the execution mode is the offload mode, and otherwise, determining that the execution mode is the non-offload mode.
  3. The method according to claim 2, characterized in that the operation metadata comprises an operation type and a key length, and the operation metadata satisfying the preset condition comprises: the operation type is one or more of add, delete, query, modify, batch add, batch delete, batch query or batch modify, and the key length is less than a preset length.
  4. The method according to any one of claims 1 to 3, characterized in that when the execution mode determined according to the KV operation request is the offload mode, the performing the processing of the KV operation request according to the execution mode comprises:
    performing a target KV operation on a memory of the computing device according to the KV operation request.
  5. The method according to claim 4, characterized in that the memory stores KV data in KV blocks, and a KV block comprises a key content field and a value content field; and
    the performing a target KV operation on a memory of the computing device according to the KV operation request comprises:
    writing a target KV block into the memory of the computing device or querying the target KV block from the memory of the computing device according to the KV operation request.
  6. The method according to claim 5, characterized in that the memory stores metadata of the KV data in a hash table, the hash table comprises a plurality of hash buckets, each of the plurality of hash buckets comprises a plurality of hash slots, and a plurality of hash slots belonging to the same hash bucket are used to store fingerprints, key lengths and corresponding block addresses of a plurality of keys having the same hash value.
  7. The method according to claim 6, characterized in that the KV operation request is a KV modify request, and the method further comprises:
    determining a hash value and a fingerprint according to the key content of the target KV block;
    determining the hash bucket corresponding to the target KV block according to the hash value; and
    updating a hash slot in the hash bucket whose fingerprint matches the fingerprint determined from the key content of the target KV block.
  8. The method according to claim 6, characterized in that a plurality of KV operations are encapsulated in the KV operation request; and
    the performing a target KV operation on a memory of the computing device according to the KV operation request comprises:
    generating a plurality of transactions according to the KV operation request, wherein each of the plurality of transactions is used to perform one KV operation among the plurality of KV operations; and
    executing the plurality of transactions in parallel through a plurality of cores.
  9. An accelerator, characterized in that the accelerator comprises a processing module and a communication interface, the communication interface is configured to provide network communication for the processing module, and the accelerator is configured to perform the method according to any one of claims 1 to 8.
  10. A computing device comprising an accelerator, characterized in that the computing device comprises the accelerator and a processor, and the accelerator is configured to perform the method according to any one of claims 1 to 8.
PCT/CN2023/101332 2022-08-23 2023-06-20 Data processing method, accelerator, and computing device WO2024041140A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211017087.4 2022-08-23
CN202211017087.4A CN117666921A (en) 2022-08-23 2022-08-23 Data processing method, accelerator and computing device

Publications (1)

Publication Number Publication Date
WO2024041140A1 true WO2024041140A1 (en) 2024-02-29

Family

ID=90012338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101332 WO2024041140A1 (en) 2022-08-23 2023-06-20 Data processing method, accelerator, and computing device

Country Status (2)

Country Link
CN (1) CN117666921A (en)
WO (1) WO2024041140A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321294A1 (en) * 2015-04-30 2016-11-03 Vmware, Inc. Distributed, Scalable Key-Value Store
DE102018113885A1 (en) * 2017-06-13 2018-12-13 Western Digital Technologies, Inc. Memory-efficient persistent key-value memory for non-volatile memory
CN113821311A (en) * 2020-06-19 2021-12-21 Huawei Technologies Co., Ltd. Task execution method and storage device
US20220114270A1 (en) * 2020-12-26 2022-04-14 Intel Corporation Hardware offload circuitry
US20220012095A1 (en) * 2021-09-22 2022-01-13 Intel Corporation Metrics and security-based accelerator service rescheduling and auto-scaling using a programmable network device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG, TENG; WANG, JIANYING; CHENG, XUNTAO; XU, HAO; HUANG, GUI; ZHANG, TIEYING; HE, DENGCHENG; LI, FEIFEI; CAO, WEI: "FPGA-Accelerated Compactions for LSM-based Key-Value Store", 18th USENIX Conference on File and Storage Technologies (FAST '20), 26 February 2020 (2020-02-26), pages 225-237, XP093143594 *

Also Published As

Publication number Publication date
CN117666921A (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN108028833B (en) NAS data access method, system and related equipment
US10175891B1 (en) Minimizing read latency for solid state drives
CN114201421B (en) Data stream processing method, storage control node and readable storage medium
EP4318251A1 (en) Data access system and method, and device and network card
US11025564B2 (en) RDMA transport with hardware integration and out of order placement
US9092275B2 (en) Store operation with conditional push of a tag value to a queue
US10802753B2 (en) Distributed compute array in a storage system
US20200272579A1 (en) Rdma transport with hardware integration
CN110119304B (en) Interrupt processing method and device and server
US10162775B2 (en) System and method for efficient cross-controller request handling in active/active storage systems
WO2020199760A1 (en) Data storage method, memory and server
EP4357901A1 (en) Data writing method and apparatus, data reading method and apparatus, and device, system and medium
WO2022017475A1 (en) Data access method and related device
CN115129621B (en) Memory management method, device, medium and memory management module
CN115270033A (en) Data access system, method, equipment and network card
CN115934623A (en) Data processing method, device and medium based on remote direct memory access
WO2014154045A1 (en) Method, apparatus and system for implementing multicore operating system
US20240061802A1 (en) Data Transmission Method, Data Processing Method, and Related Product
CN116204487A (en) Remote data access method and device
WO2024041140A1 (en) Data processing method, accelerator, and computing device
Wu et al. RF-RPC: Remote fetching RPC paradigm for RDMA-enabled network
US10289550B1 (en) Method and system for dynamic write-back cache sizing in solid state memory storage
CN116049085A (en) Data processing system and method
Sun et al. A comprehensive study on optimizing systems with data processing units
US8819302B2 (en) System to manage input/output performance and/or deadlock in network attached storage gateway connected to a storage area network environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856232

Country of ref document: EP

Kind code of ref document: A1