CN111459418A - RDMA (Remote Direct Memory Access)-based key-value storage system transmission method - Google Patents


Info

Publication number
CN111459418A
Authority
CN
China
Prior art keywords
key
rdma
data
client
value
Prior art date
Legal status
Granted
Application number
CN202010413800.1A
Other languages
Chinese (zh)
Other versions
CN111459418B (en)
Inventor
蒋源
施凌鹏
唐斌
叶保留
陆桑璐
卢士达
张露维
胡钧毅
Current Assignee
Nanjing University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Nanjing University
State Grid Shanghai Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Nanjing University, State Grid Shanghai Electric Power Co Ltd filed Critical Nanjing University
Priority to CN202010413800.1A priority Critical patent/CN111459418B/en
Publication of CN111459418A publication Critical patent/CN111459418A/en
Application granted granted Critical
Publication of CN111459418B publication Critical patent/CN111459418B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0613 Improving I/O performance in relation to throughput
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0673 Single storage device

Abstract

The invention discloses an RDMA-based transmission method for a key-value storage system. The method combines an in-memory key-value store with high-performance RDMA computing hardware and adopts the fastest one-sided semantic, the RDMA write operation. The GET and PUT instructions of the key-value store are redesigned so that instruction and data are packaged into a single write command; data access then completes in only one round-trip transmission, avoiding the multi-round-trip latency caused by designs based on RDMA read. In the return phase the client kernel is bypassed as well, improving the experience of upper-layer applications for the client user. Meanwhile, to address the problem in the prior art that one-sided operations bypass the remote kernel so the remote CPU cannot uniformly schedule read-write contention, a priority queue is introduced in the server's RDMA memory: parallel instructions from multiple clients are converted into serial processing ordered by priority, solving the read-write contention that the server could not otherwise handle because the one-sided write command bypasses the kernel.

Description

RDMA (Remote Direct Memory Access)-based key-value storage system transmission method
Technical Field
The invention belongs to the technical field of computer storage, and in particular relates to a transmission method for an RDMA-based key-value storage system.
Background
With the maturation of cloud computing and big-data processing technologies, the volume of data generated by Internet applications has grown exponentially. Meanwhile, with the rise of pictures and short videos, this data is heterogeneous in format, varied in size, and largely unstructured. Querying, analyzing, and durably storing such ever-growing mass data demands a higher-performance storage technology, yet traditional relational databases, with their limited concurrent processing capability, poor scalability, and rigid storage structures, are ill-suited to the storage requirements of this new, format-diverse, unstructured data.
For this reason, non-relational (NoSQL) key-value storage systems are widely used as mainstream storage and analysis tools in industry. In-memory key-value stores in particular, with their fast access speed and strong scalability, are widely used to accelerate data-processing workloads, including online analytical workloads and offline data-intensive workloads.
However, facing ever-increasing data volumes and the heavy computational load of processing large-scale data, traditional TCP/IP network protocols and hardware devices can no longer keep pace with high-performance cores and high-speed links (100 Gbps). Network I/O performance and strained compute resources have begun to become bottlenecks in key-value storage systems.
Thus, efficient network hardware and more advanced transport protocols have been introduced into conventional key-value storage systems. As the price of high-performance computing hardware decreases, data centers have gradually begun to use, for example, RDMA (Remote Direct Memory Access) technology to improve the transmission and computing performance of in-memory key-value stores. RDMA operations allow one machine to read from (or write to) a pre-registered memory region of another machine without involving the remote CPU. Compared with traditional message passing, RDMA achieves minimal round-trip delay (microseconds), the highest throughput, and the lowest CPU overhead. Combining RDMA with a key-value storage system greatly accelerates online processing and relieves data-intensive workloads. RDMA has also begun to support the RoCE (RDMA over Converged Ethernet) protocol, an extension that lets RDMA hardware run over an Ethernet link layer, making the high-performance hardware compatible with conventional Ethernet and thus well suited for introduction into traditional key-value storage systems.
Several problems remain to be solved when an RDMA-based key-value storage system is used for data transfer. Tests show that data transmission between nodes takes 1-3 microseconds while a memory lookup takes only 60-120 nanoseconds, so transmission latency dominates; transmission efficiency therefore directly determines the overall read-write performance of the key-value store. However, recent research on RDMA-based key-value stores mostly accesses remote memory via RDMA read, as in the transfer modes used by the Pilaf and FaRM systems. An RDMA read bypasses the kernel of the remote server, but it also prevents the remote side from performing complex addressing, so a data transfer between client and server must be completed over multiple round trips. The latency of a multi-round-trip design is clearly higher than that of a single-round-trip design and noticeably degrades the user experience. The multiple round trips required by the RDMA read mode thus greatly reduce the overall performance of the key-value storage system.
In addition, while an RDMA read bypasses the remote server's kernel (which is also one reason the remote side cannot perform complex addressing), its multiple round trips interrupt the client CPU and force thread switches, and a client CPU usually serves more than one application, so the experience of the user at the application layer suffers greatly. Meanwhile, the server exists to provide service: its CPU need not avoid all application work, and excessively pursuing a perfect kernel bypass on the server side puts the cart before the horse.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an RDMA-based key-value storage system transmission method that effectively reduces the round-trip communication delay of an in-memory key-value store, improves throughput, and improves the client's upper-layer user experience by using RDMA one-sided semantics to bypass the client kernel.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, a method for RDMA-based key-value storage system transmission includes the following steps:
the client and the server establish a connection with each other: the server registers an RDMA memory region used to create a command queue, the client registers an RDMA memory region used to receive returned data blocks, and the two sides exchange memory addresses and access keys;
after the connection is successfully established, the client sends a GET/PUT instruction to the server in one-sided write semantic form;
the server receives the parallel processing requests of multiple clients and stores them in the command queue, then parses and responds to the data in the command queue according to RDMA one-sided write semantics: for a GET instruction, it sends the value data to the client's memory while bypassing the client kernel; for a PUT instruction, it adds or updates the value store locally.
In a preferred embodiment, the client sends the GET/PUT instruction in one-sided write semantic form by calling an RDMA write function whose parameters include:
r_address, the virtual memory mapping transmitted from the server;
r_key, the access key transmitted from the server;
and data, the payload of the request, which identifies the operation type and carries the information that operation type requires.
As a preferred implementation, for a GET request, the data includes:
a command for distinguishing the request type;
key, the key of the request's target object in the key-value store, used to look up the value's address space in the peer's index;
l_address, the address space in the client's memory used to store the returned data; and
l_key, the client's access key.
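The four GET fields above can be illustrated with a minimal Python sketch of request packing and parsing (the wire layout, field widths, and byte order are made-up assumptions for illustration; the patent does not specify an encoding):

```python
import struct

# Hypothetical wire layout for the data payload of an RDMA write
# (field names follow the text; sizes are illustrative assumptions):
#   command:   1 byte  (0 = GET, 1 = PUT)
#   key:       16-byte fixed-width key, zero-padded
#   l_address: 8-byte client receive address
#   l_key:     4-byte client access key
GET_FMT = "<B16sQI"

CMD_GET, CMD_PUT = 0, 1

def pack_get(key: bytes, l_address: int, l_key: int) -> bytes:
    """Pack a GET request into the data payload of an RDMA write."""
    return struct.pack(GET_FMT, CMD_GET, key.ljust(16, b"\0"), l_address, l_key)

def unpack_get(data: bytes):
    """Parse the payload on the server side."""
    command, key, l_address, l_key = struct.unpack(GET_FMT, data)
    return command, key.rstrip(b"\0"), l_address, l_key
```

A round trip through pack and unpack recovers the original fields, which is all the command queue needs from the payload.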
As a preferred embodiment, for a GET request, parsing and responding to the data in the command queue according to RDMA one-sided write semantics includes:
the server creates a thread to process the received data in the instruction queue and parses the parameters in the data;
determining from command that the instruction is GET, and creating a response function RDMA-write(l_address, l_key, r_data);
accessing the hash table with key to obtain the address mapping of the storage block holding the corresponding value, taking the value out of the storage block according to the mapped address, and packing it into the r_data of the response function;
filling the parsed l_address and l_key directly into the l_address and l_key of the response function;
and, after the key is successfully matched against the client's l_key, sending the data to the client memory designated by l_address while bypassing the client kernel, whereupon the client receives the GET result.
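The server-side GET steps above can be sketched as a small Python simulation (the dictionaries standing in for the hash table, storage blocks, and client RDMA region, and the key values, are illustrative assumptions, not the patent's implementation):

```python
# Illustrative stand-ins for the server's hash table, value storage
# blocks, and the client's registered RDMA memory region.
server_index = {}        # hash table: key -> storage-block address
server_blocks = {}       # storage area: address -> value bytes
client_memory = {}       # l_address -> received bytes (simulated client region)
CLIENT_L_KEY = 0xBEEF    # access key registered by the client

def rdma_write_to_client(l_address, l_key, r_data):
    """Simulated one-sided write into client memory: no client CPU is
    involved, but l_key must match the client's registered key first."""
    if l_key != CLIENT_L_KEY:
        raise PermissionError("l_key mismatch: client memory is protected")
    client_memory[l_address] = r_data

def serve_get(key, l_address, l_key):
    """Steps from the text: address the hash table with key, fetch the
    value from its storage block, and answer with a one-sided write."""
    addr = server_index[key]          # address mapping of the storage block
    r_data = server_blocks[addr]      # take the value out of the block
    rdma_write_to_client(l_address, l_key, r_data)
```

A mismatched l_key is rejected before any data reaches client memory, which mirrors the key-matching step in the text.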
As a preferred implementation, for the PUT request, the data includes:
a command for distinguishing request types;
key, the key of the data block that the request writes into the key-value store;
value, the value of the data block that the request writes into the key-value store.
In a preferred embodiment, for a PUT request, parsing and responding to the data in the command queue according to RDMA one-sided write semantics includes:
the server creates a thread to process the received data in the instruction queue and parses the parameters in the data;
determining from command that the instruction is PUT, starting an index-access thread to execute the write operation, and creating a new <new_key, new_value> key-value pair in the hash table;
writing the parsed key into new_key of the new key-value pair;
and, according to the parsed value, allocating a new data storage block in the memory area, copying the value into it, and writing the storage block's access address into new_value of the key-value pair.
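The PUT steps above can be sketched similarly in Python (the block allocator and data structures are illustrative stand-ins; the serve_put helper is hypothetical):

```python
import itertools

server_index = {}    # hash table: key -> new_value (storage-block address)
server_blocks = {}   # storage area: address -> value bytes
_next_block = itertools.count(0x1000, 0x100)  # toy storage-block allocator

def serve_put(key: bytes, value: bytes) -> int:
    """Steps from the text: allocate a new storage block, copy the value
    into it, and record the block address as new_value under new_key."""
    addr = next(_next_block)          # newly built storage block
    server_blocks[addr] = value       # copy value into the new block
    server_index[key] = addr          # <new_key, new_value> in the hash table
    return addr
```

An update simply points the key at a freshly allocated block, matching the text's "newly building a section of data storage block" for every write.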
As a preferred embodiment, when sending a request the client assigns a priority level according to task urgency and sends a priority flag bit together with the data block to the server; the server's command queue receives the flag bits and data blocks, then takes them out serially and processes them in priority order.
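The priority ordering described here can be sketched with Python's heapq (the CommandQueue class and its tie-breaking counter are illustrative assumptions; lower numbers are treated as more urgent):

```python
import heapq
import itertools

class CommandQueue:
    """Serializes parallel client requests by priority flag bit: lower
    value = more urgent; equal priorities keep their arrival order."""
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker for equal priorities

    def receive(self, priority: int, data: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), data))

    def take(self) -> bytes:
        _, _, data = heapq.heappop(self._heap)
        return data
```

Interleaved arrivals from several clients come out strictly by priority, then first-come-first-served within a priority level.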
In a second aspect, a data processing apparatus, the apparatus comprising:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, which when executed by the processors implement the RDMA-based key-value storage system data transfer method of the first aspect of the invention.
In a third aspect, a computer readable storage medium stores computer instructions which, when executed by a processor, implement the RDMA-based key-value storage system transfer method of the first aspect of the invention.
Compared with the traditional TCP/IP communication protocol and other RDMA semantic designs, this design needs only one round-trip transmission and processes the command queue automatically, freeing client CPU overhead while completing the data access. The method is applicable to scenarios in which an in-memory key-value storage system serves as a database engine in an RDMA hardware environment.
Drawings
FIG. 1 is a schematic diagram of a method of transmission of an RDMA-based key-value storage system, according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a client and a server establishing a connection with each other according to an embodiment of the present invention;
FIG. 3 is a diagram of a command queue and a polling thread according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a request phase of a GET instruction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the response and return phases of a GET instruction, according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a PUT instruction client sending phase according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a phase of a server-side PUT instruction response according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
Fig. 1 is a general schematic diagram of a transmission method for an RDMA-based key-value storage system according to an embodiment of the invention. The method redesigns the GET and PUT instructions of the key-value store using the higher-performance RDMA write semantic, thereby avoiding multiple round trips, reducing transmission delay, and improving throughput. The server parses the GET command and, once it knows the operation the command requires, returns the data to the client with an RDMA write, now bypassing the client kernel and thereby freeing CPU overhead for the user. The following describes how the GET and PUT commands are redesigned to fully exploit RDMA's high-performance semantics and improve the transmission performance of the key-value storage system.
Fig. 2 is a schematic diagram of a client and a server establishing a connection according to an embodiment of the invention. First, the server starts an RDMA memory-registration thread to create a Command Queue that buffers the GET instructions sent via RDMA write by multiple clients, and sends that region's server memory address r_address and access key r_key to each client in advance to establish the connection. At the same time, each client starts its own RDMA memory-registration thread and creates a memory block for receiving the data block (Receive Data) returned when a GET instruction completes. The client's memory address l_address and access key l_key are likewise sent to the server in advance, establishing a connection-oriented data-transmission protocol. Once the connection is established, either side can use the peer's address and key to access the peer's memory block while bypassing the remote kernel.
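The registration-and-exchange handshake can be simulated with a minimal Python sketch (all sizes, addresses, and key values are made-up stand-ins; real code would register memory with the RDMA NIC and exchange credentials out of band):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RdmaRegion:
    """Credentials a peer must hold to access a registered region remotely."""
    address: int   # r_address / l_address in the text
    key: int       # r_key / l_key in the text

def register_memory(size, base, key):
    """Simulated RDMA memory registration: returns the local buffer and
    the (address, key) pair sent to the peer during connection setup."""
    return bytearray(size), RdmaRegion(base, key)

# Connection setup as described above: each side registers a region and
# exchanges its credentials with the peer in advance.
server_buf, server_creds = register_memory(4096, base=0x2000, key=0xCAFE)  # command queue
client_buf, client_creds = register_memory(1024, base=0x7F00, key=0xBEEF)  # receive-data block
```

After the exchange, holding a peer's RdmaRegion is what lets one side address the other's memory without involving the remote kernel.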
After the connection is established, the remote memory is virtually abstracted into the address space of the local network card; to upper-layer applications, accessing remote memory storage is equivalent to issuing commands and operations against local memory, with the implementation details handled jointly by the RDMA NIC protocol and hardware.
Because the high-performance one-sided RDMA write semantic is adopted, the remote kernel is not notified when remote memory is accessed during data transmission, which yields higher transmission efficiency. But precisely because the kernel is not notified, neither the one-sided GET instruction above nor the one-sided PUT instruction allows the bypassed server kernel to immediately coordinate and resolve the read-write contention that arises when multiple clients concurrently access the server's data storage area. The design shown in FIG. 3 is therefore proposed.
FIG. 3 illustrates the command queue and polling thread according to an embodiment of the invention. First, RDMA allows a region of memory to be set aside in advance to buffer data sent directly by one-sided clients; this region is defined as a message queue in the server's memory for receiving data from parallel clients. Each client assigns a priority level according to task urgency, writes a priority flag bit into its work request, and sends it to the server together with the data. After receiving the flag bits and data blocks, the server takes them out serially and processes them in priority order. Because the client's RDMA write semantic bypasses the server kernel, this serial take-out step cannot be executed automatically by the server kernel and needs the help of a new polling thread; moreover, by the design of the key-value store's GET and PUT instructions, accessing the index structure requires a small amount of CPU work. The invention therefore additionally creates a polling helper thread p as the first step of the whole process: thread p polls the server's RDMA command-queue memory region for newly arrived client requests, satisfying the requirement to poll periodically and process client requests in priority order. The RDMA one-sided write semantic and the message-queue processing scheme are thus mutually compatible and together handle the distributed multi-client read-write contention that arises with kernel-free reception.
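The polling helper thread p can be sketched in plain Python (ListQueue is a toy stand-in for the registered command-queue memory region; real code would poll RDMA memory, and all names here are illustrative):

```python
import threading
import time

class ListQueue:
    """Toy stand-in for the RDMA command-queue memory region."""
    def __init__(self):
        self.items = []

    def take_nowait(self):
        # Return the oldest buffered request, or None if nothing arrived.
        return self.items.pop(0) if self.items else None

class PollingThread(threading.Thread):
    """Helper thread p from the text: periodically polls the command-queue
    region for newly arrived client requests and dispatches them in order."""
    def __init__(self, command_queue, handler, interval=0.001):
        super().__init__(daemon=True)
        self.command_queue = command_queue
        self.handler = handler
        self.interval = interval
        self._stop_flag = threading.Event()

    def run(self):
        while not self._stop_flag.is_set():
            req = self.command_queue.take_nowait()
            if req is None:
                time.sleep(self.interval)  # nothing arrived; poll again shortly
            else:
                self.handler(req)

    def stop(self):
        self._stop_flag.set()
```

The thread does the serial take-out that the bypassed server kernel cannot do on its own; the handler would dispatch to GET or PUT processing.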
This queue is named the "command queue", and the thread is named the "polling thread". The method then enters the direct-communication stage, that is, the concrete design of the GET and PUT instructions in the in-memory key-value store.
FIG. 4 illustrates the request phase of a GET instruction according to an embodiment of the invention. With the connection already established, the client actively initiates the exchange and enters the request phase of the GET instruction: the client's RNIC starts RDMA write communication semantics and calls a write(r_address, r_key, data) request function. The first parameter, r_address, is the virtual-memory mapping transmitted from the remote end when the connection was established; the remote memory can be accessed directly through it. The second parameter, r_key, is an agreed key introduced for the security of the kernel bypass: once it matches and is verified at the remote server, remote memory can be read and written without notifying the remote kernel. The third parameter, data, is stored in the server's Command Queue, registered in advance by the server RNIC to receive the request commands that clients send in data. The data parameter consists of four parts:
1) command: the specific GET instruction content, indicating the access type of the operation.
2) key: the key in the key-value storage system, used to look up the value's address space in the index.
3) l_key: after the value is obtained, a thread is started to return the data to the client; l_key is the client key that guards the client's memory against unauthorized access, and matching it allows the client kernel to be bypassed without interrupting the client's current application-layer thread.
4) client receive address: the client's address space l_address for receiving the returned value data. In the data-return phase this address is abstracted as a memory mapping at the server, so the return can be completed directly by executing a one-sided write, without notifying the client CPU.
The clients' GET requests are received uniformly in the command queue, and the server then polls each data_i in the queue for further processing. Server assistance is required because RDMA alone cannot support pointer chasing and index queries. The server therefore creates the Command Queue and parses the parameters of each data_i received in the queue.
FIG. 5 illustrates the response and return phases of a GET instruction according to an embodiment of the invention. As shown in the figure: (1) the server-side kernel intervenes and, in priority order, takes a request data_i out of the receive queue and parses it; (2) the first parameter of data_i, command, indicates whether the instruction is GET or PUT; (3) if it is GET, a response function RDMA-write(l_address, l_key, r_data) is created for the coming return of the value; (4) the second parameter, key, is used to address the index hash table, whose stored key-value pairs yield the address mapping of the value corresponding to key; (5) the value is taken out of its storage block according to the address mapping and packed into the r_data parameter of the previously created response function; (6) the third and fourth parameters of data_i are written directly into the l_key and l_address of the RDMA-write response function. As described above, l_address is the address of the client memory space used to receive the data, and l_key is the key for bypassing the client kernel. When the RDMA-write response function connects to the client, l_key matching is performed; after the match succeeds, (7) the value data carried in r_data is transmitted directly into the client memory designated by l_address, and the GET process is complete.
The above is a description of the request phase and the response-and-return phase of the complete GET instruction of the invention, based on high-performance RDMA write semantics. Folding the request phase of FIG. 4 into the request function initiated in the first step of FIG. 5 makes all steps of the GET instruction visible. Compared with other work on RDMA-based key-value storage systems, the invention introduces the RDMA write semantic, which has the lowest communication latency, as the basic communication semantic of the whole process, and optimizes the request and return phases so that the entire GET instruction completes in a single round trip instead of several. Multiple transmissions interrupt the client CPU and force thread switches, and a client CPU usually runs more than one application, so the user's application-layer experience would otherwise degrade greatly. At the same time, the design bypasses the client kernel, freeing a client kernel that runs many software applications and improving the most tangible experience of the user sitting in front of the client. The server, by contrast, exists to provide service: its CPU is devoted entirely to the work-instruction load of the storage system, avoiding the mistake of traditional RDMA read semantics, which pursue a perfect server-kernel bypass at the expense of overall effectiveness.
Fig. 6 illustrates the client-side sending phase of a PUT instruction according to an embodiment of the invention. When a client needs to write new data into a value storage block of the server's key-value storage system, or to update existing data, it uses the PUT instruction. The PUT flow is considerably simpler than the GET instruction's three phases of request, response, and return. As with the opening steps of the GET instruction, the client and the server must still establish a connection-oriented RDMA communication mode in advance. To reduce the complexity of the PUT request function and save memory resources, the PUT and GET instructions share the server's index hash table, value storage blocks, and work-request receive queue (Command Queue). Because the instructions share this buffer area, the receive queue does not change according to whether a work request is a GET or a PUT operation, so work requests in the queue are still named data_i. Unlike a GET operation, a work request data_i under a PUT operation carries only three parameters: command, key, and value. The server polls the data_i received in the queue and parses the work request; for differentiation, the first byte of data_i, command, indicates whether it is a GET or a PUT operation. The memory-registration part of the command queue was already detailed in the description of the GET instruction and is not repeated here. After memory registration completes, the client knows the remote server's access address r_address and remote memory access key r_key; with these parameters it can access the server's memory directly via RDMA write semantics, write its data into the pre-allocated server command queue, and generate a work request data_i in the queue.
The PUT operation is designed on this same principle. Because the operation is one-sided, the client ends its thread as soon as the request function RDMA-write(r_address, r_key, data) has been sent.
Fig. 7 illustrates the server-side response phase of a PUT instruction according to an embodiment of the invention. After the server receives the work request data_i, the kernel's polling thread is started; in priority order, it takes out and parses each work request data_i. The parsed first parameter, command, determines whether the work request is a GET or a PUT instruction; here it is a PUT. An index-access thread is started to execute the write operation, and a new <new_key, new_value> key-value pair is created in the hash table. The second parameter of data_i yields the primary key, which is written into new_key of the new pair. The third parameter of data_i yields the value of the data block to be written; a new data storage block is allocated in the server storage area, the value is copied into it, and the storage block's access address is written into new_value, completing the update and write of both the key-value index structure and the storage block.
At this point the client has successfully added (or updated) a key-value pair in the server's key-value storage system. Because the client closes the relevant thread once sending is complete, the server does not subsequently notify the client about the newly added index entry or the expansion of the storage area. The client therefore achieves an approximate kernel bypass, as with the GET instruction: CPU resource occupation is greatly reduced, and the freed computing resources provide a better upper-layer experience for clients switching between many applications. At the same time, since no further messages are exchanged, transmission delay is minimized and the transmission efficiency of the whole workload is improved; together with the GET operation designed above, the in-memory key-value storage system achieves high-performance transmission.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is defined by the claims.

Claims (9)

1. An RDMA-based key-value storage system transfer method, the method comprising the steps of:
the client and the server establish a connection with each other; the server registers an RDMA memory region for creating a command queue, the client registers an RDMA memory region for receiving returned data blocks, and the two sides exchange memory addresses and access keys;
after the connection is successfully established, the client sends GET/PUT instructions to the server in the form of one-sided write semantics;
the server receives requests from multiple clients in parallel and stores them in the command queue, then parses and responds to the data in the command queue according to RDMA one-sided write semantics: for a GET instruction, it sends the value data to the client's memory in a manner that bypasses the client's kernel; for a PUT instruction, it adds to or updates the value store locally.
2. The RDMA-based key-value storage system transfer method of claim 1, wherein the client sending GET/PUT instructions in the form of one-sided write semantics is implemented by calling an RDMA write function, the RDMA write function parameters comprising:
r_address, which is the virtual memory mapping transmitted from the server;
r_key, which is the access key transmitted from the server;
and data, which carries the information of the request: on the basis of distinguishing the operation type, it contains the information required by that operation type.
3. The RDMA-based key-value storage system transfer method of claim 2, wherein for GET requests, the data comprises:
a command for distinguishing request types;
key, which is the key of the request's target object in the key-value storage system, used for looking up the value's address in the index at the remote end;
l_address, which is the address space in the client's memory used for storing the returned data; and
l_key, which is the access key of the client.
4. The RDMA-based key-value storage system transfer method of claim 3, wherein parsing and responding to data in the command queue according to RDMA one-sided write semantics for GET requests comprises:
the server's instruction-queue processing thread picks up the received data and parses its parameters;
determining from command that the instruction is a GET, and creating a response function RDMA-write(l_address, l_key, r_data);
accessing the hash table according to key to obtain the address mapping of the storage block where the corresponding value is located, taking the value out of the storage block according to the mapped address, and packing it into r_data of the response function;
filling the parsed l_address and l_key directly into the l_address and l_key of the response function;
and, after the l_key is successfully matched with the client's access key, sending the data to the client memory specified by l_address in a manner that bypasses the client kernel, whereupon the client receives the GET result.
5. The RDMA-based key-value storage system transmission method of claim 2, wherein the data comprises, for a PUT request:
a command for distinguishing request types;
key, which is the key under which the request's data block is to be written in the key-value storage system;
value, which is the value of the data block that the request writes into the key-value storage system.
6. The RDMA-based key-value storage system transfer method of claim 5, wherein parsing and responding to data in the command queue according to RDMA one-sided write semantics for PUT requests comprises:
the server's instruction-queue processing thread picks up the received data and parses its parameters;
determining from command that the instruction is a PUT, starting an index access thread to execute the write operation, and creating a new key-value pair <new_key, new_value> in the hash table;
writing the parsed key into new_key of the new key-value pair;
and, according to the parsed value, allocating a new data storage block in the memory area, copying the value into the new block, and writing the access address of the storage block into new_value of the key-value pair.
7. The RDMA-based key-value storage system transmission method according to any one of claims 1 to 6, wherein, when sending a request, the client assigns a priority level according to task urgency and sends a priority flag bit together with the data block to the server; the server command queue receives them, fetches them serially, and processes them in order of priority.
8. A data processing apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the programs when executed by the processors implement the RDMA-based key-value storage system transfer method of any of claims 1-7.
9. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the RDMA-based key-value storage system transfer method of any of claims 1-7.
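The priority handling of claim 7 — serial receipt into the command queue, dispatch in priority order — can be sketched as follows. This is an illustrative Python sketch: the numeric priority encoding, the FIFO tie-breaking within a level, and the names `PriorityCommandQueue`, `receive`, and `poll` are assumptions for illustration and are not part of the claims.

```python
import heapq
import itertools

class PriorityCommandQueue:
    """Receive work requests serially; drain them in priority order
    (lower number = more urgent), FIFO within a priority level."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-break so equal priorities stay FIFO

    def receive(self, priority: int, data_i: bytes):
        # The server stores the priority flag bit together with the data block.
        heapq.heappush(self._heap, (priority, next(self._seq), data_i))

    def poll(self) -> bytes:
        # The polling thread fetches the most urgent pending work request.
        return heapq.heappop(self._heap)[2]

q = PriorityCommandQueue()
q.receive(1, b"routine-get")
q.receive(0, b"urgent-put")
q.receive(1, b"routine-put")
```

A heap keeps dispatch cost logarithmic while receipt stays strictly serial, matching the serial-fetch, priority-dispatch split in the claim.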
CN202010413800.1A 2020-05-15 2020-05-15 RDMA (remote direct memory Access) -based key value storage system transmission method Active CN111459418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010413800.1A CN111459418B (en) 2020-05-15 2020-05-15 RDMA (remote direct memory Access) -based key value storage system transmission method


Publications (2)

Publication Number Publication Date
CN111459418A true CN111459418A (en) 2020-07-28
CN111459418B CN111459418B (en) 2021-07-23

Family

ID=71681974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010413800.1A Active CN111459418B (en) 2020-05-15 2020-05-15 RDMA (remote direct memory Access) -based key value storage system transmission method

Country Status (1)

Country Link
CN (1) CN111459418B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486996A (en) * 2020-12-14 2021-03-12 上海交通大学 Object-oriented memory data storage system
CN112817887A (en) * 2021-02-24 2021-05-18 上海交通大学 Far memory access optimization method and system under separated combined architecture
CN113259439A (en) * 2021-05-18 2021-08-13 中南大学 Key value scheduling method based on receiving end drive
CN113608895A (en) * 2021-08-06 2021-11-05 湖南快乐阳光互动娱乐传媒有限公司 Web backend data access method and system
WO2022242723A1 (en) * 2021-05-21 2022-11-24 华为技术有限公司 Memory data sorting method and related device
CN115861082A (en) * 2023-03-03 2023-03-28 无锡沐创集成电路设计有限公司 Low-delay picture splicing system and method
CN115933973A (en) * 2022-11-25 2023-04-07 中国科学技术大学 Method for remotely updating data, RDMA system and storage medium
CN117215995A (en) * 2023-11-08 2023-12-12 苏州元脑智能科技有限公司 Remote direct memory access method, distributed storage system and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404212A (en) * 2011-11-17 2012-04-04 曙光信息产业(北京)有限公司 Cross-platform RDMA (Remote Direct Memory Access) communication method based on InfiniBand
US20170004109A1 (en) * 2013-12-30 2017-01-05 International Business Machines Corporation Remote direct memory access (rdma) high performance producer-consumer message processing
US20170315957A1 (en) * 2014-10-27 2017-11-02 International Business Machines Corporation Remote direct memory access (rdma) optimized high availability for in-memory data storage
CN107665154A (en) * 2016-07-27 2018-02-06 鄞州浙江清华长三角研究院创新中心 Authentic data analysis method based on RDMA and message transmission
CN107888657A (en) * 2017-10-11 2018-04-06 上海交通大学 Low latency distributed memory system
US20190102568A1 (en) * 2017-09-30 2019-04-04 Intel Corporation Techniques to provide client-side security for storage of data in a network environment
CN110147345A (en) * 2019-05-22 2019-08-20 南京大学 A kind of key assignments storage system and its working method based on RDMA
US10628353B2 (en) * 2014-03-08 2020-04-21 Diamanti, Inc. Enabling use of non-volatile media-express (NVMe) over a network
CN111078607A (en) * 2019-12-24 2020-04-28 上海交通大学 Method and system for deploying RDMA (remote direct memory Access) and non-volatile memory-oriented network access programming frame
CN111125049A (en) * 2019-12-24 2020-05-08 上海交通大学 RDMA (remote direct memory Access) -and-nonvolatile-memory-based distributed file data block reading and writing method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lailikes: "RDMA", https://blog.csdn.net/songchuwang1868/article/details/83178536 *


Also Published As

Publication number Publication date
CN111459418B (en) 2021-07-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant