CN111858418B - Memory communication method and device based on remote direct memory access RDMA - Google Patents

Memory communication method and device based on remote direct memory access RDMA

Info

Publication number
CN111858418B
Authority
CN
China
Prior art keywords
clients
group
data
time slice
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910364014.4A
Other languages
Chinese (zh)
Other versions
CN111858418A (en)
Inventor
陆游游
舒继武
陈游旻
陈佩
徐君
林芃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Huawei Technologies Co Ltd
Priority to CN201910364014.4A
Publication of CN111858418A
Application granted
Publication of CN111858418B
Legal status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/544 Remote

Abstract

A memory communication method and device based on remote direct memory access (RDMA), belonging to the field of communications. In the method, a server determines a first group of clients from a plurality of clients according to received RDMA messages sent by the plurality of clients. In a first time slice, the server processes the data of the first group of clients cached in a first cache space. Also during the first time slice, the server determines a second group of clients from the plurality of clients, and the network card of the server places data read from the second group of clients according to the RDMA messages of the second group of clients into a second cache space of the server. In a second time slice, the server processes the data of the second group of clients in the second cache space. The method and the device can improve the scalability of the system.

Description

Memory communication method and device based on remote direct memory access RDMA
Technical Field
The present application relates to the field of communications, and in particular, to a memory communication method and apparatus based on remote direct memory access RDMA.
Background
With the development of distributed technologies, applications increasingly need to access server memory remotely. Remote access to server memory means that a client can directly access the memory of a remote server; for example, the client can write data into the memory of the remote server.
The server comprises a network card, the last level cache (LLC) of the processor, a memory controller, and so on. When a client writes data to the memory of the server, the client sends a remote direct memory access (RDMA) message to the server. The network card of the server reads from the client, according to the RDMA message, the data that the client needs to write into the memory of the server, and writes that data into the cache space of the LLC.
In the process of implementing the present application, the inventors found that the prior art has at least the following problems:
Typically, the cache space used to process client memory-access messages occupies no more than 10% of the LLC, so its size is limited. Because the cache space is small, it cannot hold all the data of the memory space mapped to it, so cache misses occur frequently when the server processes client data. This increases the latency of reading data from memory into the cache, slows the server's responses to clients, limits the number of clients that can access the server, and thus limits the scalability of the system.
Disclosure of Invention
The embodiments of the present application provide a memory communication method and device based on remote direct memory access (RDMA), so as to improve the scalability of a system. The technical scheme is as follows:
In a first aspect, a remote direct memory access (RDMA)-based memory communication method is provided. In the method, a server receives RDMA messages sent by a plurality of clients; the server determines a first group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients; in a first time slice, the server processes the data of the first group of clients cached in a first cache space, where the data of the first group of clients is the data read by a network card of the server from the first group of clients according to the RDMA messages of the first group of clients; during the first time slice, the server determines a second group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients; during the first time slice, the server reads data from the second group of clients according to the RDMA messages of the second group of clients and writes the read data into a second cache space of the server, where the first cache space and the second cache space are different parts of a last level cache (LLC) of a processor of the server; and in a second time slice, the server suspends the processing of the data of the first group of clients cached in the first cache space and processes the data of the second group of clients in the second cache space, where the second time slice is the next time slice after the first time slice.
In the method, a first group of clients is determined from the plurality of clients before the first time slice, the data of the first group of clients cached in the first cache space is processed in the first time slice, a second group of clients is determined from the plurality of clients during the first time slice, and the data of the second group of clients cached in the second cache space is processed in the second time slice. In this way, the plurality of clients share the first cache space and the second cache space across different time slices, and the number of clients the server can admit is not limited by the first cache space and the second cache space, so the server can admit more clients and spread them across different time slices to communicate with the memory of the server, improving scalability. Because the data of the first group of clients is cached in the first cache space before the first time slice, the server can process that data as soon as the first time slice begins; likewise, the data of the second group of clients is cached in the second cache space before the second time slice, so the server can process it as soon as the second time slice begins, improving data processing efficiency.
In one possible implementation, the server places data read from the first group of clients according to the RDMA messages of the first group of clients into the first cache space during the time slice preceding the first time slice. Therefore, the server can process the data of the first group of clients cached in the first cache space as soon as the first time slice starts, which improves data processing efficiency.
In one possible implementation manner, in the second time slice, the server determines a third group of clients from at least one client according to the RDMA messages sent by the at least one client; and during the second time slice, the server puts data read from the third group of clients according to the RDMA messages of the third group of clients into the first cache space, where the third group of clients is different from the second group of clients. Therefore, the server can process the data of the third group of clients cached in the first cache space starting at the third time slice, improving data processing efficiency.
In a possible implementation manner, the at least one client includes clients that send RDMA messages during the first time slice and the second time slice and/or a part of the plurality of clients, where the part of the clients may be clients in the first group whose data to be read has not been completely read, clients in the first group whose data has not yet been processed by the CPU, or clients among the plurality of clients other than the first group and the second group.
In a possible implementation manner, the third group of clients includes a part of the clients in the first group of clients, where the part of the clients are clients whose data the server had not finished reading before the end of the first time slice. In this way, in the third time slice, the server continues reading from these clients the remaining data that was not read in the first time slice and processes that remaining data.
In a possible implementation manner, the server further includes a first memory space and a second memory space, where the first memory space and the second memory space are different portions of the memory of the server. The server writes the data of the first group of clients cached in the first cache space into the memory message area of each client in the first memory space, and writes the data of the second group of clients cached in the second cache space into the memory message area of each client in the second memory space. This enables the first group of clients to communicate with the server's memory in the first time slice and the second group of clients to communicate with the server's memory in the second time slice.
In a possible implementation manner, the network card of the server reads data from the second group of clients according to metadata in the RDMA messages of the second group of clients, respectively, where the metadata includes the address of the data to be read and the size of the data. In this way, the server reads data from the clients.
In a second aspect, a remote direct memory access (RDMA)-based memory communication apparatus is provided. The apparatus comprises a central processing unit (CPU), a network card, and a last-level cache (LLC) of the CPU, where the LLC comprises a first cache space and a second cache space. The network card is used to receive RDMA messages sent by a plurality of clients. The CPU is used to determine a first group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients; to process, within a first time slice, the data of the first group of clients cached in the first cache space, the data of the first group of clients being the data read by the network card from the first group of clients according to the RDMA messages of the first group of clients; and to determine a second group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients within the first time slice. The network card is further used to read data from the second group of clients according to the RDMA messages of the second group of clients within the first time slice and to write the read data into the second cache space. The CPU is further configured to suspend, within a second time slice, processing of the data of the first group of clients cached in the first cache space, and to process the data of the second group of clients in the second cache space, where the second time slice is the next time slice after the first time slice.
Before the first time slice, the CPU determines a first group of clients from the plurality of clients and processes the data of the first group of clients cached in the first cache space during the first time slice; it also determines a second group of clients from the plurality of clients and processes the data of the second group of clients cached in the second cache space during the second time slice. The plurality of clients therefore share the first cache space and the second cache space across different time slices, and the number of clients the device can admit is not limited by the two cache spaces, so more clients can be admitted and spread across different time slices to communicate with the memory of the device, improving scalability. The network card caches the data of the first group of clients in the first cache space before the first time slice, so the CPU can process that data as soon as the first time slice begins; the network card caches the data of the second group of clients in the second cache space before the second time slice, so the CPU can process that data as soon as the second time slice begins, improving data processing efficiency.
In a possible implementation manner, the CPU and the network card may be further configured to execute the operations of the method in any one of the possible implementation manners of the first aspect, and therefore, the detailed description is omitted here.
In a third aspect, the present application provides a computer program product, where the computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded and executed by a central processing unit (CPU) and a network card to implement the method of the first aspect or of any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a non-volatile computer-readable storage medium for storing a computer program, where the computer program is loaded and executed by a central processing unit (CPU) and a network card to implement the method of the first aspect or of any possible implementation manner of the first aspect.
Drawings
Fig. 1 is a schematic architecture diagram of a distributed system according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 3 is a schematic mapping diagram between a cache space in an LLC and a memory space in a memory according to an embodiment of the present application;
fig. 4 is a flowchart of a RDMA-based memory communication method according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for reading data according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an RDMA-based memory communication device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings. Referring to fig. 1, an embodiment of the present application provides a distributed system including a server 1 and a plurality of clients 2, where each client 2 communicates with the server 1 based on the RDMA protocol.
For each client 2, when the client 2 needs to write data into the memory of the server 1, a network connection is established between the client 2 and the server 1, and the client 2 sends an RDMA message to the server 1 through the network connection. The network card of the server 1 receives the RDMA message, reads data from the client 2 according to the RDMA message, and writes the read data into the LLC of the server 1; the server 1 then writes the data cached in the LLC into the memory of the server 1. The detailed process by which the client 2 writes data into the memory of the server 1 is described in the embodiments below and will not be detailed here.
Referring to fig. 2, the present application provides a server 1, where the server 1 includes a network card 11, an LLC12, a CPU13, a memory controller 14, and a memory 15, and the network card 11, the LLC12, the CPU13, the memory controller 14, and the memory 15 are connected to each other through a bus.
A network connection is established between the network card 11 and the network card of each of a plurality of clients, so that each client can send an RDMA message to the server 1. The RDMA message carries metadata of the client, where the metadata includes the storage address and the size of the data to be read, and the data to be read is the data that the client needs to write into the memory 15 of the server 1.
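As an illustrative sketch only (the field and type names below, such as RdmaMetadata, are our assumptions for this description, not the patent's wire format), the metadata carried in an RDMA message can be modeled as a small record holding the client-side address and the size of the data to be read:

    from dataclasses import dataclass

    # Illustrative model only: field names are assumptions, not the patent's format.
    @dataclass
    class RdmaMetadata:
        addr: int    # storage address, in the client's memory, of the data to be read
        size: int    # size of the data to be read

    @dataclass
    class RdmaMessage:
        client_id: str           # identifier of the sending client
        metadata: RdmaMetadata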
The network card 11 may receive RDMA messages sent by the multiple clients.
The CPU13 may allocate time slices (also known as "processor slices"); a time slice is the span of CPU time that the operating system allocates, at the microscopic level, to each running process. For any time slice allocated by the CPU13, in the time slice before that time slice starts, the CPU13 determines a group of clients from the plurality of clients, where the number of clients in the group is greater than or equal to 1 and less than or equal to a preset value; the network card writes the data read from the group of clients into a cache space of the LLC12, and the CPU13 processes the data in that cache space within the time slice.
For example, in the time slice before the first time slice allocated by the CPU13 (the first time slice being any time slice allocated by the CPU13), the CPU13 determines a first group of clients from the plurality of clients, where the first group of clients includes N clients and N is an integer greater than or equal to 1 and less than or equal to a preset value. Before the first time slice, the network card 11 reads the data of the first group of clients from the first group of clients according to the metadata of the first group of clients and caches the data of the first group of clients in a first cache space, where the first cache space is a part of the LLC12. Referring to fig. 3, the first cache space corresponds to a first memory space in the memory 15, and the first memory space includes a memory message area of each client in the first group of clients.
In the first time slice, the network card 11 may further continue to read data of the first group of clients from the first group of clients according to the metadata of the first group of clients and cache that data in the first cache space. The CPU13 processes the data cached in the first cache space, and the memory controller 14 may store the data in the first cache space into the first memory space. Also in the first time slice, the CPU13 determines a second group of clients from the plurality of clients, where the second group of clients includes M clients, M is an integer greater than or equal to 1 and less than or equal to a preset value, and the first group of clients is different from the second group of clients. The network card 11 reads the data of the second group of clients from the second group of clients according to the metadata of the second group of clients and caches the data of the second group of clients in a second cache space, which is another part of the LLC12. Referring to fig. 3, the second cache space corresponds to a second memory space in the memory 15, and the second memory space includes a memory message area of each client in the second group of clients.
In the second time slice allocated by the CPU13, the network card 11 may further continue to read data of the second group of clients from the second group of clients according to the metadata of the second group of clients and cache that data in the second cache space, where the second time slice is the next time slice after the first time slice. The CPU13 processes the data cached in the second cache space. The memory controller 14 may store the data in the second cache space into the second memory space; alternatively, the memory controller may store the data cached in the first cache space into the first memory space when a client writes new data to the server.
The network card 11 continues to receive RDMA messages sent by new clients in the first time slice and/or the second time slice. The CPU13 and the network card 11 repeat the operations performed in the first time slice and in the second time slice. For example, in the second time slice, the CPU13 determines a third group of clients from at least one client, where the third group of clients includes V clients, V is an integer greater than or equal to 1 and less than or equal to a preset value, and the third group of clients is different from the second group of clients. The at least one client includes new clients that sent RDMA messages during the first time slice and the second time slice and/or a part of the plurality of clients; the part of the clients may be clients in the first group whose data to be read has not been completely read, clients in the first group whose data has not yet been processed by the CPU13, or clients among the plurality of clients other than the first group and the second group. For a client whose data was not processed by the CPU13 in the first time slice, the data of that client may still be cached in the first cache space, so that in the third time slice the CPU13 can process the data of that client cached in the first cache space.
The network card 11 reads the data of the third group of clients from the third group of clients according to the metadata of the third group of clients and caches the data of the third group of clients in the first cache space, where the first cache space corresponds to the first memory space in the memory 15, and the first memory space now includes a memory message area of each client in the third group of clients.
Optionally, the memory controller may refrain from storing the data cached in the first cache space into the first memory space during the first time slice, and instead store it into the first memory space before the data of the third group of clients is cached in the first cache space.
In a third time slice allocated by the CPU13, the network card 11 may further continue to read data of the third group of clients from the third group of clients according to the metadata of the third group of clients, and cache the data of the third group of clients in the first cache space, and the CPU13 processes the data of the third group of clients cached in the first cache space, where the third time slice is a next time slice of the second time slice. Referring to fig. 3, the first memory space corresponding to the first cache space in the third time slice includes memory message areas corresponding to each client in the third group of clients, that is, the first memory space includes V memory message areas, and the first memory space is used to store data in the first cache space, that is, the V memory message areas are used to store data of the third group of clients.
In the third time slice, the CPU13 determines a fourth group of clients, and the network card 11 reads data of the fourth group of clients from the fourth group of clients according to the metadata of the fourth group of clients, and caches the data of the fourth group of clients in the second cache space, where the third group of clients is different from the fourth group of clients. Thus, in the fourth time slice, the CPU13 and the network card 11 may repeat the process executed in the second time slice.
In the embodiment of the application, before the first time slice, the CPU determines a first group of clients and processes the data of the first group of clients cached in the first cache space during the first time slice; it determines a second group of clients and processes the data of the second group of clients cached in the second cache space during the second time slice. The plurality of clients thus share the first cache space and the second cache space across different time slices, and the number of clients the server can admit is not limited by the two cache spaces, so more clients can be admitted and spread across different time slices to communicate with the memory of the server, improving scalability. Before the first time slice, the network card caches the data of the first group of clients in the first cache space, so the CPU can process that data as soon as the first time slice begins; before the second time slice, the network card caches the data of the second group of clients in the second cache space, so the CPU can process that data as soon as the second time slice begins, improving data processing efficiency.
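To make the time-slice mechanism concrete, the following is a minimal sketch, assuming a simplified single-threaded model in which pick_group, fetch_into, and process are invented stand-ins for the scheduling thread, the network card, and the worker threads respectively:

    # Illustrative sketch: the two LLC cache spaces alternate between an "active"
    # role, in which the CPU processes the cached data, and a "warm-up" role, in
    # which the network card reads the next group's data into the cache.
    def serve(num_slices, pick_group, fetch_into, process):
        buffers = ([], [])                     # stand-ins for the two cache spaces
        current = pick_group()                 # group determined before the first time slice
        fetch_into(buffers[0], current)        # warm up the first cache space
        for s in range(num_slices):
            active, warm = s % 2, (s + 1) % 2
            nxt = pick_group()                 # during this slice, determine the next group
            fetch_into(buffers[warm], nxt)     # network card fills the warm-up space
            process(buffers[active], current)  # CPU processes the active space's data
            buffers[active].clear()            # active space is recycled for a later group
            current = nxt

At each slice boundary the active and warm-up roles swap, which matches the state switching described for the first and second cache spaces below.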
Referring to fig. 4, an embodiment of the present application provides an RDMA-based memory communication method, which may be applied to the network architecture shown in fig. 1 and the server shown in fig. 2, and includes:
step 101: and the network card of the server receives the RDMA messages sent by the plurality of clients.
When a client needs to access the memory of the server, the client establishes a network connection with the network card of the server and sends an RDMA message to the server through the network connection. The RDMA message carries metadata of the client, where the metadata includes the storage address addr and the size of the data to be read; the data to be read is the data that the client needs to write into the memory of the server, and the storage address addr is the storage address of the data to be read in the client.
When the network card of the server receives the RDMA message sent by a client, the server may store the identifier of the client, the metadata of the client, and the state of the client in a correspondence among identifier, metadata, and state, where the state of the client is set to to-be-processed.
Each record in the correspondence among identifier, metadata, and state is a client entry for one client; that is, the client entry of a client includes the identifier, the metadata, and the state of the client. The correspondence among identifier, metadata, and state may be stored in the memory of the server.
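A minimal sketch of such a client-entry table, under our own naming assumptions (ClientEntry, State, and the priority field anticipate the scheduling details described below):

    from dataclasses import dataclass
    from enum import Enum, auto

    class State(Enum):
        TO_BE_PROCESSED = auto()   # RDMA message received, client not yet scheduled
        PROCESSING = auto()        # client selected into the group of a current time slice
        SUSPENDED = auto()         # a time slice ended before the client's data was fully read

    @dataclass
    class ClientEntry:             # one record of the identifier/metadata/state correspondence
        client_id: str
        addr: int                  # storage address of the data still to be read
        size: int                  # size of the data still to be read
        state: State = State.TO_BE_PROCESSED
        priority: int = 0          # optional per-client priority, as described below

    entries: dict[str, ClientEntry] = {}   # the correspondence, keyed by client identifier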
The server's CPU allocates time slices, and within a time slice the server's CPU allows a group of clients to communicate with the server's memory, as described in detail below.
Step 102: the server determines a first set of clients from the plurality of clients based on the RDMA messages of the plurality of clients.
Before the first time slice allocated by the CPU of the server, the CPU of the server selects, from the clients whose identifiers are stored in the correspondence among identifier, metadata, and state, N clients whose state is to-be-processed or suspended, and determines the N clients as the first group of clients, where N is an integer greater than or equal to 1 and less than or equal to a preset value.
The CPU of the server may also modify the state of each client in the first group of clients to processing in the correspondence among identifier, metadata, and state.
When the CPU of the server selects clients whose state is to-be-processed or suspended, clients whose state is suspended are selected preferentially.
Each of the clients whose identifiers are stored in the correspondence among identifier, metadata, and state may also have a priority. In that case, the CPU of the server selects the clients whose state is to-be-processed and the clients whose state is suspended from these clients, and then selects N clients from them according to their priorities.
Optionally, a scheduling thread may run in the CPU of the server, and the first group of clients is determined by the scheduling thread.
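A sketch of this selection step, under the assumption (ours, not the patent's exact rule) that suspended state wins first and higher priority breaks the remaining ties; it builds on the ClientEntry model sketched above:

    # Illustrative selection: pick up to n clients whose state is to-be-processed
    # or suspended, preferring suspended clients and then higher priority, and
    # mark the chosen clients as processing in the correspondence.
    def pick_group(entries, n):
        candidates = [e for e in entries.values()
                      if e.state in (State.TO_BE_PROCESSED, State.SUSPENDED)]
        candidates.sort(key=lambda e: (e.state is not State.SUSPENDED, -e.priority))
        group = candidates[:n]
        for e in group:
            e.state = State.PROCESSING
        return group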
Step 103: the network card of the server stores data read from the first group of clients according to the RDMA messages sent by the first group of clients into a first cache space, which is a part of the space in the LLC of the server.
In this step, before the first time slice, the network card of the server may obtain the metadata of each client in the first group of clients from the correspondence between the identifier, the metadata, and the state, read data from each client in the first group of clients according to the metadata of each client in the first group of clients, and store the read data in the first cache space.
Referring to fig. 5, for any client in the first group of clients, the metadata of the client includes the storage address addr and the size of the data to be read, and the network card of the server reads the data from the client according to addr and size. In implementation, the network card of the server sends a read request message to the client, where the read request message carries the storage address addr and the size. The client receives the read request message, reads data according to the storage address addr carried in it, and sends a read response message to the server; the size of the read data, size1, is less than or equal to size, and the read response message carries the read data. The network card of the server receives the read response message and caches the data carried in it in the first cache space. The network card of the server then sends another read request message to the client, carrying the storage address addr + size1 and the size size - size1. The client receives the read request message, reads data according to the storage address addr + size1, and sends a read response message to the server; the size of the read data, size2, is less than or equal to size - size1, and the read response message carries the read data. The network card of the server receives the read response message and caches the data carried in it in the first cache space. This process repeats until the data to be read has been read completely or the first time slice ends.
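The following sketch models this chunked read loop; read_chunk is an assumed callable standing in for one read-request/read-response round trip, and the names are ours, not the patent's:

    # Illustrative model of the repeated read-request/read-response exchange.
    # read_chunk(client, addr, size) asks the client for `size` bytes starting
    # at `addr` and returns the bytes actually read (possibly fewer).
    def read_until_done(client, entry, cache_space, slice_expired, read_chunk):
        while entry.size > 0 and not slice_expired():
            data = read_chunk(client, entry.addr, entry.size)
            if not data:
                break                   # nothing returned; avoid spinning
            cache_space.append(data)    # cache the carried data in the cache space
            entry.addr += len(data)     # the next request starts where this one ended
            entry.size -= len(data)     # remaining size of the data to be read
        return entry.size == 0          # True if the data to be read was read completely

When the loop stops because the time slice ended, entry.addr and entry.size already hold the updated metadata (addr + X and size - X) used for suspended clients, as described under step 106 below.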
Step 104: and in the first time slice, the CPU of the server processes the data cached in the first cache space.
Referring to fig. 3, the first cache space corresponds to a first memory space in a memory of the server, the first memory space includes a memory message area corresponding to each client in the first group of clients, that is, the first memory space includes N memory message areas, and the first memory space is used to store data in the first cache space, that is, the N memory message areas are used to store data of the first group of clients. The N memory message areas are message areas that are isolated from each other.
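As a small sketch of one possible layout (the fixed-size partitioning is our assumption; the text above only requires that the areas be isolated from each other), the memory message areas can be addressed as equal-sized slots:

    # Illustrative layout: the first memory space holds N mutually isolated
    # memory message areas of equal size, one per client of the first group.
    def message_area_addr(base_addr, area_size, index):
        # start address of the memory message area of the index-th client
        return base_addr + index * area_size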
In the first time slice, the CPU of the server processes the data cached in the first cache space, and the result of the processing is still stored in the first cache space. The memory controller of the server may store the data cached in the first cache space into the first memory space, or may do so when a client writes new data to the server. For example, if a client asks the server to execute a number of requests with parameters provided by the client, then in the first time slice the server processes the client's requests according to those parameters and returns the execution results to the client. It can also be understood that, when a client writes new data to the server, the memory controller of the server may store the data cached in the first cache space into the first memory space.
Before the first time slice, the network card of the server may not have finished reading the data of the first group of clients, so in the first time slice the network card of the server continues to read data from each client in the first group according to that client's metadata and stores the read data in the first cache space. The CPU of the server processes the data cached in the first cache space.
The CPU of the server runs a plurality of worker threads, and each client in the first group of clients corresponds to one worker thread; that is, the first group of clients corresponds to N worker threads, which run in the CPU of the server and process the data of the first group of clients cached in the first cache space.
Step 105: within the first time slice, a second set of clients is determined from the plurality of clients according to the RDMA messages of the plurality of clients, the first set of clients being different from the second set of clients.
In the first time slice, the CPU of the server selects, from the clients whose identifiers are stored in the correspondence among identifier, metadata, and state, M clients whose state is to-be-processed or suspended, and determines the M clients as the second group of clients, where M is an integer greater than or equal to 1 and less than or equal to a preset value.
The CPU of the server may also modify the state of each client in the second group of clients to processing in the correspondence among identifier, metadata, and state.
When the CPU of the server selects clients whose state is to-be-processed or suspended, clients whose state is suspended are selected preferentially.
Each of the clients whose identifiers are stored in the correspondence among identifier, metadata, and state may also have a priority, so the CPU of the server selects the clients whose state is to-be-processed and the clients whose state is suspended from these clients, and then selects M clients from them according to their priorities.
Optionally, the CPU of the server may run a scheduling thread, and the scheduling thread executes the operation of determining the second group of clients.
Step 106: in the first time slice, the network card of the server stores the data read from the second group of clients according to the RDMA messages sent by the second group of clients into a second cache space, and the second cache space is another part of space in the LLC of the server.
In the first time slice, the network card of the server may obtain the metadata of each client in the second group of clients from the correspondence between the identifier, the metadata, and the state, read data from each client in the second group of clients according to the metadata of each client in the second group of clients, and store the read data in the second cache space.
In the first time slice, the CPU of the server may set the state of the first cache space to an active state and the state of the second cache space to a warm-up state. In this way, in the first time slice, the CPU of the server processes the data of the first group of clients cached in the first cache space and does not process the data of the second group of clients cached in the second cache space.
When the first time slice ends, for any client in the first group of clients, if the total size of the data the network card of the server has read from the client is smaller than the size of the data to be read included in the metadata of the client (that is, the network card has not finished reading the client's data to be read when the first time slice ends), the server may determine the storage address and size of the remaining unread data from the total size already read and from the storage address and size of the data to be read, and replace the storage address and size in the client's metadata with these values in the correspondence among identifier, metadata, and state. For example, assume the metadata of the client includes the storage address addr and the size size of the data to be read, and the total size already read is X. Then the storage address of the remaining unread data is addr + X and its size is size - X, and in the correspondence among identifier, metadata, and state, addr and size in the client's metadata are replaced by addr + X and size - X respectively.
If the total size of the data the network card of the server has read from the client equals the size of the data to be read included in the metadata of the client, that is, the network card has finished reading the client's data to be read when the first time slice ends, the server may delete the record including the identifier, the metadata, and the state of the client from the correspondence among identifier, metadata, and state.
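A sketch of this slice-end bookkeeping, building on the ClientEntry model above; entry.addr and entry.size are assumed to have already been advanced to addr + X and size - X by the read loop:

    # Illustrative slice-end bookkeeping for one client of the finished group.
    def end_of_slice(entries, client_id):
        entry = entries[client_id]
        if entry.size == 0:
            del entries[client_id]         # data to be read was read completely
        else:
            entry.state = State.SUSPENDED  # resume in a later time slice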
Step 107: and in the second time slice, the CPU of the server processes the data cached in the second cache space.
In the first time slice, the network card of the server may not have finished reading the data of the second group of clients, so when the second time slice starts, the network card of the server continues to read data from each client in the second group according to that client's metadata and stores the read data in the second cache space. In the second time slice, the CPU of the server processes the data cached in the second cache space.
At the beginning of the second time slice, the CPU of the server may switch the state of the first cache space to the warm-up state and the state of the second cache space to the active state. The worker threads running in the CPU of the server process the data of the second group of clients cached in the second cache space in the second time slice.
When the second time slice begins, the CPU of the server suspends the processing of the data in the first cache space.
Optionally, in the second time slice, the data of the second group of clients cached in the second cache space may be processed by M worker threads respectively.
Referring to fig. 3, the second cache space corresponds to a second memory space in the memory of the server, the second memory space includes a memory message area corresponding to each client in the second group of clients, that is, the second memory space includes M memory message areas, and the second memory space is used to store data in the second cache space, that is, the M memory message areas are used to store data of the second group of clients.
The CPU of the server processes the data cached in the second cache space, and the processed result is still stored in the second cache space. The memory controller of the server may store the data cached in the second cache space into the second memory space, or may do so when a client writes new data to the server.
In the first time slice and the second time slice, the network card of the server may receive RDMA messages sent by new clients, where each RDMA message carries the metadata of the new client. The server sets the state of the new client to to-be-processed and stores the identifier, the metadata, and the state of the new client in the correspondence among identifier, metadata, and state.
In the second time slice, the CPU and the network card of the server repeat the operations performed in the first time slice. That is, in the second time slice, the CPU of the server may continue to select, from the clients whose identifiers are stored in the correspondence among identifier, metadata, and state, V clients whose state is to-be-processed or suspended, and determine the V clients as the third group of clients, where V is an integer greater than or equal to 1 and less than or equal to a preset value. The CPU of the server modifies the state of each client in the third group of clients to processing in the correspondence among identifier, metadata, and state. In the second time slice, the network card of the server reads data from each client in the third group according to that client's metadata in the correspondence among identifier, metadata, and state, and stores the read data in the first cache space.
The third group of clients may include some clients that belong to the first group of clients: clients in the first group whose data the server did not finish processing within the first time slice, or clients in the first group whose data the server did not finish reading. At the end of the first time slice, the state of these clients is set to suspended, so they are selected preferentially when the third group of clients is determined. That is, clients that did not complete memory communication with the server in the first time slice are selected, during the second time slice, as part of the third group of clients so that their processing continues in the third time slice, where the third time slice is the next time slice after the second time slice.
For each of these clients, the memory message area corresponding to the client in the third time slice may be the same as the memory message area corresponding to the client in the first time slice. Referring to fig. 3, the first memory space corresponding to the first cache space then includes the memory message area of each client in the third group of clients; that is, the first memory space includes V memory message areas.
When the second time slice ends, for any client in the second group of clients, if the network card of the server has not finished reading the client's data to be read, the CPU of the server may update the metadata of the client stored in the correspondence among identifier, metadata, and state, and modify the state of the client to suspended in that correspondence.
In the third time slice, the CPU of the server processes the data of the third group of clients cached in the first cache space, and the memory controller of the server stores the data cached in the first cache space into the first memory space. In the third time slice, the network card of the server may also continue to read data from each client in the third group according to that client's metadata and store the read data in the first cache space.
In the embodiment of the application, before the first time slice, the CPU determines a first group of clients and processes the data of the first group of clients cached in the first cache space during the first time slice; it determines a second group of clients and processes the data of the second group of clients cached in the second cache space during the second time slice. The plurality of clients thus share the first cache space and the second cache space across different time slices, and the number of clients the server can admit is not limited by the two cache spaces, so more clients can be admitted and spread across different time slices to communicate with the memory of the server, improving scalability. When a group of clients is determined, clients in the suspended state are selected preferentially, or clients are selected according to priority; in this way, clients whose data to be read was not finished before the first time slice, or clients with high priority, are selected first, which improves the timeliness of serving the clients. Before the first time slice, the network card caches the data of the first group of clients in the first cache space, so the CPU can process that data as soon as the first time slice begins; before the second time slice, the network card caches the data of the second group of clients in the second cache space, so the CPU can process that data as soon as the second time slice begins, improving data processing efficiency.
Referring to fig. 6, an embodiment of the present application provides an RDMA-based memory communication apparatus 200, including a CPU201, a network card 202, and an LLC203, where the LLC203 includes a first cache space and a second cache space;
the network card 202 is used for receiving RDMA messages sent by a plurality of clients;
a CPU201 for: determining a first set of clients from the plurality of clients based on RDMA messages sent by the plurality of clients;
processing data of a first group of clients buffered in a first buffer space in a first time slice, wherein the data of the first group of clients is data read by the network card 202 from the first group of clients according to RDMA (remote direct memory access) messages of the first group of clients; and
determining a second group of clients from the plurality of clients according to RDMA messages sent by the plurality of clients within a first time slice;
the network card 202 is further used for reading data from the second group of clients according to the RDMA messages of the second group of clients in the first time slice, and writing the read data into a second cache space;
the CPU201 is further configured to suspend, in a second time slice, processing of data buffered in the first group of clients in the first buffer space, and process data of the second group of clients in the second buffer space, where the second time slice is a time slice next to the first time slice.
Optionally, the network card 202 is configured to place, in a time slice before the first time slice, data read from the first group of clients according to the RDMA messages of the first group of clients into the first buffer space.
Optionally, the CPU201 is further configured to determine, in the second time slice, a third group of clients from the multiple clients according to the RDMA messages sent by the multiple clients;
the network card 202 is further configured to place, in the second time slice, data read from a third group of clients according to the RDMA messages of the third group of clients into the first buffer space, where the third group of clients is different from the second group of clients.
Optionally, the first memory space corresponding to the first cache space includes a memory message area of each client in the first group of clients, and the second memory space corresponding to the second cache space includes a memory message area of each client in the second group of clients; further comprising:
a memory 204 comprising a first memory space and a second memory space; and
a memory controller 205, configured to write the data of the first group of clients cached in the first cache space into the first memory space; and
and writing the data of the second group of clients cached in the second cache space into the second memory space.
Optionally, the network card 202 is specifically configured to, in the first time slice, read data from the second group of clients according to metadata in an RDMA message of the second group of clients, where the metadata includes an address of the data to be read and a size of the data.
In the embodiment of the application, before the first time slice, the first group of clients is determined from the plurality of clients and the data of the first group of clients cached in the first cache space is processed in the first time slice; the second group of clients is determined from the plurality of clients and the data of the second group of clients cached in the second cache space is processed in the second time slice. The plurality of clients thus share the first cache space and the second cache space across different time slices, allowing more clients to communicate with the memory and improving scalability. The data of the first group of clients is cached in the first cache space before the first time slice, so the CPU can process that data as soon as the first time slice begins; the data of the second group of clients is cached in the second cache space before the second time slice, so the CPU can process that data as soon as the second time slice begins, improving data processing efficiency.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is intended only to illustrate the alternative embodiments of the present application, and should not be construed as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for remote direct memory access (RDMA)-based memory communication, comprising:
the server receives RDMA messages sent by a plurality of clients;
the server determines a first group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients;
in a first time slice, the server processes the data of the first group of clients buffered in a first buffer space, wherein the data of the first group of clients is the data read by a network card of the server from the first group of clients according to the RDMA messages of the first group of clients in the first time slice and in the previous time slice of the first time slice;
during the first time slice, the server determines a second group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients;
during the first time slice, the server reads data from the second group of clients according to the RDMA messages of the second group of clients and writes the read data into a second cache space of the server, wherein the first cache space and the second cache space are different parts of a Last Level Cache (LLC) of a processor of the server;
in a second time slice, the server suspends processing of the data of the first group of clients buffered in the first cache space, reads data from the second group of clients according to the RDMA messages of the second group of clients, writes the read data into the second cache space of the server, and processes the data of the second group of clients in the second cache space, wherein the second time slice is the next time slice of the first time slice.
2. The method of claim 1, further comprising:
within the second time slice, the server determines a third group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients; and
and during the second time slice, the server puts data read from the third group of clients according to the RDMA messages of the third group of clients into the first cache space, wherein the third group of clients is different from the second group of clients.
3. The method according to claim 1 or 2, characterized in that: the server further includes a first memory space and a second memory space, where the first memory space and the second memory space are different portions of the memory of the server, respectively, and the method further includes:
the server writes the data of the first group of clients cached in the first cache space into the first memory space;
and the server writes the data of the second group of clients cached in the second cache space into the second memory space.
4. The method of claim 1, wherein the server reading data from the second set of clients according to the RDMA messages of the second set of clients comprises:
and the network card of the server reads data from the second group of clients respectively according to metadata in the RDMA message of the second group of clients, wherein the metadata comprises the address of the data to be read and the size of the data.
5. A remote direct memory access (RDMA)-based memory communication device, comprising: a central processing unit (CPU), a network card, and a last-level cache (LLC) of the CPU, wherein the LLC comprises a first cache space and a second cache space;
the network card is used for receiving RDMA (remote direct memory access) messages sent by a plurality of clients;
the CPU is used for:
determining a first group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients;
processing, within a first time slice, data of the first set of clients buffered in a first buffer space, the data of the first set of clients being data read by the network card from the first set of clients according to RDMA messages of the first set of clients within and a time slice preceding the first time slice; and
determining a second group of clients from the plurality of clients according to RDMA messages sent by the plurality of clients within the first time slice;
the network card is also used for reading data from the second group of clients according to the RDMA messages of the second group of clients in the first time slice and writing the read data into a second cache space;
the CPU is further configured to suspend, within a second time slice, processing of the data of the first group of clients buffered in the first cache space, read data from the second group of clients according to the RDMA messages of the second group of clients, write the read data into the second cache space, and process the data of the second group of clients in the second cache space, wherein the second time slice is the next time slice of the first time slice.
6. The apparatus of claim 5, wherein:
the CPU is further configured to determine, within the second time slice, a third group of clients from the plurality of clients according to the RDMA messages sent by the plurality of clients; and
the network card is further configured to place, within the second time slice, the data read from the third group of clients according to the RDMA messages of the third group of clients into the first cache space, wherein the third group of clients is different from the second group of clients.
7. The apparatus of claim 5 or 6, further comprising:
a memory, the memory including a first memory space and a second memory space; and
a memory controller, configured to write the data of the first group of clients cached in the first cache space into the first memory space, and to write the data of the second group of clients cached in the second cache space into the second memory space.
8. The apparatus of claim 5, wherein:
the network card is configured to, within the first time slice, read data from the second group of clients according to metadata in the RDMA messages of the second group of clients, wherein the metadata includes the address of the data to be read and the size of the data.
9. A computer-readable storage medium having stored thereon a computer program or instructions which, when executed by a processor, implement the method of any one of claims 1 to 4.
10. A computer program product having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 4.
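Stripped of claim language, claims 1-2 and 5-6 describe a ping-pong (double-buffered) pipeline: while the CPU drains the cache space holding the current group's data, the network card fills the other cache space with the next group's data, and the two spaces swap roles at every time-slice boundary. The C++ sketch below simulates that schedule in a single process; it is a minimal illustration, not the patented implementation. Every identifier in it (Buffer, fill_from_clients, process, kSlices) is an invented name, and fill_from_clients() merely stands in for the network card's RDMA reads, which claims 4 and 8 drive with per-message metadata (the address and size of the data to be read).

```cpp
// Minimal single-process sketch of the ping-pong time-slice schedule:
// the "CPU" drains one cache space while the "NIC" fills the other.
#include <array>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>
#include <vector>

struct Buffer {                       // models one cache space in the LLC
    std::vector<std::string> data;    // payloads "read" from one client group
};

// Stand-in for the network card: issue one RDMA read per client in the
// group (real hardware would use the address/size metadata carried in
// each client's RDMA message) and deposit the results in the cache space.
static void fill_from_clients(Buffer& buf, int group) {
    buf.data.clear();
    for (int client = 0; client < 3; ++client)
        buf.data.push_back("group" + std::to_string(group) +
                           "/client" + std::to_string(client));
}

// Stand-in for the CPU consuming one cache space during its time slice.
static void process(const Buffer& buf, int group) {
    for (const auto& item : buf.data)
        std::printf("slice: processing %s (group %d)\n", item.c_str(), group);
}

int main() {
    std::array<Buffer, 2> spaces;     // the first and second cache spaces
    const int kSlices = 4;            // number of simulated time slices

    // Bootstrap: before slice 0 starts, group 0's data is already in the
    // first cache space (the "preceding time slice" of claim 1).
    fill_from_clients(spaces[0], /*group=*/0);

    for (int slice = 0; slice < kSlices; ++slice) {
        const int cur = slice % 2;    // space the CPU drains this slice
        const int nxt = 1 - cur;      // space the NIC fills this slice

        // NIC: prefetch the next group into the idle space, concurrently.
        std::thread nic(fill_from_clients, std::ref(spaces[nxt]), slice + 1);

        process(spaces[cur], slice);  // CPU: process the ready space

        nic.join();                   // slice boundary: the spaces swap roles
    }
    return 0;
}
```

Compiled as an ordinary thread-enabled program, each slice prints the current group's items while the next group's buffer fills concurrently. The apparent point of dedicating two fixed spaces in the LLC (claim 5) is that the network card's incoming writes land only in the idle half, so they never evict the lines the CPU is actively scanning; serving a bounded group of clients per slice, rather than buffering all clients at once, is presumably what yields the scalability improvement the patent claims.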
CN201910364014.4A 2019-04-30 2019-04-30 Memory communication method and device based on remote direct memory access RDMA Active CN111858418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910364014.4A CN111858418B (en) 2019-04-30 2019-04-30 Memory communication method and device based on remote direct memory access RDMA

Publications (2)

Publication Number Publication Date
CN111858418A (en) 2020-10-30
CN111858418B (en) 2023-04-07

Family

ID=72965186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910364014.4A Active CN111858418B (en) 2019-04-30 2019-04-30 Memory communication method and device based on remote direct memory access RDMA

Country Status (1)

Country Link
CN (1) CN111858418B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115981548A (en) * 2021-10-14 2023-04-18 Huawei Technologies Co., Ltd. Flow control method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7921178B2 (en) * 2008-12-04 2011-04-05 Voltaire Ltd. Device, system, and method of accessing storage
US10083193B2 (en) * 2015-01-09 2018-09-25 International Business Machines Corporation Efficient remote pointer sharing for enhanced access to key-value stores

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404212A (en) * 2011-11-17 2012-04-04 Dawning Information Industry (Beijing) Co., Ltd. Cross-platform RDMA (Remote Direct Memory Access) communication method based on InfiniBand
CN105408880A (en) * 2013-07-31 2016-03-16 Oracle International Corporation Direct access to persistent memory of shared storage
CN105393239A (en) * 2013-09-05 2016-03-09 Google LLC Isolating clients of distributed storage systems
CN104468638A (en) * 2013-09-12 2015-03-25 Peking University Founder Group Co., Ltd. Distributed data processing method and system
CN104679688A (en) * 2013-12-02 2015-06-03 Huawei Technologies Co., Ltd. Data access method, device and system
CN103929415A (en) * 2014-03-21 2014-07-16 Huawei Technologies Co., Ltd. Method and device for reading and writing data under RDMA and network system
CN103902486A (en) * 2014-04-08 2014-07-02 Huawei Technologies Co., Ltd. System, device and method for implementation of remote direct memory access
CN105450588A (en) * 2014-07-31 2016-03-30 Huawei Technologies Co., Ltd. RDMA-based data transmission method and RDMA network cards
CN107077441A (en) * 2014-12-09 2017-08-18 Intel Corporation Heterogeneous input/output (I/O) using remote direct memory access (RDMA) and active messages
CN105426321A (en) * 2015-11-13 2016-03-23 Shanghai Jiao Tong University RDMA-friendly caching method using remote location information
CN105630426A (en) * 2016-01-07 2016-06-01 Tsinghua University Method and system for obtaining remote data based on RDMA (Remote Direct Memory Access) characteristics
CN106657365A (en) * 2016-12-30 2017-05-10 Tsinghua University Highly concurrent data transmission method based on RDMA (Remote Direct Memory Access)
CN108268208A (en) * 2016-12-30 2018-07-10 Tsinghua University A distributed memory file system based on RDMA
CN108989237A (en) * 2017-06-01 2018-12-11 Huawei Technologies Co., Ltd. Method and apparatus for data transmission
CN107479833A (en) * 2017-08-21 2017-12-15 National University of Defense Technology Remote nonvolatile memory access and management method for key-value storage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Youmin Chen, Youyou Lu, et al. Scalable RDMA RPC on Reliable Connection with Efficient Resource Sharing. Proceedings of the Fourteenth EuroSys Conference 2019, ACM, 2019, pp. 1-14. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant