CN116578504A - Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area

Publication number
CN116578504A
Authority
CN
China
Prior art keywords
rdma
memory
page table
tlb
address
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310612656.8A
Other languages
Chinese (zh)
Inventor
张学利
黄勇平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yunbao Intelligent Co ltd
Original Assignee
Shenzhen Yunbao Intelligent Co ltd
Application filed by Shenzhen Yunbao Intelligent Co ltd
Priority to CN202310612656.8A
Publication of CN116578504A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1081 Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/14 Protection against unauthorised use of memory or access to memory
    • G06F12/1458 Protection against unauthorised use of memory or access to memory by checking the subject access rights
    • G06F12/1483 Protection against unauthorised use of memory or access to memory by checking the subject access rights using an access-table, e.g. matrix or list
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/1024 Latency reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/15 Use in a specific computing environment
    • G06F2212/154 Networked environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application discloses a method for improving the efficiency with which an RDMA engine accesses a memory region, comprising the following steps: receiving an RDMA request message from a remote host, and parsing it to obtain the memory region and the virtual address that the request message asks to use; authenticating the request message using a memory protection table (MPT) in host memory; after authentication succeeds, searching a translation lookaside buffer (TLB) preset in the MPT for a matching page table entry according to the page table pointer (Page Index) corresponding to the virtual address in the request message, and, if one exists, obtaining the physical address corresponding to the virtual address; and performing a read or write operation, according to the request message, on the memory region in host memory pointed to by the physical address. The application also discloses a corresponding device, system, chip and storage medium. Implementing the application can improve the efficiency of RDMA communication.

Description

Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area
Technical Field
The present application relates to the field of remote direct memory access (Remote Direct Memory Access, RDMA) technology, and in particular, to a method, apparatus, system, chip and storage medium for improving the efficiency of access to a memory area by an RDMA engine.
Background
When an RDMA engine accesses a memory region (Memory Region, MR) in host memory, it must first access the memory protection table (Memory Protection Table, MPT) for a permission check and then access the memory translation table (Memory Translation Table, MTT) to translate the virtual address into a physical address. If the number of address translation levels of the MTT is greater than 0, the RDMA engine must read the MTT from host memory multiple times to complete the translation. Generally, when an MR is larger than one physical page, the number of address translation levels required is greater than 0.
In the prior art, the number of MTT address translation levels corresponding to an MR is usually greater than 0 when a user uses RDMA. Each access to the MR by the RDMA engine therefore usually requires an access to the MTT, and because the MTT is usually placed in host memory, the page tables in memory must usually be read several times to obtain the final physical address. This increases the latency of RDMA command execution and consumes PCIe bandwidth.
Disclosure of Invention
The technical problem to be solved by the application is to provide a method, a device, a system, a chip and a storage medium for improving the efficiency of access of an RDMA engine to a memory area, which can improve the efficiency of RDMA communication.
In order to solve the above technical problems, as one aspect of the present application, a method for improving efficiency of accessing a memory area by an RDMA engine is provided, which at least includes the following steps:
receiving an RDMA request message from a remote end, and analyzing and obtaining a memory area and a virtual address which are requested to be used by the RDMA request message;
performing authentication processing on the request message by adopting a memory protection conversion table (MPT) in a host memory;
after authentication succeeds, searching a translation lookaside buffer (TLB) preset in the MPT for a matching page table entry according to the page table pointer (Page Index) corresponding to the virtual address in the request message, and, if one exists, acquiring the physical address corresponding to the virtual address;
and performing read-write operation on the memory area in the host memory pointed by the physical address according to the request message.
Wherein, further include:
before RDMA communication, registering a memory region (MR), establishing a memory translation table (MTT) and a memory protection table (MPT) corresponding to the memory region, and establishing a translation lookaside buffer (TLB) in the MPT, wherein the TLB stores at least one page table entry and each page table entry stores the correspondence between the virtual address and the physical address of one page.
Wherein, further include:
if no matched page table entry exists in the TLB, reading the MTT, performing address translation, traversing a multi-stage page table, and obtaining a corresponding physical address;
and establishing a new page table entry according to the virtual address and the physical address acquired through the MTT, and updating the new page table entry into the TLB.
Wherein, further include:
judging the number of levels of the MTT, and, if the number of levels is zero, the RDMA engine searching the MPT according to the page table pointer (Page Index) corresponding to the virtual address in the request message to directly acquire the physical address corresponding to the virtual address.
Correspondingly, in another aspect of the present application, there is further provided an apparatus for improving efficiency of accessing a memory area by an RDMA engine, where the apparatus is applied to the RDMA engine, and the apparatus at least includes:
the message parsing unit is used for receiving an RDMA request message from a remote end and parsing and obtaining a memory area and a virtual address which are requested to be used by the RDMA request message;
an authentication processing unit, configured to perform authentication processing on the request packet by using a memory protection conversion table (MPT) in a host memory;
the matching processing unit is used for searching a translation lookaside buffer (TLB) preset in the memory protection table (MPT) for a matching page table entry according to the page table pointer (Page Index) corresponding to the virtual address in the request message, and acquiring the physical address corresponding to the virtual address if a matching entry exists; the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of one page;
and the access processing unit is used for performing read-write operation on the memory area in the host memory pointed by the physical address according to the request message.
Wherein, further include:
the translation processing unit is used for reading a memory address translation table (MTT) in a host memory and performing address translation when no matched page table entry exists in the TLB, traversing a multi-level page table and obtaining a corresponding physical address;
and the TLB updating unit is used for establishing a new page table entry according to the virtual address and the physical address acquired by the translation processing unit and updating the new page table entry into the TLB.
Wherein, further include:
the level-number judging processing unit is used for judging the number of levels of the MTT, wherein, if the number of levels is zero, the RDMA engine searches the MPT according to the page table pointer (Page Index) corresponding to the virtual address in the request message to directly obtain the physical address corresponding to the virtual address.
Correspondingly, in still another aspect of the present application, a system for improving the efficiency of accessing a memory area by an RDMA engine is provided, which at least includes a host, a host memory, and an RDMA engine, where the RDMA engine is deployed with the device for improving the efficiency of accessing a memory area by an RDMA engine.
Wherein, the local end host machine at least comprises:
a registration unit, configured to register a memory region (MR) for application software before RDMA communication, establish a memory translation table (MTT) and a memory protection table (MPT) corresponding to the memory region, and establish a translation lookaside buffer (TLB) in the MPT, where the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of one page;
an RDMA connection establishment unit, configured to establish an RDMA connection with the remote end and write the current receive queue work queue element (RQ WQE) into host memory.
Accordingly, in yet another aspect of the present application, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
Accordingly, in yet another aspect of the present application, a chip is provided that incorporates the aforementioned means for improving the efficiency of access to a memory region by an RDMA engine.
The embodiment of the application has the following beneficial effects:
the application provides a method, a device, a system, a chip and a storage medium for improving the efficiency of access of an RDMA engine to a memory area. When the application software registers an MR and creates the MPT, a TLB space is set up in the MPT to store the most recently used page table entries, each of which stores the mapping between a virtual address and a physical address. When the RDMA engine accesses the MR, it queries the TLB first; if the Page Index corresponding to the accessed physical page is in the TLB, the physical address is obtained directly from the TLB without accessing the MTT for address translation. If it is not in the TLB, the MTT must be accessed for address translation to obtain the physical address. Meanwhile, when the RDMA engine accesses the MR and misses in the TLB, the physical page obtained from the MTT and its corresponding Page Index are used to refresh the page table entries in the TLB in time. By implementing the application, the latency of executing RDMA commands and the PCIe bandwidth consumed can be reduced overall, and the efficiency of RDMA communication is improved.
Drawings
In order to illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the application; a person skilled in the art can obtain other drawings from them without inventive effort, and such drawings remain within the scope of the application.
FIG. 1 is a schematic diagram of a main flow of an embodiment of a method for improving the efficiency of an RDMA engine to access a memory region according to the present application;
FIG. 2 is a schematic view of an application environment of the method according to the present application;
FIG. 3 is a schematic diagram illustrating an embodiment of an apparatus for improving the efficiency of accessing a memory region by an RDMA engine according to the present application;
FIG. 4 is a schematic structural diagram of the local host CPU in FIG. 3.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent.
Embodiments of the application relate to communication between a local RDMA engine and a remote RDMA engine. The local RDMA engine resides on the local host and the remote RDMA engine on the remote host. The general flow is as follows: a work queue (Work Queue, WQ) is first created, and the operation instructions to be executed, referred to as work queue elements (Work Queue Element, WQE), are placed in the work queue and stored in host memory. The RDMA engine then fetches the corresponding send queue WQE (SQ WQE) from host memory, interprets and executes it to generate a request message, and transmits the request message to the remote RDMA engine through a physical network port and a physical network link. The remote RDMA engine performs the corresponding processing and returns a response message. When execution of a WQE is completed, a completion queue element (Completion Queue Element, CQE) is generated and placed in a pre-created completion queue (Completion Queue, CQ). Finally, the RDMA engine uses the CQ to inform the initiator of the WQE that the WQE has been completed.
FIG. 1 is a schematic diagram illustrating the main flow of one embodiment of a method for improving the efficiency of an RDMA engine to access a memory region according to the present application; as also shown in fig. 2, in this embodiment, the method operates in the RDMA engine, and the method at least includes the following steps:
step S10, an RDMA engine receives an RDMA request message from a remote end, and analyzes and obtains a memory area and a virtual address which are requested to be used by the RDMA request message;
It can be appreciated that the local RDMA engine may convert the received remote RDMA request message into a specific DMA operation request in order to access local host memory, for example to read or write data directly in the corresponding memory. In the embodiment of the present application, the RDMA request message may carry indication information indicating the memory region and the virtual address to be operated on. From this indication information the RDMA engine can obtain, for example, the page table pointer (Page Index) corresponding to the virtual address, which is used to obtain the specific physical address within the memory region.
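The application does not spell out how the Page Index is computed from the virtual address. A minimal sketch, assuming the MR starts at a page-aligned base virtual address and uses a fixed page size (`PAGE_SIZE`, `mr_base_va`, `page_index` and `page_offset` are illustrative names, not taken from the application), might look like this:

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the application does not fix one


def page_index(virtual_addr: int, mr_base_va: int, page_size: int = PAGE_SIZE) -> int:
    """Return the index, within the MR, of the page that contains virtual_addr.

    Assumes mr_base_va is page-aligned, so index 0 is the first page of the MR.
    """
    if virtual_addr < mr_base_va:
        raise ValueError("address lies below the start of the memory region")
    return (virtual_addr - mr_base_va) // page_size


def page_offset(virtual_addr: int, mr_base_va: int, page_size: int = PAGE_SIZE) -> int:
    """Return the byte offset of virtual_addr inside its page."""
    return (virtual_addr - mr_base_va) % page_size
```

For example, with a base of `0x10000`, the address `0x10000 + 4096 + 8` falls in page 1 at offset 8.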
Step S11, the RDMA engine authenticates the RDMA request message using a memory protection table (MPT) in host memory;
In a specific example, the RDMA engine may look up the corresponding MPT according to a memory key (e.g., R_Key or L_Key) in the RDMA request message to verify the access rights. If the requested operation, for example a read, falls within the permissions recorded in the corresponding MPT entry, the RDMA request is confirmed to be legal. If the request is not covered by the permissions recorded in the MPT entry, the RDMA request is confirmed to be illegal.
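The permission check described above can be sketched as follows. This is only an illustration under stated assumptions: the `MptEntry` layout, the `Access` flags and the address-range check are hypothetical and are not specified by the application, which only says that the request must fall within the permissions recorded in the MPT entry.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict


class Access(Flag):
    """Hypothetical permission bits recorded in an MPT entry."""
    LOCAL_READ = auto()
    LOCAL_WRITE = auto()
    REMOTE_READ = auto()
    REMOTE_WRITE = auto()


@dataclass
class MptEntry:
    key: int          # memory key (R_Key or L_Key) identifying this entry
    base_va: int      # first virtual address of the MR
    length: int       # MR length in bytes
    allowed: Access   # permissions granted at registration time


def authenticate(mpt: Dict[int, MptEntry], key: int, va: int, size: int,
                 wanted: Access) -> bool:
    """Return True only if the key is known and the request is within bounds and rights."""
    entry = mpt.get(key)
    if entry is None:
        return False  # unknown key: the request is illegal
    # Assumed extra check: the accessed range must lie inside the MR.
    in_range = entry.base_va <= va and va + size <= entry.base_va + entry.length
    # `wanted in entry.allowed` holds only if every requested flag was granted.
    return in_range and wanted in entry.allowed
```

A request that presents a valid key but asks for a right the entry does not grant is rejected in the same way as a request with an unknown key.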
Step S13, after authentication succeeds, the RDMA engine searches a translation lookaside buffer (TLB) preset in the MPT for a matching page table entry according to the page table pointer (Page Index) corresponding to the virtual address in the request message, and, if one exists, acquires the physical address corresponding to the virtual address;
It will be appreciated that in the embodiment of the present application, the local host CPU needs to perform the following steps in advance:
registering a memory region (MR) for application software before RDMA communication, establishing a memory translation table (MTT) and a memory protection table (MPT) corresponding to the memory region, and establishing a translation lookaside buffer (Translation Lookaside Buffer, TLB) in the MPT, wherein the TLB stores at least one page table entry and each page table entry stores the correspondence between the virtual address and the physical address of one page;
establishing an RDMA connection with a remote host, and writing the current receive queue work queue element (RQ WQE) into host memory;
It will be appreciated that in the embodiment of the present application, if the upper-layer communication application needs to access the hardware resource, it first needs to allocate a contiguous virtual memory region and then register that memory region with the RDMA engine. After registration is complete, the RDMA engine may perform RDMA operations on that memory region.
Memory registration means that the RDMA engine establishes a translation table between virtual addresses and physical addresses for the memory region, so as to implement address translation. When the memory is registered, the operation permissions of the memory region are set, including but not limited to local read-write permission, remote read-write permission and the like. Each memory registration may set a local key (L_Key) and a remote key (R_Key). The L_Key controls the permission of the local RDMA engine to access the local memory, and the R_Key controls the permission of the remote RDMA engine to access the local memory. The same memory region can be registered multiple times, each registration having different keys.
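To make the registration step concrete, here is a small sketch of a registration routine that hands out a fresh L_Key and R_Key per registration and seeds the TLB inside the MPT with the first pages of the MR. The `Registration` record, the key allocator and the `tlb_slots` parameter are hypothetical; the application does not describe how keys are generated or how many entries are preloaded.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List

_key_counter = itertools.count(1)  # hypothetical key allocator


@dataclass
class Registration:
    l_key: int                      # controls local engine access
    r_key: int                      # controls remote engine access
    permissions: FrozenSet[str]     # e.g. {"remote_read", "remote_write"}
    tlb: Dict[int, int] = field(default_factory=dict)  # Page Index -> physical page address


def register_mr(phys_pages: List[int], permissions: Iterable[str],
                tlb_slots: int = 2) -> Registration:
    """Register an MR backed by phys_pages and preload up to tlb_slots TLB entries."""
    reg = Registration(l_key=next(_key_counter), r_key=next(_key_counter),
                       permissions=frozenset(permissions))
    # Seed the TLB with the first pages of the MR so early accesses can hit.
    for idx, page in enumerate(phys_pages[:tlb_slots]):
        reg.tlb[idx] = page
    return reg
```

Registering the same pages twice yields two independent registrations with different keys, matching the statement that each registration of a memory region has its own keys.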
Generally, because an MR covers a large number of MTT entries, the MTT is organized in multiple levels. Each time the RDMA engine accesses virtual memory through the MTT, the virtual address must be translated into the corresponding physical address. The translation requires traversing the page table; for example, if the page table in the MTT is a three-level page table, host memory must be accessed level by level (3 times) to obtain the final physical address.
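The cost of a multi-level walk can be illustrated with a toy model in which every level costs one host memory read. The nested-list table layout, the `fanout` parameter and the `HostMemory` counter are assumptions made for illustration; the real MTT format is not described in the application.

```python
from typing import List


class HostMemory:
    """Toy host memory that counts how many table reads the engine issues."""

    def __init__(self) -> None:
        self.reads = 0

    def read_entry(self, table: list, slot: int):
        self.reads += 1
        return table[slot]


def build_mtt(page_addrs: List[int], levels: int, fanout: int) -> list:
    """Pack physical page addresses into a `levels`-deep tree of nested lists."""
    table = list(page_addrs)
    for _ in range(levels - 1):
        table = [table[i:i + fanout] for i in range(0, len(table), fanout)]
    return table


def walk_mtt(mem: HostMemory, root: list, page_idx: int, levels: int, fanout: int) -> int:
    """Translate a Page Index by descending one level per host memory read."""
    node = root
    for level in range(levels - 1, -1, -1):
        slot = (page_idx // fanout ** level) % fanout
        node = mem.read_entry(node, slot)
    return node
```

With three levels, a single translation issues three reads, which is the overhead the TLB in the MPT is meant to avoid.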
In the embodiment of the application, a translation lookaside buffer (TLB) is established in the MPT, at least one page table entry is stored in the TLB, and each page table entry stores the correspondence between the virtual address and the physical address of one page. Frequently used page table entries are stored in the TLB; if the virtual address in the RDMA request message matches a page table entry in the TLB, the corresponding physical address can be obtained directly, which improves the efficiency of obtaining the physical address;
Step S14, the RDMA engine performs a read or write operation, according to the request message, on the memory region in host memory pointed to by the physical address.
Wherein, in step S13, further comprising:
if no matched page table entry exists in the TLB, the RDMA engine reads the MTT and performs address translation, and traverses the multi-stage page table to obtain a corresponding physical address;
establishing a new page table entry according to the virtual address and the physical address acquired through the MTT, and updating the new page table entry into the TLB, so that the TLB holds as many of the most recently accessed virtual-to-physical address mappings as possible. In a specific example, the new entry may be added directly to the TLB, and if the storage space of the TLB is full, the earliest page table entry may be replaced.
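The update policy in this example (add directly, and replace the earliest entry when full) corresponds to first-in, first-out replacement. A minimal sketch, with the class name `MptTlb` and its methods chosen here for illustration:

```python
from collections import OrderedDict
from typing import Optional


class MptTlb:
    """Small TLB keyed by Page Index with first-in, first-out replacement."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("the TLB must hold at least one page table entry")
        self.capacity = capacity
        self._entries: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, page_idx: int) -> Optional[int]:
        """Return the cached physical page address, or None on a miss."""
        return self._entries.get(page_idx)

    def insert(self, page_idx: int, phys_addr: int) -> None:
        """Add a mapping, evicting the earliest inserted entry if the TLB is full."""
        if page_idx in self._entries:
            self._entries[page_idx] = phys_addr  # refresh in place, position unchanged
            return
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)    # drop the oldest entry
        self._entries[page_idx] = phys_addr
```

Other policies, such as least recently used, would also satisfy the stated goal of keeping recent mappings; FIFO is shown only because it matches the wording of the example.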
In a specific example, before the step S13, the method further includes the following steps:
judging the number of levels of the MTT, and, if the number of levels is zero, the RDMA engine searching the MPT according to the page table pointer (Page Index) corresponding to the virtual address in the request message to directly acquire the physical address corresponding to the virtual address.
To facilitate a further understanding of the present application, a more detailed complete flow and principles of one example of the present application are described below in connection with fig. 2:
Step 1, the upper-layer application software registers a memory region (MR) and establishes the MPT and MTT corresponding to the MR. The addresses of the two physical pages Page0 and Page1 corresponding to the MR, together with their Page Indexes (namely their positions within the MR), are written into the MPT as two page table entries, forming the initial TLB content. It will be appreciated that in other examples more than two page table entries may be stored;
step 2, establishing RDMA connection with the remote host, and writing RQ WQE into a host memory.
Step 3, after RDMA communication is carried out between the RDMA engine and the remote host, the RDMA engine receives network side RDMA related messages and interprets the messages.
Step 4, if a send message is received, the RDMA engine reads the corresponding RQ WQE and interprets it to obtain the MR and the virtual address used by the send message.
Step 5, the RDMA engine reads the MPT in host memory to perform the permission check; if the check passes, processing continues with the following steps, otherwise an error is reported.
Step 6, when the address conversion level corresponding to the MTT is 0, the MTT does not need to be read, and the RDMA engine directly uses the physical address carried by the MPT;
if the number of stages is not 0, the RDMA engine calculates the Page Index corresponding to the physical Page where the message is located, then the Page Index obtained by calculation is matched with the TLB in the MPT, if the Page Index hits, the MTT is not required to be read for address conversion, and the physical address corresponding to the Page Index in the TLB is directly used;
if there is no hit, the MTT needs to be read from the host memory, and then the physical address is obtained through address translation. It can be appreciated that the number of times the RDMA engine reads the host memory at the time of MTT address translation is equal to the number of stages of the MTT itself; meanwhile, a Page table entry is formed by the corresponding relation between the physical address obtained by conversion and the Page Index corresponding to the physical address, and the Page table entry is refreshed into the TLB in the MPT.
Step 7, the RDMA engine writes the payload of the send message to the memory region pointed to by the physical address obtained in step 6.
It can be understood that the above steps illustrate a process of writing data into the local memory by the remote host, and the principle of the process of reading data from the local memory by the remote host is similar, which is not described herein.
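Steps 5 and 6 above can be sketched end to end as follows. The structures are simplified stand-ins: the MTT is flattened into a list and each miss is charged one host read per MTT level, as stated in step 6. All names (`Mpt`, `Engine`, `translate`, `PAGE_SIZE`) and the 4 KiB page size are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class Mpt:
    r_key: int
    base_va: int
    remote_write: bool
    mtt_levels: int            # 0 means the MPT itself carries the physical address
    phys_base: int = 0         # used only when mtt_levels == 0
    tlb: Dict[int, int] = field(default_factory=dict)  # Page Index -> physical page


@dataclass
class Engine:
    mpt: Mpt
    mtt: List[int]             # flattened stand-in for the multi-level MTT
    host_reads: int = 0        # MTT reads issued to host memory so far

    def translate(self, r_key: int, va: int) -> int:
        # Step 5: permission check against the MPT.
        if r_key != self.mpt.r_key or not self.mpt.remote_write:
            raise PermissionError("MPT permission check failed")
        offset = va - self.mpt.base_va
        # Step 6, level 0: use the physical address carried by the MPT.
        if self.mpt.mtt_levels == 0:
            return self.mpt.phys_base + offset
        idx, in_page = divmod(offset, PAGE_SIZE)
        page = self.mpt.tlb.get(idx)
        if page is None:
            # Miss: walk the MTT (one host read per level) and refresh the TLB.
            self.host_reads += self.mpt.mtt_levels
            page = self.mtt[idx]
            self.mpt.tlb[idx] = page
        return page + in_page
```

With Page0 and Page1 preloaded as in step 1, an access to page 1 costs no MTT reads, a first access to page 3 costs three reads on a three-level MTT, and a repeated access to page 3 hits the refreshed TLB.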
From the above, when registering an MR and creating the MPT, the method according to the embodiment of the present application sets aside a TLB space inside the MPT for storing a plurality of recently used page table entries. When an RDMA engine accesses the MR, if the Page Index corresponding to the accessed physical address is within the TLB, the physical address is obtained directly from the TLB, thereby avoiding an access to the MTT for address translation. The latency of executing RDMA commands and the PCIe bandwidth consumed may thus be reduced.
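As a rough worked estimate of the saving, assume the MTT has L levels (L greater than 0) and a fraction h of accesses hit the TLB; both symbols are introduced here for illustration and no hit rate is reported in the application. Since a miss costs one host memory read per MTT level and a hit costs none, the expected number of MTT reads per access is:

```latex
\[
  \mathbb{E}[\text{MTT reads per access}] = (1 - h)\,L
\]
\[
  L = 3,\; h = 0.9:\qquad (1 - 0.9) \times 3 = 0.3
  \quad\text{reads on average, versus } 3 \text{ without the TLB.}
\]
```

The figures are purely illustrative; the actual benefit depends on the access pattern and on how many page table entries the TLB can hold.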
FIG. 3 is a schematic diagram illustrating one embodiment of an apparatus for improving the efficiency of an RDMA engine to access a memory region according to the present application. The device for improving the efficiency of accessing the memory area by the RDMA engine is applied to the RDMA engine shown in fig. 2, and at least comprises:
the message parsing unit 10 is configured to receive an RDMA request message from a remote host, and parse and obtain a memory area and a virtual address requested to be used by the RDMA request message;
an authentication processing unit 11, configured to perform authentication processing on the request packet by using a memory protection conversion table (MPT) in a host memory;
a matching processing unit 12, configured to search a translation lookaside buffer (TLB) preset in the memory protection table (MPT) for a matching page table entry according to the page table pointer (Page Index) corresponding to the virtual address in the request message, and, if one exists, acquire the physical address corresponding to the virtual address; the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of one page;
and the access processing unit 13 is used for performing read-write operation on the memory area in the host memory pointed by the physical address according to the request message.
In a specific example, the apparatus 1 further comprises:
the translation processing unit 14 is configured to read a memory address translation table (MTT) in the host memory and perform address translation when no matched page table entry exists in the TLB, and traverse the multi-level page table to obtain a corresponding physical address;
and a TLB updating unit 15, configured to establish a new page table entry according to the virtual address and the physical address acquired by the translation processing unit, and update the new page table entry into the TLB.
In a specific example, the apparatus 1 further comprises:
the stage number judging processing unit 16 is configured to judge the stage number in the MTT, and if the stage number of the MTT is zero, the RDMA engine retrieves the MPT to directly obtain the physical address corresponding to the virtual address according to a Page table pointer (Page Index) corresponding to the virtual address in the request packet.
For more details, refer to the foregoing description of FIG. 1 and FIG. 2; they are not repeated here.
Accordingly, in still another aspect of the present application, a system for improving the efficiency of accessing a memory area by an RDMA engine is further provided, and in particular, reference may be made to fig. 2, where the system for improving the efficiency of accessing a memory area by an RDMA engine includes at least a local host CPU, a host memory, and an RDMA engine, where the RDMA engine is deployed with the aforementioned device for improving the efficiency of accessing a memory area by an RDMA engine.
As shown in fig. 4, the local host CPU 2 at least includes:
a registration unit 20, configured to register a memory region (MR) for application software before RDMA communication, establish a memory translation table (MTT) and a memory protection table (MPT) corresponding to the memory region, and establish a translation lookaside buffer (TLB) in the MPT, where the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of one page;
an RDMA connection establishment unit 21, configured to establish an RDMA connection with the remote host and write the current receive queue work queue element (RQ WQE) to host memory.
For more details, refer to the foregoing description of FIG. 3; they are not repeated here.
In yet another aspect of the present application, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described in the previous figures 1 to 2. For more details, reference is made to the foregoing descriptions of fig. 1 and 2, and no further description is given here.
In yet another aspect of the application, a chip is provided that incorporates the means for improving the efficiency of access to memory regions by RDMA engines as described above in connection with FIGS. 3 and 4. For more details, reference may be made to the foregoing description of fig. 3 and fig. 4, and details are not repeated here.
The embodiments of the present application have the following beneficial effects:
The present application provides a method, a device, a system, a chip and a storage medium for improving the efficiency of access of an RDMA engine to a memory area. When the application software registers an MR and creates the MPT, a TLB space is set aside in the MPT to store the most recently used page table entries, each of which stores the mapping between a virtual address and a physical address. When the RDMA engine accesses the MR, it queries the TLB first: if the Page Index corresponding to the accessed physical page is in the TLB, the physical address is obtained directly from the TLB without accessing the MTT for address translation; otherwise, the MTT must be accessed for address translation to obtain the physical address.
Meanwhile, when the RDMA engine accesses the MR and misses in the TLB, the physical page obtained from the MTT and its corresponding Page Index are used to refresh the page table entries in the TLB promptly.
By implementing the present application, the overall latency of executing RDMA commands and the PCIe bandwidth they consume can be reduced, improving the efficiency of RDMA communication.
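The TLB-first lookup with MTT fallback and refresh can be sketched as follows. This is a minimal Python model, not the claimed hardware implementation: the class `MemoryRegion`, the counter `mtt_reads`, the constants, the single-level MTT, and the least-recently-used replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096     # assumed page size
TLB_CAPACITY = 2     # assumed TLB size, kept tiny to show eviction


class MemoryRegion:
    """Hypothetical model of one MR: an MTT plus a small TLB kept with the MPT."""

    def __init__(self, base_va, mtt):
        self.base_va = base_va
        self.mtt = mtt                 # page index -> physical page address
        self.tlb = OrderedDict()       # most recently used entries last
        self.mtt_reads = 0             # counts slow-path MTT accesses

    def translate(self, va):
        """Return the physical address for va, trying the TLB before the MTT."""
        page_index, offset = divmod(va - self.base_va, PAGE_SIZE)
        if not 0 <= page_index < len(self.mtt):
            raise ValueError("virtual address outside the memory region")
        if page_index in self.tlb:
            self.tlb.move_to_end(page_index)       # hit: mark as recently used
            return self.tlb[page_index] + offset
        # Miss: fall back to the MTT, then refresh the TLB with the result.
        self.mtt_reads += 1
        physical_page = self.mtt[page_index]
        if len(self.tlb) >= TLB_CAPACITY:
            self.tlb.popitem(last=False)           # evict least recently used
        self.tlb[page_index] = physical_page
        return physical_page + offset


mr = MemoryRegion(base_va=0x10000, mtt=[0xA000, 0x7000, 0xC000])
first = mr.translate(0x10010)    # miss: translated through the MTT
second = mr.translate(0x10020)   # hit: same page, served from the TLB
```

Here `mtt_reads` stands in for the extra host-memory accesses that a miss costs; repeated accesses to the same page leave it unchanged, which is the saving the paragraph above describes.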
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above disclosure is only a preferred embodiment of the present application and does not limit its scope; equivalent changes made in accordance with the claims of the present application still fall within the scope of the present application.

Claims (11)

1. A method for improving efficiency of access to a memory region by an RDMA engine, comprising at least the steps of:
receiving an RDMA request message from a remote end, and parsing it to obtain the memory region (MR) and the virtual address that the RDMA request message requests to use;
performing authentication processing on the request message by using a memory protection translation table (MPT) in a host memory;
after authentication succeeds, determining, according to a page table pointer corresponding to the virtual address, whether a matched page table entry exists in a translation lookaside buffer (TLB) preset in the MPT, and if so, acquiring a physical address corresponding to the virtual address;
and performing a read-write operation, according to the RDMA request message, on the memory area in the host memory pointed to by the physical address.
2. The method as recited in claim 1, further comprising:
before RDMA communication, registering a memory region (MR), establishing a memory address translation table (MTT) and a memory protection translation table (MPT) corresponding to the memory region, and establishing a translation lookaside buffer (TLB) in the MPT, wherein the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of a page.
3. The method as recited in claim 2, further comprising:
if no matched page table entry exists in the TLB, reading the MTT and performing address translation by traversing a multi-level page table to obtain the corresponding physical address;
and establishing a new page table entry according to the virtual address and the physical address acquired through the MTT, and updating the new page table entry into the TLB.
4. A method as claimed in any one of claims 1 to 3, further comprising:
determining the number of levels of the MTT, and if the number of levels of the MTT is zero, the RDMA engine searching the MPT according to the page table pointer corresponding to the virtual address in the request message to directly acquire the physical address corresponding to the virtual address.
5. An apparatus for improving efficiency of accessing a memory area by an RDMA engine, applied to the RDMA engine, comprising at least:
the message parsing unit is used for receiving an RDMA request message from a remote end and parsing and obtaining a memory area and a virtual address which are requested to be used by the RDMA request message;
an authentication processing unit, configured to perform authentication processing on the request message by using a memory protection translation table (MPT) in a host memory;
a matching processing unit, configured to determine, according to a page table pointer corresponding to a virtual address in the RDMA request message, whether a matched page table entry exists in a translation lookaside buffer (TLB) preset in the memory protection translation table (MPT), and if so, acquire a physical address corresponding to the virtual address; the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of a page;
and the access processing unit is used for performing read-write operation on the memory area in the host memory pointed by the physical address according to the RDMA request message.
6. The apparatus as recited in claim 5, further comprising:
the translation processing unit is used for reading a memory address translation table (MTT) in a host memory and performing address translation when no matched page table entry exists in the TLB, traversing a multi-level page table and obtaining a corresponding physical address;
and the TLB updating unit is used for establishing a new page table entry according to the virtual address and the physical address acquired by the translation processing unit and updating the new page table entry into the TLB.
7. The apparatus as recited in claim 6, further comprising:
a level determination processing unit, configured to determine the number of levels of the MTT, wherein if the number of levels of the MTT is zero, the RDMA engine searches the MPT according to the page table pointer corresponding to the virtual address in the request message to directly acquire the physical address corresponding to the virtual address.
8. A system for improving efficiency of an RDMA engine accessing a memory region, comprising at least a local host, a host memory, and an RDMA engine, wherein:
the RDMA engine is deployed with the apparatus for improving efficiency of accessing a memory area by an RDMA engine as claimed in any one of claims 5 to 7.
9. The system of claim 8, wherein the local host comprises at least:
a registration unit, configured to register a memory region (MR) for application software before RDMA communication, establish a memory address translation table (MTT) and a memory protection translation table (MPT) corresponding to the memory region, and establish a translation lookaside buffer (TLB) in the MPT, where the TLB stores at least one page table entry, and each page table entry stores the correspondence between the virtual address and the physical address of a page;
an RDMA connection establishment unit, configured to establish an RDMA connection with the remote host and write the current receive queue work queue element (RQ WQE) into the host memory.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
11. A chip incorporating the apparatus of claim 5 or 6 for improving efficiency of access to memory regions by an RDMA engine.
CN202310612656.8A 2023-05-26 2023-05-26 Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area Pending CN116578504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310612656.8A CN116578504A (en) 2023-05-26 2023-05-26 Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310612656.8A CN116578504A (en) 2023-05-26 2023-05-26 Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area

Publications (1)

Publication Number Publication Date
CN116578504A true CN116578504A (en) 2023-08-11

Family

ID=87533915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310612656.8A Pending CN116578504A (en) 2023-05-26 2023-05-26 Method, device, system, chip and storage medium for improving efficiency of access of RDMA engine to memory area

Country Status (1)

Country Link
CN (1) CN116578504A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117851286A (en) * 2023-12-13 2024-04-09 天翼云科技有限公司 Memory address translation table compression method in RDMA ROCE
CN118093468A (en) * 2024-04-23 2024-05-28 北京数渡信息科技有限公司 PCIe exchange chip with RDMA acceleration function and PCIe switch
CN118312098A (en) * 2024-04-09 2024-07-09 中科驭数(北京)科技有限公司 RDMA-based physical memory management method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination