CN115934625A - Doorbell knocking method, device and medium for remote direct memory access


Info

Publication number
CN115934625A
CN115934625A CN202310244405.9A CN202310244405A CN115934625A CN 115934625 A CN115934625 A CN 115934625A CN 202310244405 A CN202310244405 A CN 202310244405A CN 115934625 A CN115934625 A CN 115934625A
Authority
CN
China
Prior art keywords
hardware
work queue
pointer
software
doorbell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310244405.9A
Other languages
Chinese (zh)
Other versions
CN115934625B (en)
Inventor
陈雅民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Xingyun Zhilian Technology Co Ltd
Original Assignee
Zhuhai Xingyun Zhilian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Xingyun Zhilian Technology Co Ltd
Priority to CN202310244405.9A
Publication of CN115934625A
Application granted
Publication of CN115934625B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Combined Controls Of Internal Combustion Engines (AREA)
  • Stored Programmes (AREA)

Abstract

The application provides a doorbell knocking method, device and medium for remote direct memory access (RDMA). The method comprises the following steps: providing a work queue for interaction between software and hardware of an RDMA chip; providing a first pointer, a second pointer and a third pointer associated with the work queue, wherein the first pointer indicates the position of the most recent doorbell knock, the second pointer indicates the invalid bit in the work queue fed back by the hardware, and the third pointer indicates the head position of the queue of work queue elements issued by the software; reading, by the hardware, the work queue elements one by one in a specific order, wherein when the hardware reads an invalid bit for the first time it waits for a specific time and then checks whether the invalid bit is read again, and when the invalid bit is read for the second time the hardware notifies the software and feeds back the invalid bit; and determining, by the software, whether the doorbell knocking condition is met according to the relative positions of the pointers, and if so, knocking the doorbell to notify the hardware. The misjudgment rate is thereby reduced and the efficiency of software-hardware interaction is improved.

Description

Doorbell knocking method, device and medium for remote direct memory access
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a medium for knocking a doorbell for remote direct memory access.
Background
With the development of computer and network technology, Remote Direct Memory Access (RDMA) is used to access the memory of one computer directly from the memory of another. RDMA copies data from the physical link directly into the memory of an application program through a network adapter. The data does not need to be copied between the application's memory and the operating system's data buffers, which saves Central Processing Unit (CPU) and cache resources, involves no context switching, and reduces transmission latency. Software-hardware interaction under the RDMA mechanism is carried out through a Work Queue (WQ): software adds Work Queue Elements (WQEs) to the WQ and hardware fetches WQEs from the WQ, so that the software issues work requests or tasks to the hardware through the software's store operation and the hardware's fetch operation. In the prior art, a network card chip for RDMA generally obtains WQEs by blind reading. Blind reading, however, cannot cope effectively with latency, for example the small delays caused by cross-device operations, and may therefore lead to misjudgments and leave the resulting problems in software-hardware interaction unhandled.
Therefore, the application provides a doorbell method, device and medium for remote direct memory access, which are used for solving the technical problems in the prior art.
Disclosure of Invention
The embodiment of the application provides a doorbell method, device and medium for remote direct memory access, which are used for solving the problems in the prior art.
In a first aspect, the present application provides a doorbell knocking method for remote direct memory access. The doorbell knocking method comprises the following steps: providing a work queue for interaction between software and hardware of a remote direct memory access (RDMA) chip, the interaction between the software and the hardware being achieved by the software writing work queue elements to the work queue and by the hardware reading work queue elements in the work queue; providing a first pointer, a second pointer and a third pointer associated with the work queue, wherein the first pointer indicates the position in the work queue at which the software last knocked the doorbell, the second pointer indicates the invalid bit in the work queue fed back by the hardware, and the third pointer indicates the head position of the queue of work queue elements issued by the software; reading, by the hardware, the work queue elements in the work queue one by one in a specific order, wherein when the hardware reads an invalid bit for the first time, the hardware delays for a specific time and then determines whether it reads the invalid bit again at the position where it first read the invalid bit, and when the hardware reads the invalid bit for the second time, the hardware notifies the software and feeds back the position where it first read the invalid bit as the invalid bit in the work queue, the specific time being at least longer than the shortest Peripheral Component Interconnect (PCI) latency of the RDMA chip; and determining, by the software, whether a doorbell knocking condition is met according to the relative positional relationship among the first pointer, the second pointer and the third pointer, and knocking the doorbell to notify the hardware when the relative positional relationship meets the doorbell knocking condition.
According to the first aspect of the application, a doorbell method for remote direct memory access is provided for a doorbell mechanism in an RDMA application scene, so that the problem of invalid bits encountered in a hardware reading process can be effectively solved, the influence caused by factors such as possible time delay and the like is considered, the misjudgment rate can be reduced, the interaction coordination of software and hardware can be improved, and the interaction efficiency can be further improved by using a direct work queue element.
In a possible implementation manner of the first aspect of the present application, the doorbell knocking method further includes: when the relative positional relationship satisfies the doorbell knocking condition, the software writes the work queue element to a Base Address Register (BAR) space of the RDMA chip, and the hardware reads the work queue element from the BAR space.
In one possible implementation of the first aspect of the present application, the software writes two 64-bit direct work queue elements, which are merged into the 128-bit work queue element.
In one possible implementation manner of the first aspect of the present application, the work queue is located in an internal register of the RDMA chip, and the hardware reads a work queue element in the work queue through the internal register of the RDMA chip.
In one possible implementation manner of the first aspect of the present application, the doorbell condition is that the second pointer is located between the first pointer and the third pointer.
In one possible implementation of the first aspect of the present application, the specific time is 2 microseconds to 4 microseconds.
In a possible implementation manner of the first aspect of the present application, when the hardware reads an invalid bit for the first time, the hardware continues to process the flow and delays for a certain time in a polling manner, and then determines again whether the hardware reads an invalid bit at a position where the hardware reads the invalid bit for the first time.
In a possible implementation manner of the first aspect of the present application, when the hardware reads the invalid bit for the second time, the hardware sets a discard flag in the work queue at a position where the hardware reads the invalid bit for the first time.
In one possible implementation manner of the first aspect of the present application, when the hardware reads an invalid bit for the second time, the hardware further sets a queue empty flag for indicating that an invalid bit exists in the work queue.
In a possible implementation manner of the first aspect of the present application, when the hardware reads an invalid bit for the second time, the hardware updates the second pointer according to a position where the hardware reads the invalid bit for the first time.
In one possible implementation of the first aspect of the present application, the invalid bit is an invalid work queue element in the work queue.
In one possible implementation manner of the first aspect of the present application, the invalid work queue element is a work queue element that the software has not written or has not updated.
In one possible implementation of the first aspect of the present application, the software is a driver of the RDMA chip.
In a possible implementation manner of the first aspect of the present application, the work queue is located in a memory of a host, and the hardware obtains a number through a doorbell register of the RDMA chip and reads a work queue element in the work queue through the number.
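One of the implementations above has the software write two 64-bit direct work queue elements that are merged into a 128-bit work queue element. The following is a minimal sketch of that merge in Python, for illustration only. The function names and the choice that the first write carries the low 64 bits are assumptions; the application does not state the order of the halves.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1  # all-ones mask for one 64-bit half


def merge_direct_wqe(low: int, high: int) -> int:
    """Combine two 64-bit halves into one 128-bit direct WQE value."""
    for half in (low, high):
        if not 0 <= half <= MASK64:
            raise ValueError("each half must fit in 64 bits")
    # Assumption: the first write carries the low 64 bits.
    return (high << 64) | low


def split_direct_wqe(wqe: int) -> Tuple[int, int]:
    """Recover the (low, high) 64-bit halves of a 128-bit value."""
    return wqe & MASK64, (wqe >> 64) & MASK64
```

Splitting a merged value returns the two original halves, which is a convenient check when modelling the BAR-space write path.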
In a second aspect, embodiments of the present application further provide a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method according to any one of the implementation manners of the foregoing aspects when executing the computer program.
In a third aspect, embodiments of the present application further provide a computer-readable storage medium storing computer instructions that, when executed on a computer device, cause the computer device to perform the method according to any one of the implementation manners of any one of the above aspects.
In a fourth aspect, the present application further provides a computer program product, which includes instructions stored on a computer-readable storage medium, and when the instructions are run on a computer device, the computer device is caused to execute the method according to any one of the implementation manners of any one of the above aspects.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a doorbell mechanism according to an embodiment of the present application;
Fig. 2 is a flowchart of a doorbell method for remote direct memory access according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a transmission queue according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiment of the application provides a doorbell knocking method, device and medium for remote direct memory access, which are used for solving the problems in the prior art. The method and the device provided by the embodiments of the application are based on the same inventive concept, and because they solve the problems on similar principles, their embodiments, implementations and examples may be referred to one another, and repeated parts are not described again.
It should be understood that in the description of the present application, "at least one" means one or more than one, and "a plurality" means two or more than two. Additionally, the terms "first," "second," and the like, unless otherwise indicated, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
Fig. 1 is a schematic view of an application scenario of a doorbell mechanism according to an embodiment of the present application. The doorbell mechanism is used for software-hardware interaction in a Remote Direct Memory Access (RDMA) application scenario. Specifically, software-hardware interaction under the RDMA mechanism is performed through a Work Queue (WQ): software adds Work Queue Elements (WQEs) to the WQ and hardware fetches WQEs from the WQ, so that the software issues work requests or tasks to the hardware through the software's store operation and the hardware's fetch operation. Under the blind reading mechanism, after the hardware receives a Doorbell (DB), it processes the WQEs in the queue in order according to their valid bits until it encounters an invalid WQE; at that point the hardware pointer is updated, and the software uses this pointer to judge whether the DB needs to be knocked again. RDMA communication involves sending and receiving, and the sender and the receiver involved in the communication are each served by a combination of a send work queue (SQ) and a receive work queue (RQ). The SQ and the RQ are both WQs, used by the software at the sender and at the receiver, respectively, to issue requests or tasks to the hardware. The software at the receiver writes the WQE corresponding to a receive task into the receiver's RQ. Thus, after the sender's hardware fetches a WQE from the SQ and sends out the data, the receiver's hardware uses the WQE in the RQ to determine the memory location to which the received data is written.
For example, the WQE for a send task in the SQ specifies that data of a specific length at a specific memory address is to be sent to a specific node, and the WQE for a receive task in the RQ specifies the memory region at a specific address in which received data is to be stored. The sending and receiving flows are therefore realized by the corresponding WQEs in the SQ and the RQ. The combination of the SQ (send work queue) and the RQ (receive work queue) is a Queue Pair (QP). A QP can be understood as a data structure containing data acquisition information and data transmission information; it stores, in order, the work requests issued by software to hardware, i.e., the WQEs, and serves as the interface between hardware and software. A QP may be a memory space containing multiple WQEs, and the hardware reads the contents of the WQEs by accessing the space where the QP is located. A Queue Pair Context (QPC) stores the attributes of a QP. Through the QPC, the hardware knows how to use the contents of the WQEs stored in the QP, for example the service type of the QP. The Queue Pair Number (QPN) is the number of a QP and is generally represented with 24 bits, that is, each node can use 24-bit QPNs to identify the QP resources available to it.
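The relationship among WQE, SQ, RQ, QP and QPN described above can be pictured with a small data model. This is an illustrative sketch only: the class and field names are hypothetical, and the only figure taken from the text is the 24-bit width of the QPN.

```python
from dataclasses import dataclass, field
from typing import List

QPN_BITS = 24                   # width given in the text
QPN_MAX = (1 << QPN_BITS) - 1   # largest representable QPN


@dataclass
class WQE:
    valid: bool = False   # valid bit checked by the hardware
    payload: bytes = b""  # task description (address, length, target)


@dataclass
class QueuePair:
    qpn: int
    sq: List[WQE] = field(default_factory=list)  # send work queue
    rq: List[WQE] = field(default_factory=list)  # receive work queue

    def __post_init__(self) -> None:
        # Reject numbers that cannot be encoded in 24 bits.
        if not 0 <= self.qpn <= QPN_MAX:
            raise ValueError("QPN must fit in 24 bits")
```

With a 24-bit QPN, a node can distinguish at most 2^24 = 16,777,216 queue pairs.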
With continued reference to Fig. 1, the driver 100 is coupled to the doorbell register 102 and writes a corresponding value, such as a queue pair number (QPN), into the doorbell register 102. After the hardware 104 detects the doorbell (DB), it obtains the QPN from the doorbell register 102 and then uses the QPN to schedule the queue pair (QP); the hardware 104 can also read the queue pair context (QPC) in advance. The driver 100 represents the software that issues work tasks to the hardware 104, and the driver 100 may be invoked through an Application Program Interface (API) to write the QPN into the doorbell register 102. The doorbell mechanism shown in Fig. 1 can increase the hardware response rate of an RDMA chip. In other words, once the driver 100 stores a QPN or other corresponding value in the doorbell register of the RDMA chip, the hardware can, after receiving the doorbell (DB) signal, schedule the QP by its QPN and read the QPC in advance, which completes the issuing of a work request from software to hardware. As mentioned above, the QPN is the number of a QP, and the actual contents of the QP identified by the QPN, namely the WQEs, are written in the memory of the host. Therefore, after the driver 100 writes a QPN to the doorbell register 102, the blind reading mechanism and the doorbell mechanism together notify the hardware to fetch WQEs from host memory, that is, the hardware is notified to schedule the QP by its QPN and fetch the WQEs. As described above, blind reading means that after receiving a DB the hardware processes WQEs in order according to their valid bits until it encounters an invalid WQE. Thus, when the hardware 104 shown in Fig. 1 fetches WQEs by blind reading, it detects the DB, obtains the QPN from the doorbell register 102, uses the QPN to schedule the QP, and reads the valid bits of the WQEs in the WQ one by one in a particular order until it encounters an invalid WQE.
Taking the sender's send work queue (SQ) as an example, after detecting the DB, the hardware 104 reads WQEs one by one in a particular order, for example in the order in which the WQEs are stored in the SQ (such as top-down), until it encounters an invalid WQE. After stopping at the invalid bit, the hardware updates its pointer, and the software uses this pointer to judge whether the DB needs to be knocked again. If the condition for knocking the DB is met, the software knocks the DB once to notify the hardware, and the hardware 104 then resumes blind reading at the position where it stopped, that is, where it encountered the invalid bit. If the condition is not met, the DB is not knocked, which also means the hardware is carrying out normal blind reading. An invalid WQE or invalid bit may arise because the software has not issued a new WQE. It may also arise because, when writing a WQE, the software writes the WQE contents first and updates the valid bit afterwards; if software and hardware are not coordinated, the hardware may process a WQE whose contents have been written but whose valid bit has not yet been updated, so that a WQE that should be recognized as valid is judged by the hardware to be invalid. As an alternative to writing WQEs into host memory and writing the QPN into the doorbell register 102, WQEs can be written directly into internal registers of the RDMA chip (direct work queue elements), so that the hardware 104 can read WQEs from the chip's internal registers without scheduling by QPN, which further improves interaction efficiency.
However, the direct WQE approach places stricter demands on avoiding misjudgment and on coordination between software and hardware. A small delay caused by cross-device operation, for example the latency between Peripheral Component Interconnect (PCI) devices, may prevent the software from obtaining the pointer value in time or from obtaining its updated value. The software may then wrongly conclude that the hardware is carrying out normal blind reading and so fail to knock the DB to notify the hardware, which can disrupt the sending flow. The following describes in detail, with reference to other embodiments of the present application, a doorbell knocking method for remote direct memory access for the doorbell mechanism in an RDMA application scenario. The method handles the invalid bits encountered while the hardware reads, and it takes possible latency and similar factors into account, which reduces the misjudgment rate, improves coordination between software and hardware, and helps improve interaction efficiency further by using direct work queue elements.
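The blind reading mechanism described above can be sketched as a walk over the valid bits of a queue. The sketch below is illustrative only: it models the queue as a ring of Boolean valid bits and assumes the read order is increasing index with wrap-around, which the application does not specify. The function name is hypothetical.

```python
from typing import List, Tuple


def blind_read(valid_bits: List[bool], start: int) -> Tuple[List[int], int]:
    """Walk the ring from `start`, collecting indices of valid WQEs.

    Returns (processed_indices, stop_index). stop_index is where the first
    invalid WQE was seen, or `start` again if every slot was valid.
    """
    depth = len(valid_bits)
    processed: List[int] = []
    idx = start
    for _ in range(depth):      # at most one full lap around the ring
        if not valid_bits[idx]:
            break               # invalid WQE: stop and report this position
        processed.append(idx)
        idx = (idx + 1) % depth
    return processed, idx
```

For example, `blind_read([True, True, False, True], 0)` processes slots 0 and 1 and stops at slot 2, which is the position the hardware would feed back to the software.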
Fig. 2 is a flowchart of a doorbell method for remote direct memory access according to an embodiment of the present application. The doorbell method shown in fig. 2 is suitable for the application scenario of the doorbell mechanism shown in fig. 1, including software and hardware interaction in an RDMA application scenario, for example, a data center, live video, a game, and other scenarios with high requirements on software and hardware interaction efficiency and data transmission speed. According to the doorbell method for remote direct memory access, the problem of invalid bits encountered in a hardware reading process is effectively solved, possible incompatibility in software and hardware interaction aspects such as time delay during software writing and time delay of hardware feedback to software is considered, and influences of time delay of cross-device operation such as PCI time delay are considered, so that misjudgment rate is reduced and software and hardware interaction efficiency is improved. As shown in fig. 2, the doorbell method comprises the following steps.
Step S202: providing a work queue for interaction between software and hardware of a remote direct memory access, RDMA, chip, the interaction between the software and the hardware being achieved by the software writing work queue elements to the work queue and by the hardware reading work queue elements in the work queue.
Step S204: providing a first pointer, a second pointer and a third pointer which are associated with the work queue, wherein the first pointer indicates the position of the software knocking doorbell in the work queue last time, the second pointer indicates invalid bits in the work queue fed back by the hardware, and the third pointer indicates the head position of a work queue element queue issued by the software.
Step S206: reading, by the hardware, the work queue elements in the work queue one by one in a specific order, wherein when the hardware reads an invalid bit for the first time, the hardware delays for a specific time and then determines whether it reads the invalid bit again at the position where it first read the invalid bit, and when the hardware reads the invalid bit for the second time, the hardware notifies the software and feeds back the position where it first read the invalid bit as the invalid bit in the work queue, the specific time being at least longer than the shortest Peripheral Component Interconnect (PCI) latency of the RDMA chip.
Step S208: determining, by the software, whether a doorbell knocking condition is met according to the relative positional relationship among the first pointer, the second pointer and the third pointer, and knocking the doorbell to notify the hardware when the relative positional relationship meets the doorbell knocking condition.
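The delayed second read in step S206 can be sketched as follows. This is a software model for illustration, not the hardware logic of the application: `read_valid` is a hypothetical callback that returns the valid bit at a position, the default delay of 3 microseconds is simply a value inside the 2 to 4 microsecond range mentioned in the first aspect, and `time.sleep` stands in for a hardware timer without offering microsecond accuracy.

```python
import time
from typing import Callable, Optional

RECHECK_DELAY_S = 3e-6  # within the 2 to 4 microsecond range in the text


def confirm_invalid(
    read_valid: Callable[[int], bool],
    pos: int,
    delay_s: float = RECHECK_DELAY_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return `pos` if the WQE there is invalid on two reads, else None."""
    if read_valid(pos):
        return None      # WQE is valid, keep processing
    sleep(delay_s)       # first invalid read: wait out in-flight writes
    if read_valid(pos):
        return None      # the valid bit arrived during the delay
    return pos           # second invalid read: report this position
```

Callers should compare the result with `is not None`, because position 0 is a legitimate return value. Only a position confirmed by the second read would be fed back to the software as the second pointer.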
With reference to the above steps, the interaction between the software and the hardware of the RDMA chip is performed through a work queue (WQ): the software adds work queue elements (WQEs) to the WQ and the hardware fetches WQEs from the WQ, so that the software issues work requests or tasks to the hardware through the software's store operation and the hardware's fetch operation. Thus, in step S202, a work queue is provided for interaction between software and hardware of a remote direct memory access (RDMA) chip, the interaction being achieved by the software writing work queue elements to the work queue and by the hardware reading work queue elements in the work queue. Further, in step S204, a first pointer, a second pointer and a third pointer associated with the work queue are provided. The three pointers indicate to the software side the state of the hardware while it reads the work queue. Specifically, the first pointer indicates the position in the work queue at which the software last knocked the doorbell, the second pointer indicates the invalid bit in the work queue fed back by the hardware, and the third pointer indicates the head position of the queue of work queue elements issued by the software. In this way, the work queue and its associated first, second and third pointers reflect the state of the hardware while it reads the work queue, and the software can determine from them whether it needs to take action. How the work queue and the corresponding pointers are used for interaction between hardware and software is described below in conjunction with Fig. 3.
Referring to Fig. 3, Fig. 3 is a schematic diagram of a send queue according to an embodiment of the present application. As shown in Fig. 3, the send queue 300 includes multiple Send Queue Elements (SQEs). Each SQE corresponds to the work queue element of a send task, that is, the contents of a WQE, and the send queue 300 is itself a WQ. The software at the sender uses the SQEs in the send queue 300 to issue work requests to the hardware, and each SQE may carry the information for one send task, such as sending data of a particular length at a particular memory address to a specified node. The send queue 300 may also be regarded as a QP, i.e., a queue pair, which is a data structure that includes data acquisition information and data transmission information. Any number of invalid bits, invalid send queue elements or invalid work queue elements may be present in the send queue 300. An invalid bit may be present because the software has not issued a new WQE, that is, the send queue 300 holds only so many valid SQEs. An invalid bit may also be present for the following reason: when writing a WQE, the software generally writes the WQE contents first and updates the valid bit afterwards, so if software and hardware are not coordinated, the hardware may process a WQE whose contents have been written but whose valid bit has not yet been updated, and a WQE that should be recognized as valid is then judged by the hardware to be invalid. The hardware reads the valid contents, i.e., the stored send queue elements, from the send queue 300. The hardware reads the elements in the send queue 300 one by one in a particular order, illustrated as the top-down direction in Fig. 3. Generally, the hardware reads the elements in the order in which they were written.
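The second cause of an invalid bit described above, where the contents are written before the valid bit, can be illustrated with a two-step write. This is a sketch only: the `Slot` class and `write_wqe` generator are hypothetical, and the generator pauses after each step so that a reader can be interleaved between the two steps.

```python
from typing import Iterator, Optional


class Slot:
    """One work queue slot: contents plus a valid bit."""

    def __init__(self) -> None:
        self.payload: Optional[bytes] = None
        self.valid = False


def write_wqe(slot: Slot, payload: bytes) -> Iterator[str]:
    """Write a WQE in two steps, yielding after each one."""
    slot.payload = payload   # step 1: contents
    yield "contents written"
    slot.valid = True        # step 2: valid bit
    yield "valid bit set"
```

A hardware read that lands after the first `next()` call but before the second sees a slot whose contents are present while `valid` is still `False`. This is the window that a delayed second read is meant to cover.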
While the hardware reads the WQEs in the send queue 300, the software uses three pointers to determine whether the hardware needs to be notified through the doorbell mechanism. Each pointer serves a different purpose. The first pointer 302 indicates the position at which the software last knocked the doorbell (DB). The second pointer 304 indicates the position at which the processing logic encountered an invalid bit or invalid WQE, that is, the most recent invalid-bit position fed back by the hardware while reading WQEs. The third pointer 306 indicates the head position of the WQE queue issued by the software, which corresponds to the starting position of the most recently written data. Through the first pointer 302, the second pointer 304 and the third pointer 306, the software side can learn the status of the hardware and any problems it meets while reading WQEs in the send queue 300. By comparing the relative positions of the three pointers, the software can determine whether the hardware should simply continue blind reading, that is, read until it encounters an invalid bit, or whether the DB should be knocked once to notify the hardware. Because knocking the DB to notify the hardware affects data transmission efficiency, the frequency of doorbell knocks should be kept as low as possible. Generally, when the second pointer, that is, the pointer corresponding to the position of the invalid bit, is located between the first pointer and the third pointer, the hardware following the blind reading process inevitably encounters the invalid bit and stops, so the DB should be knocked once to notify the hardware. Fig. 3 shows the first pointer 302, the second pointer 304 and the third pointer 306 arranged from top to bottom along the send queue 300 (this direction is assumed to be the direction in which the hardware reads the WQEs in the send queue 300). The second pointer 304 on the send queue 300 is thus located between the first pointer 302 and the third pointer 306, which means it is appropriate to knock the DB once to notify the hardware.
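The doorbell knocking condition, with the second pointer lying between the first and third pointers, can be sketched as a comparison of positions. The sketch below is one possible reading and rests on assumptions: the queue is treated as a ring of `depth` slots indexed in read order, and "between" is taken as the half-open range from the first pointer up to but not including the third pointer. The application does not state how boundary cases or wrap-around are handled, and the function name is hypothetical.

```python
def should_knock_doorbell(first: int, second: int, third: int, depth: int) -> bool:
    """Return True when `second` lies in [first, third) on a ring of `depth` slots."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Measure both positions as forward distances from the first pointer.
    invalid_offset = (second - first) % depth
    head_offset = (third - first) % depth
    return invalid_offset < head_offset
```

With indices increasing from top to bottom as in Fig. 3, an arrangement like send queue 300 (for instance first 2, second 4, third 6) satisfies the condition, while a second pointer that lies before the first pointer (for instance first 2, second 1, third 6) does not.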
Fig. 3 also shows a send queue 310, which likewise includes a number of send queue elements, i.e., valid WQEs, and may contain any number of invalid bits. The valid WQEs and invalid bits in the send queue 300 and the send queue 310 are drawn identically for illustration only. The send queue 300 and the send queue 310 are merely exemplary WQs, and in practice any combination of valid and invalid bits may exist. While the hardware reads the WQEs in the send queue 310, the software uses three pointers to determine whether the hardware needs to be notified through the doorbell mechanism, that is, the software side learns the status of the hardware and any problems it meets while reading the send queue 310 through the first pointer 312, the second pointer 314 and the third pointer 316. Similarly, the first pointer 312 indicates the position at which the software last knocked the doorbell (DB); the second pointer 314 indicates the position at which the processing logic encountered an invalid bit or invalid WQE, that is, the most recent invalid-bit position fed back by the hardware while reading WQEs; and the third pointer 316 indicates the head position of the WQE queue issued by the software, which corresponds to the starting position of the most recently written data. However, unlike the second pointer 304 on the send queue 300, which is located between the first pointer 302 and the third pointer 306, the pointers on the send queue 310 are arranged from top to bottom in the order second pointer 314, first pointer 312, third pointer 316 (this direction is assumed to be the direction in which the hardware reads the WQEs in the send queue 310).
Thus, in the send queue 310 the second pointer 314 is above the first pointer 312, and the first pointer 312 is above the third pointer 316. From these relative positions the software concludes that the hardware has encountered no problem while reading the send queue 310 and can continue blind reading. However, FIG. 3 also shows an actual pointer 320, which represents the position to which the hardware has actually updated the second pointer while reading the send queue 310, i.e., the position at which the processing logic actually encountered the invalid bit. Because of delays such as cross-device operation latency, the actual pointer 320 has not yet been fed back to the software, so the software reads the second pointer 314 shown in FIG. 3 instead of the actual pointer 320. The actual pointer 320 lies between the first pointer 312 and the third pointer 316, so the send queue 310 is in fact in the same situation as the send queue 300, and the software should knock the doorbell once to notify the hardware and stop the flow.
However, because the pointer position is not updated in time, i.e., the software cannot learn the position of the actual pointer 320 in time, the software bases its decision on the relative positions of the first pointer 312, the second pointer 314, and the third pointer 316 rather than on the position of the invalid bit actually encountered in the send queue 310, i.e., the relative positions of the actual pointer 320, the first pointer 312, and the third pointer 316. As a result, the software wrongly concludes that the hardware is blind reading normally and does not knock the doorbell to notify the hardware, which may disrupt the send flow.
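The misjudgment described above can be illustrated with a small sketch. The positions below are hypothetical indices that increase in the hardware read direction, and `lies_between` is an illustrative helper rather than part of the described method; wrap-around of the queue is ignored here, and a strict comparison is assumed.

```python
def lies_between(first: int, second: int, third: int) -> bool:
    """Return True when `second` is strictly after `first` and before `third`."""
    return first < second < third


# Hypothetical positions modelled on send queue 310 in FIG. 3.
FIRST_PTR = 4       # where the software last knocked the doorbell (pointer 312)
THIRD_PTR = 9       # head of the WQEs issued by the software (pointer 316)
STALE_SECOND = 2    # value the software still sees (pointer 314)
ACTUAL_SECOND = 6   # value the hardware has really reached (actual pointer 320)

# Decision made from the stale value: no doorbell, which is the misjudgment.
decision_from_stale = lies_between(FIRST_PTR, STALE_SECOND, THIRD_PTR)
# Decision the software would make from the up-to-date value.
decision_from_actual = lies_between(FIRST_PTR, ACTUAL_SECOND, THIRD_PTR)

print(decision_from_stale, decision_from_actual)
```

With the stale value the software skips the doorbell; with the actual value it would knock it, which is the gap the delayed recheck is meant to close.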
Referring to FIG. 2 and FIG. 3, step S204 in FIG. 2 provides three pointers associated with the work queue: a first pointer, a second pointer, and a third pointer. As the first pointer 302, the second pointer 304, and the third pointer 306 of the send queue 300 and the first pointer 312, the second pointer 314, and the third pointer 316 of the send queue 310 in FIG. 3 show, the relative positions of the three pointers indicate whether the hardware would encounter an invalid bit if it continued blind reading, which helps the software decide whether to knock the doorbell to notify the hardware. FIG. 3 also illustrates the effect of possible delays. For example, the actual pointer 320 represents the position to which the hardware has actually updated the second pointer while reading the send queue 310, i.e., the position at which the processing logic actually encountered the invalid bit; because of delays such as cross-device operation latency, the actual pointer 320 is not fed back to the software in time, so the software reads the second pointer 314 instead. In addition, the software may have written the contents of a valid work queue element but not yet updated its valid bit, because the software fills in the contents of a WQE before setting its valid bit. If the hardware reads that position in the meantime, it treats the position as invalid and makes a wrong determination.
Therefore, a doorbell method for RDMA must take into account the delay in software writes (for example, the software has written the contents of a valid work queue element but has not yet updated the valid bit), the delay in hardware feedback to the software (for example, the position of the invalid bit actually encountered by the hardware has not yet been written to the corresponding pointer and made known to the software), and the delay of cross-device operations such as PCI latency. To this end, in step S206 the hardware reads the work queue elements in the work queue one by one in a specific order. When the hardware reads an invalid bit for the first time, it waits for a specific time and then checks whether it again reads an invalid bit at the same position. When the hardware reads the invalid bit for the second time, it notifies the software and feeds back the position at which it first read the invalid bit as the invalid bit in the work queue. The specific time is at least longer than the shortest peripheral component interconnect (PCI) delay of the RDMA chip. The specific order may be the order in which the elements were written, such as the top-to-bottom direction shown by way of example in FIG. 3. After the hardware first reads an invalid bit, i.e., an invalid work queue element, it may continue its processing flow and perform the recheck by polling, for example.
Delaying for a specific time means that, after first reading an invalid bit, the hardware waits for the specific time before checking that position again. Waiting mitigates the effects of the delays mentioned above: the delay in software writes, the delay in hardware feedback to the software, and the delay of cross-device operations such as PCI latency. In particular, requiring the specific time to be at least longer than the shortest PCI delay of the RDMA chip gives the software time to update the valid bit and gives the pointer fed back by the hardware time to reach the software, which reduces the misjudgment rate and improves interaction efficiency. Taking the send queue 310 in FIG. 3 as an example, the position of the invalid bit actually encountered by the hardware, i.e., the actual pointer 320, is not written to the corresponding pointer in time, so when the hardware first reads the invalid bit the software sees only the second pointer 314. After the specific time has elapsed, the software sees the actual pointer 320 and can correctly decide whether to knock the doorbell from the relative positions of the actual pointer 320, the first pointer 312, and the third pointer 316. If the second read at that position is still invalid, the hardware notifies the software and feeds back the position at which it first read the invalid bit as the invalid bit in the work queue.
In contrast, if the first read returns an invalid bit and the second read returns a valid bit, the hardware may continue reading, e.g., re-enter the send flow for processing. Next, in step S208, the software determines from the relative positions of the first pointer, the second pointer, and the third pointer whether a doorbell condition is satisfied, and knocks the doorbell to notify the hardware when it is. As described above, the recheck performed in step S206 after the specific time absorbs the possible effects of the various delays, so that in step S208 the software can correctly decide from the relative positions of the pointers whether to knock the doorbell. The doorbell method for remote direct memory access thus handles invalid bits encountered during hardware reads in RDMA scenarios while accounting for possible delays; it reduces the misjudgment rate, improves coordination between software and hardware, and allows interaction efficiency to be improved further by using direct work queue elements.
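The read-delay-recheck behavior of step S206 can be sketched as a small simulation. This is only a model under stated assumptions: the work queue is reduced to a list of valid bits, `find_confirmed_invalid` and `RECHECK_DELAY_US` are hypothetical names, and the delay is injected as a `wait` callable so that a late software write can be simulated without real timing.

```python
from typing import Callable, List, Optional

# Assumed value inside the 2 to 4 microsecond range given in the text.
RECHECK_DELAY_US = 3


def find_confirmed_invalid(valid_bits: List[bool], start: int,
                           wait: Callable[[int], None]) -> Optional[int]:
    """Read entries in order; return the first position that is invalid on
    both the first read and the delayed second read, or None if none is."""
    for pos in range(start, len(valid_bits)):
        if valid_bits[pos]:
            continue                  # valid WQE: keep reading
        wait(RECHECK_DELAY_US)        # first invalid read: delay before recheck
        if not valid_bits[pos]:
            return pos                # second invalid read: report this position
        # second read is valid: the software was only late, keep reading
    return None


# Case 1: the software sets the valid bit of slot 2 during the delay.
late_queue = [True, True, False, True]

def late_writer(_delay_us: int) -> None:
    late_queue[2] = True

result_late = find_confirmed_invalid(late_queue, 0, late_writer)

# Case 2: slot 1 stays invalid, so its position is fed back.
gap_queue = [True, False, True]
result_gap = find_confirmed_invalid(gap_queue, 0, lambda _delay_us: None)

print(result_late, result_gap)
```

In the first case the recheck finds the entry valid and no invalid position is reported; in the second case the position of the first invalid read is returned, matching the feedback described for the second pointer.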
In one possible embodiment, the doorbell method further comprises: when the relative positional relationship satisfies the doorbell condition, the software writes the work queue element to a base address register (BAR) space of the RDMA chip, and the hardware reads the work queue element from the BAR space. In some embodiments, the software writes two direct work queue elements of 64 bits each, which are merged into the 128-bit work queue element. Writing directly to the BAR space lets the hardware respond faster, and merging direct work queue elements lets the driver submit WQEs directly to hardware registers.
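As a rough illustration of the merge, the following sketch combines two 64-bit halves into one 128-bit value and lays it out as 16 bytes. The source does not specify which half is the upper one or the byte order; treating the second half as the upper 64 bits and using little-endian layout are assumptions, and `merge_direct_wqe` and `to_bar_bytes` are hypothetical helper names.

```python
import struct

MASK64 = (1 << 64) - 1


def merge_direct_wqe(low_half: int, high_half: int) -> int:
    """Merge two 64-bit direct WQE halves into one 128-bit value.
    Assumption: `high_half` occupies the upper 64 bits."""
    for half in (low_half, high_half):
        if not 0 <= half <= MASK64:
            raise ValueError("each half must fit in 64 bits")
    return (high_half << 64) | low_half


def to_bar_bytes(wqe128: int) -> bytes:
    """Lay the 128-bit value out as two little-endian 64-bit words
    (assumed layout), as a stand-in for the two BAR writes."""
    return struct.pack("<QQ", wqe128 & MASK64, wqe128 >> 64)


wqe = merge_direct_wqe(0x1122334455667788, 0x99AABBCCDDEEFF00)
payload = to_bar_bytes(wqe)
print(hex(wqe), len(payload))
```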
In one possible implementation, the work queue is located in an internal register of the RDMA chip, and the hardware reads the work queue elements in the work queue through that internal register. Storing the work queue and its WQEs directly in an internal register of the RDMA chip improves the hardware response speed. As described above, the direct WQE approach places stricter demands on avoiding misjudgment and on coordination between software and hardware. Small delays caused by cross-device operation, such as latency between PCI devices, may prevent the software from obtaining an up-to-date pointer value, so that the software wrongly concludes that the hardware is blind reading normally and does not knock the doorbell to notify the hardware, which may disrupt the send flow. The doorbell method of the above embodiments overcomes this delay problem and thereby makes the direct WQE approach practical.
In one possible embodiment, the doorbell condition is that the second pointer is located between the first pointer and the third pointer. When the second pointer, i.e., the pointer corresponding to the position of the invalid bit, lies between the first pointer and the third pointer, the hardware will inevitably encounter an invalid bit if it continues blind reading; in this case the software should knock the doorbell once to notify the hardware and stop the flow.
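The doorbell condition can be sketched for a queue that wraps around. The source does not say how the pointers are encoded, so this sketch assumes a ring of `depth` slots with pointers stored as slot indices, and it measures distances from the first pointer in the read direction. The name `doorbell_condition_met` and the strict comparison are assumptions.

```python
def doorbell_condition_met(first: int, second: int, third: int, depth: int) -> bool:
    """Return True when `second` lies strictly between `first` and `third`
    in read order on a ring of `depth` slots (assumed encoding)."""
    invalid_offset = (second - first) % depth   # slots from last doorbell to invalid bit
    head_offset = (third - first) % depth       # slots from last doorbell to queue head
    return 0 < invalid_offset < head_offset


# Invalid bit ahead of the last doorbell and before the head, across the wrap point.
print(doorbell_condition_met(first=6, second=7, third=2, depth=8))
# Invalid bit behind the last doorbell: blind reading can continue.
print(doorbell_condition_met(first=2, second=1, third=5, depth=8))
```

Note that with plain slot indices a full queue and an empty queue both give `third == first`; this sketch treats that case as "no doorbell", and a real design would need an extra wrap counter or similar to tell the two apart.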
In one possible embodiment, the specific time is 2 microseconds to 4 microseconds. The specific time is chosen to cover the possible delays, such as the delay in software writes, the delay in hardware feedback to the software, and the delay of cross-device operations such as PCI latency. In some embodiments, the specific time is set according to the application scenario of the RDMA chip, such as a data center, gaming, or live streaming. In addition, so that the hardware is notified as quickly as possible, the shortest-delay case must be considered; for example, the specific time is set to be at least longer than the shortest PCI delay.
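One way to combine the two constraints (longer than the shortest PCI delay, and within 2 to 4 microseconds) is sketched below. The source gives no selection formula; the function name `choose_recheck_delay_us`, the 0.5 microsecond safety margin, and the policy of picking the smallest admissible value are all assumptions made for illustration.

```python
from typing import Tuple


def choose_recheck_delay_us(shortest_pci_delay_us: float,
                            window_us: Tuple[float, float] = (2.0, 4.0),
                            margin_us: float = 0.5) -> float:
    """Pick the smallest delay that exceeds the shortest PCI delay by
    `margin_us` and still lies inside `window_us` (assumed policy)."""
    low, high = window_us
    delay = max(low, shortest_pci_delay_us + margin_us)
    if delay > high:
        raise ValueError("no delay in the window exceeds the shortest PCI delay")
    return delay


fast_link = choose_recheck_delay_us(1.0)   # window floor applies
slow_link = choose_recheck_delay_us(3.0)   # PCI delay plus margin applies
print(fast_link, slow_link)
```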
In one possible implementation, when the hardware reads an invalid bit for the first time, it continues its processing flow and, by polling, checks again after the specific time whether it reads an invalid bit at the same position. In some implementations, when the hardware reads the invalid bit for the second time, it sets a discard flag at the position in the work queue where it first read the invalid bit. In some implementations, when the hardware reads the invalid bit for the second time, it also sets a queue empty flag to indicate that an invalid bit is present in the work queue. Because the hardware continues its processing flow rather than pausing after the first invalid read, and performs the recheck by polling, processing is not stalled. Setting the discard flag at the position of the first invalid read records the status of that position, and the queue empty flag reflects the overall state of the work queue, which facilitates queue-level management.
In one possible implementation, when the hardware reads the invalid bit for the second time, it updates the second pointer to the position at which it first read the invalid bit. Because the second pointer is updated only after the second invalid read, and the specific time is at least longer than the shortest PCI delay of the RDMA chip, the software has time to update the valid bit and the pointer fed back by the hardware has time to reach the software, which reduces the misjudgment rate and improves interaction efficiency.
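The state updates that follow a confirmed (second) invalid read can be summarized in a short sketch. The `QueueState` record and `on_second_invalid_read` function are hypothetical; the source names the discard flag, the queue empty flag, and the second pointer, but not how the hardware stores them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueueState:
    """Hypothetical per-queue state kept by the hardware."""
    depth: int
    second_ptr: int = 0                 # invalid-bit position fed back to software
    queue_empty: bool = False           # set when an invalid bit is confirmed
    discard: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.discard:
            self.discard = [False] * self.depth


def on_second_invalid_read(state: QueueState, first_invalid_pos: int) -> None:
    """Apply the updates described for a second invalid read at the
    position where the invalid bit was first read."""
    state.discard[first_invalid_pos] = True   # discard flag at that position
    state.queue_empty = True                  # queue-level indication
    state.second_ptr = first_invalid_pos      # feedback used by the software


state = QueueState(depth=8)
on_second_invalid_read(state, 5)
print(state.second_ptr, state.queue_empty, state.discard[5])
```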
In one possible implementation, the invalid bit is an invalid work queue element in the work queue. In one possible implementation, the invalid work queue element is a work queue element that the software has not written or has not updated. In one possible implementation, the software is a driver of the RDMA chip. In one possible implementation, the work queue is located in a memory of a host, and the hardware obtains a number through a doorbell register of the RDMA chip and uses that number to read a work queue element in the work queue.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computing device provided in an embodiment of the present application, where the computing device 400 includes: one or more processors 410, a communication interface 420, and a memory 430. The processor 410, communication interface 420, and memory 430 are interconnected by a bus 440. Optionally, the computing device 400 may further include an input/output interface 450, and the input/output interface 450 is connected with an input/output device for receiving parameters set by a user, and the like. The computing device 400 can be used to implement some or all of the functionality of the device embodiments or system embodiments described above in this application; the processor 410 can also be used to implement some or all of the operational steps of the method embodiments described above in the embodiments of the present application. For example, specific implementations of the computing device 400 to perform various operations may refer to specific details in the above-described embodiments, such as the processor 410 being configured to perform some or all of the steps or some or all of the operations in the above-described method embodiments. For another example, in this embodiment of the application, the computing device 400 may be used to implement part or all of the functions of one or more components in the above-described apparatus embodiments, and the communication interface 420 may be specifically used to implement the communication functions and the like necessary for the functions of these apparatuses and components, and the processor 410 may be specifically used to implement the processing functions and the like necessary for the functions of these apparatuses and components.
It should be understood that the computing device 400 of FIG. 4 may include one or more processors 410. The processors 410 may cooperatively provide processing capability in a parallel, serial, or series-parallel connection, or in any other connection; the processors 410 may form a processor sequence or a processor array; the processors 410 may be divided into a main processor and auxiliary processors; or the processors 410 may have different architectures, for example in a heterogeneous computing architecture. Further, the structure and functions of the computing device 400 shown in FIG. 4 are described by way of example and are non-limiting. In some example embodiments, the computing device 400 may include more or fewer components than shown in FIG. 4, combine certain components, split certain components, or have a different arrangement of components.
The processor 410 may be implemented in various specific forms. For example, the processor 410 may include one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), or a data processing unit (DPU), and the embodiments of the present application are not specifically limited in this respect. The processor 410 may be a single-core processor or a multi-core processor. The processor 410 may be a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 410 may also be implemented as a single logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP). The communication interface 420 may be a wired interface, such as an Ethernet interface or a local interconnect network (LIN) interface, or a wireless interface, such as a cellular network interface or a wireless LAN interface, and is used to communicate with other modules or devices.
The memory 430 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The memory 430 may also be a volatile memory, such as a random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM). The memory 430 may also be used to store program code and data, so that the processor 410 can call the program code stored in the memory 430 to perform some or all of the operation steps of the above method embodiments or to perform the corresponding functions of the above apparatus embodiments. Moreover, the computing device 400 may contain more or fewer components than shown in FIG. 4, or have a different arrangement of components.
The bus 440 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a Compute Express Link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 440 may be divided into an address bus, a data bus, a control bus, and the like, and may also include a power bus and a status signal bus. For clarity, only one thick line is shown in FIG. 4, but this does not mean that there is only one bus or only one type of bus.
Embodiments of the present application further provide a system, where the system includes a plurality of computing devices, and the structure of each computing device may refer to the structure of the computing device described above. The functions or operations that can be implemented by the system may refer to specific implementation steps in the above method embodiments and/or specific functions described in the above apparatus embodiments, which are not described in detail herein. Embodiments of the present application further provide a computer-readable storage medium, in which computer instructions are stored, and when the computer instructions are executed on a computer device (such as one or more processors), the method steps in the above method embodiments may be implemented. The specific implementation of the processor of the computer-readable storage medium in executing the above method steps may refer to the specific operations described in the above method embodiments and/or the specific functions described in the above apparatus embodiments, which are not described herein again. Embodiments of the present application further provide a computer program product, which includes instructions stored on a computer-readable storage medium, and when the instructions are run on a computer device, the computer device is caused to execute the method steps in the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. The present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Embodiments of the present application may be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. The computer program product includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that contains one or more collections of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium, or a semiconductor medium.
The semiconductor medium may be a solid state disk, or may be a random access memory, flash memory, read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, registers, or any other form of suitable storage medium.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. Each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks in the flow charts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. It will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments of the present application without departing from their spirit and scope. The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs; the modules in the systems of the embodiments of the present application may be divided, combined, or deleted according to actual needs. If these modifications and variations fall within the scope of the claims of the present application and their equivalents, the present application is intended to include them as well.

Claims (16)

1. A doorbell method for remote direct memory access, the doorbell method comprising:
providing a work queue for interaction between software and hardware of a remote direct memory access, RDMA, chip, the interaction between the software and the hardware being achieved by the software writing work queue elements to the work queue and by the hardware reading work queue elements in the work queue;
providing a first pointer, a second pointer, and a third pointer associated with the work queue, wherein the first pointer indicates the position in the work queue at which the software last knocked the doorbell, the second pointer indicates an invalid bit in the work queue fed back by the hardware, and the third pointer indicates the head position of the queue of work queue elements issued by the software;
reading, by the hardware, the work queue elements in the work queue one by one in a specific order, wherein when the hardware reads an invalid bit for the first time, the hardware delays for a specific time and then determines whether it again reads an invalid bit at the position where it first read the invalid bit; when the hardware reads the invalid bit for the second time, the hardware notifies the software and feeds back the position where it first read the invalid bit as the invalid bit in the work queue; and the specific time is at least longer than the shortest peripheral component interconnect (PCI) delay of the RDMA chip;
and determining, by the software, whether a doorbell condition is satisfied according to the relative positional relationship among the first pointer, the second pointer, and the third pointer, and knocking, by the software, the doorbell to notify the hardware when the relative positional relationship satisfies the doorbell condition.
2. The doorbell method of claim 1, further comprising: when the relative positional relationship satisfies the doorbell condition, writing, by the software, the work queue element to a base address register (BAR) space of the RDMA chip, and reading, by the hardware, the work queue element from the BAR space.
3. The doorbell method of claim 2, wherein the software writes two direct work queue elements of 64 bits each, which are merged into the 128-bit work queue element.
4. The doorbell method of claim 1 wherein the work queue is located in an internal register of the RDMA chip, the hardware reading the work queue elements in the work queue through the internal register of the RDMA chip.
5. The doorbell method of claim 1 wherein the doorbell condition is the second pointer being located between the first pointer and the third pointer.
6. The method of claim 1, wherein the specific time is 2 microseconds to 4 microseconds.
7. The doorbell method of claim 1, wherein when the hardware reads an invalid bit for the first time, the hardware continues its processing flow and, by polling, determines again after the specific time whether it reads an invalid bit at the position where it first read the invalid bit.
8. The doorbell method of claim 7, wherein when the hardware reads the invalid bit for the second time, the hardware sets a discard flag at the position in the work queue where it first read the invalid bit.
9. The doorbell method of claim 8, wherein when the hardware reads the invalid bit for the second time, the hardware further sets a queue empty flag to indicate that an invalid bit is present in the work queue.
10. The doorbell method of claim 1, wherein when the hardware reads the invalid bit for the second time, the hardware updates the second pointer according to the position where it first read the invalid bit.
11. The doorbell method of claim 1, wherein the invalid bit is an invalid work queue element in the work queue.
12. The doorbell method of claim 11, wherein the invalid work queue element is a work queue element that the software has not written or has not updated.
13. The doorbell method of claim 1 wherein the software is a driver of the RDMA chip.
14. The method of claim 1, wherein the work queue is located in a memory of a host, and wherein the hardware obtains a number via a doorbell register of the RDMA chip and reads a work queue element in the work queue via the number.
15. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 14 when executing the computer program.
16. A computer-readable storage medium storing computer instructions which, when executed on a computer device, cause the computer device to perform the method of any one of claims 1 to 14.
CN202310244405.9A 2023-03-15 2023-03-15 Doorbell knocking method, equipment and medium for remote direct memory access Active CN115934625B (en)


Publications (2)

Publication Number | Publication Date
CN115934625A (en) | 2023-04-07
CN115934625B (en) | 2023-05-16

Family

ID=85825553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310244405.9A Active CN115934625B (en) 2023-03-15 2023-03-15 Doorbell knocking method, equipment and medium for remote direct memory access

Country Status (1)

Country Link
CN (1) CN115934625B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060262799A1 (en) * 2005-05-19 2006-11-23 International Business Machines Corporation Transmit flow for network acceleration architecture
US20140089444A1 (en) * 2012-09-27 2014-03-27 Vadim Makhervaks Methods, apparatus and systems for facilitating rdma operations with reduced doorbell rings
CN110888827A (en) * 2018-09-10 2020-03-17 华为技术有限公司 Data transmission method, device, equipment and storage medium
CN112559436A (en) * 2020-12-16 2021-03-26 中国科学院计算技术研究所 Context access method and system of RDMA communication equipment
CN114584492A (en) * 2022-02-15 2022-06-03 珠海星云智联科技有限公司 Time delay measuring method, system and related equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573602A (en) * 2024-01-16 2024-02-20 珠海星云智联科技有限公司 Method and computer device for remote direct memory access message transmission
CN117573602B (en) * 2024-01-16 2024-05-14 珠海星云智联科技有限公司 Method and computer device for remote direct memory access message transmission
CN117573603A (en) * 2024-01-17 2024-02-20 珠海星云智联科技有限公司 Data processing method and computer equipment for remote direct memory access
CN117573603B (en) * 2024-01-17 2024-04-19 珠海星云智联科技有限公司 Data processing method and computer equipment for remote direct memory access

Also Published As

Publication number Publication date
CN115934625B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
KR102245247B1 (en) GPU remote communication using triggered actions
CN115934625B (en) Doorbell knocking method, equipment and medium for remote direct memory access
US20090248934A1 (en) Interrupt dispatching method in multi-core environment and multi-core processor
US10802995B2 (en) Unified address space for multiple hardware accelerators using dedicated low latency links
WO2018120780A1 (en) Method and system for pcie interrupt
CN103647807A (en) Information caching method, device and communication apparatus
CN106095604A (en) Inter-core communication method and device for a multi-core processor
CN112214166A (en) Method and apparatus for transmitting data processing requests
CN112559436B (en) Context access method and system of RDMA communication equipment
US20080235713A1 (en) Distributed Processing System and Method
US7093037B2 (en) Generalized queue and specialized register configuration for coordinating communications between tightly coupled processors
CN115248795A (en) Peripheral Component Interconnect Express (PCIE) interface system and method of operating the same
US10318424B2 (en) Information processing device
US20080189719A1 (en) Operation processor apparatus
US20140052879A1 (en) Processor, information processing apparatus, and interrupt control method
KR102206313B1 (en) System interconnect and operating method of system interconnect
US10423546B2 (en) Configurable ordering controller for coupling transactions
CN116932454B (en) Data transmission method, device, electronic equipment and computer readable storage medium
CN116601616A (en) Data processing device, method and related equipment
CN112416826A (en) Special computing chip, DMA data transmission system and method
CN115604198B (en) Network card controller, network card control method, equipment and medium
US11785087B1 (en) Remote direct memory access operations with integrated data arrival indication
US12008243B2 (en) Reducing index update messages for memory-based communication queues
JP6222079B2 (en) Computer system, processing method thereof, and program
CN111124987B (en) PCIE-based data transmission control system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant