CN113742267B - DMA communication system and method for RDMA communication equipment - Google Patents

Info

Publication number
CN113742267B
Authority
CN
China
Prior art keywords
request
read
interrupt
data
requests
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111041270.3A
Other languages
Chinese (zh)
Other versions
CN113742267A (en
Inventor
谭光明
康宁
杨帆
王展
马潇潇
马振龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202111041270.3A priority Critical patent/CN113742267B/en
Publication of CN113742267A publication Critical patent/CN113742267A/en
Application granted granted Critical
Publication of CN113742267B publication Critical patent/CN113742267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/24 Handling requests for interconnection or transfer for access to input/output bus using interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Abstract

The application provides a DMA system for an RDMA communication device, comprising a plurality of data processing channels (10), an arbitration module (20), an interrupt processing module (30) and a data distribution module (40). Each data processing channel (10) receives read requests, write requests and read responses for local data; the arbitration module (20) schedules the read requests and write requests of the plurality of data processing channels (10); the interrupt processing module (30) merges received interrupt requests; and the data distribution module (40) distributes read response data to the corresponding data processing channels (10). Embodiments of the application reduce the large context-switching overhead that frequent interrupts impose on the computer system, guarantee the ordering between interrupt requests and data write requests, and make the number of data processing channels configurable, so that the differing channel-count requirements of RDMA communication devices can be met flexibly.

Description

DMA communication system and method for RDMA communication equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a DMA communication system and method for an RDMA communication device.
Background
Remote Direct Memory Access (RDMA) technology accesses remote memory directly, so that a message can be written into memory without the remote central processing unit (CPU) having to detect its arrival; after the data transfer completes, the CPU is notified by an interrupt that the message has been received. This reduces latency during data transfer and also reduces the load that data communication places on the server's CPU. An RDMA-enabled server typically includes a CPU, a memory unit, and an RDMA-enabled host channel adapter (HCA). The HCA is connected by cable to the HCAs of other servers, enabling RDMA communication between servers.
Current RDMA communication devices have reached 100 Gb-level bandwidth, and the resulting frequent interrupt notifications can fill the CPU's time slices entirely with interrupt handling. Designing a method or system that merges interrupts, and thereby reduces their number, is therefore a problem that current RDMA communication devices need to solve.
In addition, the internal modules of an RDMA communication device exchange data quickly with the local computer system through a PCIe controller. In existing RDMA communication devices, the transfer channel is typically implemented by a DMA engine. The RDMA communication device and the computer system have several different data access requirements. For example, data requests in different channels (such as the sending channel and the receiving channel of the RDMA communication device) generally do not need hardware-guaranteed ordering, whereas the order between data requests and interrupt requests within the same channel is often strictly constrained. Implementing a system and method that meets these different data access requirements is therefore also a problem that RDMA communication devices need to solve.
Disclosure of Invention
According to a first aspect of the application, there is provided a DMA system for an RDMA communication device, comprising a plurality of data processing channels (10), an arbitration module (20), an interrupt processing module (30) and a data distribution module (40),
wherein each data processing channel (10) is configured to receive a read request, a write request, and a read response to local data;
the arbitration module (20) is configured to schedule read requests or write requests in the plurality of data processing channels (10);
the interrupt processing module (30) is configured to merge received interrupt requests;
the data distribution module (40) is configured to distribute read response data to the corresponding data processing channels (10).
In one embodiment of the application, the data processing channel (10) comprises a write request processing unit (101), a read request processing unit (102), a read response processing unit (103), a tag allocation unit (104), and a read/write request arbitration unit (105), wherein
the write request processing unit (101) is configured to receive and process data write requests and interrupt requests;
the read request processing unit (102) is configured to receive and process read requests;
the read response processing unit (103) is configured to receive and process the read responses issued by the data distribution module (40);
the tag allocation unit (104) is configured to allocate a tag to each read request in the read request processing unit (102) and to notify the read response processing unit (103) of the allocated tag number;
the read/write request arbitration unit (105) is configured to arbitrate among the interrupt requests, write requests and read requests sent by the write request processing unit (101) and the read request processing unit (102).
In one embodiment of the present application, the arbitration module (20) is configured to receive the data read requests, data write requests, and interrupt requests sent by the plurality of data processing channels (10), to read the packet header of the request in each channel, and to perform different operations according to the request type: an interrupt request is forwarded to the interrupt handling module (30); a read request or a write request is forwarded to the PCIe controller.
In one embodiment of the present application, the interrupt processing module (30) further includes an interrupt timer and an interrupt request counter. The interrupt timer controls when an interrupt is issued: when the timer is found to have expired, the module checks whether the corresponding interrupt request counter, which counts the interrupt requests received while the timer was running, has reached a threshold, and if so, the interrupt request is sent to the PCIe interface.
In one embodiment of the present application, each read request packet is assigned a tag from the tag segment preset for its data processing channel (10), and the data distribution module (40) is configured, for each read response packet, to find the corresponding data processing channel according to the packet's tag field and forward the packet to it.
In one embodiment of the application, within each data processing channel (10), a received write request is processed first, and only then is the interrupt request that follows it received.
In one embodiment of the application, the read response processing unit (103) is configured to merge multiple response packets, reorder out-of-order read response packets, merge the responses of multiple read requests, and forward the resulting response packet to the RDMA communication device.
In one embodiment of the application, the arbitration module (20) processes requests in the following order of priority:
1) when a channel contains a read request, the read request is arbitrated first;
2) when a plurality of channels contain read requests, or no channel contains a read request, a polling strategy is used for arbitration;
3) when a channel contains an interrupt request, it is forwarded directly to the interrupt handling module (30);
4) when a plurality of interrupt requests are present in the channels, they are forwarded to the interrupt handling module (30) using a polling strategy.
According to a second aspect of the present application, there is provided a write request processing method for a DMA system for an RDMA communication device of the present application, comprising:
a data processing channel (10) receives a data write request initiated by an RDMA communication device to a DMA communication system;
dividing the write request into a plurality of write requests meeting the PCIe maximum request limit and sequentially sending the write requests to an arbitration module (20);
an arbitration module (20) writes the write request to the computer system through the PCIe controller.
According to a third aspect of the application, there is provided a read request method for a DMA system of the application for an RDMA communication device, comprising:
a data processing channel (10) receives a data read request initiated by an RDMA communication device to a DMA communication system;
dividing the read request into a plurality of read requests meeting the PCIe maximum request limit, distributing labels for each read request, and sequentially sending the read requests to an arbitration module (20);
the arbitration module (20) sends the read request to the computer system through the PCIe controller.
In one embodiment of the application, the method further comprises the steps of:
receiving a read request response returned by the computer system, wherein for one read request sent to the computer system, the computer system sequentially returns a plurality of read responses of the same tag, and for read responses of different tags, the order returned by the computer system can be different from the order in which the read requests are sent;
distributing the read responses to the corresponding data processing channels (10) according to their tags, wherein the data processing channel (10) merges the read response packets that carry the same tag, and merges the responses to different requests into one read response packet in the order in which the read request tags were allocated;
the response packet is returned to the RDMA communication device.
Compared with the prior art, the application has the advantages that:
1. By providing an independent interrupt processing module, an interrupt merging mechanism can be implemented in hardware, which reduces the large context-switching overhead that frequent interrupts impose on the computer system.
2. Interrupt requests and data write requests are transmitted over the same channel interface, which guarantees their relative order and avoids timing errors in the RDMA communication device caused by an interrupt completing before the write request that precedes it.
3. The number of data processing channels is configurable, so the differing channel-count requirements of RDMA communication devices can be met flexibly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a general block diagram of a DMA communication system according to the present application;
FIG. 2 is a block diagram of a data processing channel 10 according to the present application;
FIG. 3 is a block diagram of a read response processing unit 103 according to the present application;
FIG. 4 is a block diagram of a tag allocation unit 104 according to the present application;
FIG. 5 is a block diagram of an interrupt handling module 30 according to the present application;
FIG. 6 is a flow chart illustrating a write request packet processing procedure according to the present application;
FIG. 7 is a flow chart illustrating a read request packet processing procedure according to the present application;
FIG. 8 is a flow chart illustrating the tag allocation procedure according to the present application;
FIG. 9 is a flow chart illustrating a read response packet processing procedure according to the present application;
FIG. 10 is a flowchart illustrating an interrupt handling procedure according to the present application.
Detailed Description
To address the problems described in the background, the inventors propose a DMA communication system and method for RDMA communication devices that merges different interrupt requests while preserving the order of data requests and interrupt requests, thereby reducing the number of interrupts.
The application provides a DMA communication system for an RDMA communication device, which can meet interrupt request aggregation and multi-channel communication requirements of the RDMA communication device. According to one embodiment of the application, the structure of the system 1 is shown in fig. 1, and comprises the following:
a plurality of data processing channels 10: each data processing channel 10 is configured to receive a read request, a write request, and a read response to local data, and the number of data processing channels 10 can be configured according to the DMA communication requirements of the RDMA communication device;
arbitration module 20: for scheduling read requests or write requests in a plurality of data processing channels 10;
interrupt handling module 30: for controlling the submission of interrupt requests and implementing a configurable interrupt request merging mechanism;
data distribution module 40: for distributing read response data to the different data processing channels 10.
Preferably, the data processing channel 10 is configured to receive and forward the data read requests, data write requests, and interrupt requests sent by different modules in the RDMA communication device, adding a different packet header according to the type of each request, and to receive and forward the data read responses sent by the data distribution module 40. According to one embodiment of the application, as shown in FIG. 2, the data processing channel 10 includes the following units: a write request processing unit 101; a read request processing unit 102; a read response processing unit 103; a tag allocation unit 104; a read-write request arbitration unit 105. The data processing channel 10 further comprises the following interfaces: sub-channels that interact with the RDMA communication device, and channels that interact with the arbitration module 20 and the data distribution module 40 for sending requests and receiving read responses, respectively. For the sub-channels that interact with the RDMA communication device, the maximum size of a data request sent on each sub-channel is 4 KB, and the accessed address is 4 KB aligned.
The individual units in the data processing channel 10 are described separately below with reference to fig. 2.
The write request processing unit 101 is configured to receive and process data write requests and interrupt requests transmitted over the write request channel. For a data write request, the maximum size of the request packet (4 KB) may exceed the size that PCIe can accept, so the write request processing unit 101 repackages the received write request packet according to the maximum request packet size given by PCIe; each newly generated packet is assigned a new packet header, and the address field in that header is recalculated from the offset of the data within the original packet. These packets are then forwarded to the arbitration module 20 (described in more detail below). An interrupt request is forwarded by the write request processing unit 101 directly to the arbitration module 20 for the next operation. Binding the interrupt request and the write request to the same port has the following benefit: requests on the same port have a definite order, so an interrupt request sent later cannot reach the PCIe bus before an earlier write request, which avoids disordered communication when the RDMA communication device interacts with the host system.
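The repackaging step can be sketched in Python as follows. This is only an illustration of the address arithmetic, not the hardware logic of the application; `WriteRequest`, `split_write`, and the 256-byte `max_payload` are assumed names and values that do not appear in the original.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteRequest:
    address: int  # destination address in host memory
    data: bytes   # payload (at most 4 KB per the description)


def split_write(req: WriteRequest, max_payload: int) -> List[WriteRequest]:
    """Split req into sub-requests carrying at most max_payload bytes each.

    Each sub-request gets its own address, computed from the offset of its
    data inside the original packet.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        WriteRequest(req.address + off, req.data[off:off + max_payload])
        for off in range(0, len(req.data), max_payload)
    ]


# A 4 KB write split under an assumed 256-byte PCIe payload limit.
parts = split_write(WriteRequest(0x1000, bytes(4096)), max_payload=256)
```

Because the sub-requests are generated in ascending offset order, sending them one after another preserves the byte order of the original write.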
The read request processing unit 102 is configured to receive and process read requests. When a data packet arrives, the unit first determines whether the incoming data request is larger than the maximum request allowed by PCIe; if so, the read request processing unit 102 splits the received read request into sub-requests, and otherwise it proceeds directly to the next step. The read request processing unit 102 then asks the tag allocation unit 104 (described in detail later) for a tag for the read request; once the tag is obtained, the read request is forwarded to the read-write request arbitration unit 105.
The read response processing unit 103 is configured to receive and process the read responses issued by the data distribution module 40. Its main tasks are merging multiple response data packets, reordering out-of-order read response packets, and merging the responses of multiple read requests; it then forwards the resulting response packet to the read response channel for the RDMA communication device to receive. According to one embodiment of the present application, the read response processing unit 103 is configured as shown in fig. 3 and includes: control logic 1031 for overall control of the read response processing unit; an out-of-order reorder buffer 1032 for temporarily storing read response data that is to be reordered; and a read request merge buffer 1033 for merging the responses to read requests that were split during read request processing into a single response data packet.
The tag allocation unit 104 is configured to allocate a tag to each read request in the read request processing unit 102 and to notify the read response processing unit 103 of the allocated tag number. A read response carries the same tag as its corresponding read request. According to one embodiment of the present application, the tag allocation unit 104 is structured as shown in fig. 4 and consists of four FIFO (first in, first out) queues: an unallocated (tag) FIFO 1041, an allocated (tag) FIFO 1042, a request packet FIFO 1043, and a request size FIFO 1044. The unallocated (tag) FIFO 1041 stores the tags that can be allocated in the data processing channel 10. Its contents are written at initialization and pre-allocated when the hardware is designed, and the tags of different data processing channels 10 must not overlap; this FIFO is written by the read response processing unit 103 and read by the read request processing unit 102. The allocated (tag) FIFO 1042 holds the tag numbers that have been allocated to the read request processing unit 102 and always presents its first element to the read response processing unit 103; this FIFO is written by the read request processing unit 102 and read by the read response processing unit 103. The request packet FIFO 1043 and the request size FIFO 1044 store, respectively, whether a read request is the last of the split sub-requests and the data size requested by the sub-request; both are written by the read request processing unit 102 and read by the read response processing unit 103. After a received read response has been matched to its read request by tag, the tag is released, and a released tag can be allocated again as an unallocated tag.
The read-write request arbitration unit 105 is used to arbitrate among the interrupt requests, write requests, and read requests sent by the write request processing unit 101 and the read request processing unit 102. Preferably, the unit employs a prioritized arbitration policy in which the read channel has higher priority than the write channel. The benefit of this priority assignment is that read requests are shorter than write requests, so prioritizing them helps reduce overall request latency.
The arbitration module 20 is configured to receive the data read requests, data write requests, and interrupt requests sent from each data processing channel 10, read the packet header of the request in each channel, and perform different operations according to the request type: an interrupt request is forwarded to the interrupt handling module 30; a read request or a write request is forwarded to the PCIe controller and sent directly to the computer system. Preferably, the arbitration module 20 employs a prioritized arbitration policy: 1) when a channel contains a read request, the read request is arbitrated first; 2) when a plurality of channels contain read requests, or no channel contains a read request, a polling strategy is used for arbitration; 3) when a channel contains an interrupt request, it is forwarded directly to the interrupt handling module 30, and because its destination differs, arbitration of the read and write requests of other channels is not affected; 4) when a plurality of interrupt requests are present in the channels, they are forwarded to the interrupt handling module 30 using a polling strategy, again without affecting arbitration of the read and write requests of other channels. Preferably, the arbitration module 20 applies rules 1), 2), 3), and 4) in that order.
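Rules 1) and 2) can be illustrated with a small Python sketch. It assumes that the "polling strategy" is a round-robin scan starting after the most recently granted channel, and it leaves interrupt requests out because they are routed separately to the interrupt handling module 30; the function name `arbitrate` and its arguments are hypothetical.

```python
from typing import List, Optional


def arbitrate(heads: List[Optional[str]], last_granted: int) -> Optional[int]:
    """Pick the next channel to grant.

    heads[i] is "read", "write", or None (idle) for the request at the head
    of channel i. Reads win over writes; ties within a request type are
    broken round-robin, starting after the channel granted last time.
    """
    n = len(heads)
    order = [(last_granted + 1 + k) % n for k in range(n)]
    for kind in ("read", "write"):      # reads are tried first
        for ch in order:                # round-robin scan within one kind
            if heads[ch] == kind:
                return ch
    return None                         # nothing to grant
```

For example, with heads `["write", "read", "read"]` and channel 1 granted last, the scan order is 2, 0, 1, so channel 2's read is granted before channel 0's write.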
The interrupt handling module 30 is configured to handle interrupt requests in the RDMA communication device, including an independent interrupt merging mechanism for each of a plurality of interrupt vectors, an independent interrupt delay for each of a plurality of interrupt vectors, and forwarding of interrupt requests on expiry. According to one embodiment of the application, the structure of the interrupt handling module 30 is shown in fig. 5. The interrupt control unit 301 is configured to receive interrupt requests sent from the arbitration module 20 and to read the interrupt message entry in the PIO module according to the interrupt request. The expiration forwarding unit 302 is configured to determine whether the timer of each interrupt vector has expired or whether the number of requests for an interrupt vector has reached a threshold. The timer and the interrupt metadata are resources private to each interrupt vector; they are used to generate the timing count and to indicate whether the requests for that interrupt vector have reached the threshold. The interrupt metadata is data specific to the interrupt, including whether the timer has been started, the current elapsed time, and the time threshold for generating the interrupt.
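The per-vector merging behaviour can be modelled with the Python sketch below. It assumes that an interrupt is forwarded when either the vector's timer expires or its pending request count reaches a threshold, which is one reading of the expiration forwarding unit 302; the class names, the tick-based timer, and the threshold values are illustrative assumptions, not details from the original.

```python
from dataclasses import dataclass


@dataclass
class VectorState:
    time_threshold: int          # ticks to wait before forcing an interrupt
    count_threshold: int         # pending requests that trigger one at once
    pending: int = 0
    elapsed: int = 0
    timer_running: bool = False


class InterruptCoalescer:
    """Keeps independent timer and counter state for each interrupt vector."""

    def __init__(self):
        self.vectors = {}

    def configure(self, vector, time_threshold, count_threshold):
        self.vectors[vector] = VectorState(time_threshold, count_threshold)

    def request(self, vector):
        """Record one interrupt request; return True if an interrupt fires."""
        st = self.vectors[vector]
        st.pending += 1
        st.timer_running = True
        return self._fire_if(st, st.pending >= st.count_threshold)

    def tick(self, vector):
        """Advance the vector's timer; return True if an interrupt fires."""
        st = self.vectors[vector]
        if not st.timer_running:
            return False
        st.elapsed += 1
        return self._fire_if(st, st.elapsed >= st.time_threshold)

    @staticmethod
    def _fire_if(st, condition):
        if condition:  # one merged interrupt covers all pending requests
            st.pending, st.elapsed, st.timer_running = 0, 0, False
        return condition
```

With a count threshold of 2, two back-to-back requests produce a single interrupt; a lone request is delivered only once the time threshold elapses, which bounds the added latency.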
The data distribution module 40 is configured to distribute each read response packet to a different data processing channel 10 according to a preset tag segment. As described above, each data processing channel 10 is pre-configured with a different tag field, so that a data channel can be uniquely found to forward a data packet based on the tag field in the data packet.
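A minimal Python sketch of this lookup is given below. It assumes that each channel owns a contiguous, equally sized tag segment; the original only requires that the segments do not overlap, and `build_tag_map` and `channel_for_tag` are hypothetical names.

```python
def build_tag_map(tags_per_channel, num_channels):
    """Give channel ch the tags [ch * tags_per_channel, (ch + 1) * tags_per_channel)."""
    return {
        ch: range(ch * tags_per_channel, (ch + 1) * tags_per_channel)
        for ch in range(num_channels)
    }


def channel_for_tag(tag, tag_map):
    """Return the channel whose preset tag segment contains tag."""
    for ch, tags in tag_map.items():
        if tag in tags:
            return ch
    raise KeyError(f"tag {tag} belongs to no channel")


tag_map = build_tag_map(tags_per_channel=8, num_channels=4)
```

Since the segments are disjoint, a read response's tag identifies exactly one channel, so the distribution module needs no per-request state.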
According to an aspect of the present application, there is provided a write request method for the DMA communication system of the RDMA communication device described above, comprising: the RDMA communication device initiates a data write request to the DMA communication system via the write request channel; the data processing channel 10 first receives the write request, splits it into a plurality of write requests that meet the PCIe maximum request limit, and sends them sequentially to the arbitration module 20, which then writes them to the computer system via the PCIe controller.
Preferably, the method for writing requests in a DMA communication system of an RDMA communication device is as shown in fig. 6, comprising the steps of:
step 601, if the write request processing unit 101 receives a request data packet, jump to step 602; otherwise, continue executing step 601;
step 602, if the request is an interrupt request, jump to step 603; otherwise, jump to step 604;
step 603, forward the interrupt request directly to the read-write request arbitration unit 105 and jump to step 601;
step 604, the request is a data write request; if the data size of the request exceeds the PCIe maximum request limit, jump to step 605; otherwise, jump to step 606;
step 605, divide the data in the write request into a plurality of sub write requests according to the PCIe maximum request limit, send the sub write requests to the read-write request arbitration unit 105 one by one, and jump to step 601;
step 606, forward the write request directly to the read-write request arbitration unit 105 and jump to step 601.
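Steps 601 to 606 can be traced with the Python sketch below, which also shows why an interrupt cannot overtake the write that precedes it on the same port. Step 606 is taken here to forward the unsplit write request; the tuple encoding of packets, the function name `handle_packet`, and the 256-byte limit are assumptions made for illustration.

```python
def handle_packet(packet, max_payload, out):
    """Process one packet from the write request channel.

    packet is ("irq", vector) or ("write", address, data); forwarded packets
    are appended to out in the order they would reach the arbitration unit.
    """
    if packet[0] == "irq":                    # steps 602-603: forward as is
        out.append(packet)
        return
    _, address, data = packet                 # step 604: a data write request
    if len(data) > max_payload:               # step 605: split, keep order
        for off in range(0, len(data), max_payload):
            out.append(("write", address + off, data[off:off + max_payload]))
    else:                                     # step 606: forward unsplit
        out.append(packet)


forwarded = []
for pkt in [("write", 0x2000, bytes(600)), ("irq", 3), ("write", 0x3000, bytes(64))]:
    handle_packet(pkt, 256, forwarded)
```

The 600-byte write becomes three sub-writes, and the interrupt is queued only after all three, so the host never sees the interrupt before the data it announces.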
According to one aspect of the present application, there is provided a read request response method for the DMA communication system of the RDMA communication device, including read request packet processing, tag allocation, and read request response packet processing: the RDMA communication device initiates a data read request to the DMA communication system through the read request channel; the data processing channel 10 first receives the read request, splits it into a plurality of read requests that meet the PCIe maximum request limit, assigns a tag to each read request, and sends the read requests in sequence to the arbitration module 20, which sends them to the computer system through the PCIe controller.
For a read request sent to a computer system, the computer system returns multiple read responses of the same tag in sequence, and for read responses of different tags, the order in which the computer system returns may be different from the order in which the read requests were sent.
When the computer system returns a read response packet via the PCIe controller, the data distribution module 40 of the DMA communication system first receives the response and distributes it to the corresponding data processing channel 10 according to the tag of the response; the data processing channel 10 merges the read response data packets that carry the same tag and merges the responses to different requests into one read response data packet in the order in which the read request tags were allocated; finally, the response data packet is returned to the RDMA communication device.
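The merge-and-reorder behaviour inside a channel can be sketched in Python as follows. The sketch assumes that all responses have already arrived and that the expected size of each sub-request is known; real hardware would process responses incrementally using the reorder buffer. `merge_responses` and its parameters are illustrative names.

```python
def merge_responses(alloc_order, sizes, responses):
    """Rebuild one response payload from per-tag response packets.

    alloc_order: tags in the order they were allocated to the sub-requests.
    sizes:       mapping tag -> number of bytes requested under that tag.
    responses:   (tag, data) pairs in arrival order; packets of one tag
                 arrive in order, packets of different tags may interleave.
    """
    buffers = {tag: bytearray() for tag in alloc_order}
    for tag, chunk in responses:
        buffers[tag] += chunk                 # merge packets of the same tag
    for tag in alloc_order:
        if len(buffers[tag]) != sizes[tag]:
            raise ValueError(f"response for tag {tag} is incomplete")
    # Concatenate in allocation order, undoing any cross-tag reordering.
    return b"".join(bytes(buffers[tag]) for tag in alloc_order)


payload = merge_responses(
    alloc_order=[5, 6],
    sizes={5: 4, 6: 2},
    responses=[(6, b"EF"), (5, b"AB"), (5, b"CD")],
)
```

Here the response for tag 6 arrives first, yet the merged payload still places the data for tag 5 ahead of it because tag 5 was allocated earlier.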
Preferably, read request packet processing for a method of read request response in a DMA communication system of an RDMA communication device is as shown in fig. 7, comprising the steps of:
step 701, if the read request processing unit 102 receives a read request, jumping to step 702; otherwise, continuing to execute step 701;
step 702, if the request data size exceeds the PCIe maximum request limit, jumping to step 703; otherwise, jumping to step 704;
step 703, generating a sub-read request sized to the PCIe maximum request limit, acquiring a tag identifying the request from the tag allocation unit 104 while sending the data size of the request to the tag allocation unit 104, and sending the newly generated read request carrying the tag to the read-write request arbitration unit 105; then calculating the address of the remaining read data, generating a new read request for the remainder to await the next round of processing, and jumping to step 702;
step 704, acquiring the tag of the request from the tag allocation unit 104, sending the data size of the request to the tag allocation unit 104, sending the identifier marking the last request in the request group to the read-write request arbitration unit 105, and jumping to step 701.
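The splitting loop of steps 701 to 704 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the hardware logic of the application: `SubRequest`, `split_read_request` and the 512-byte limit are hypothetical, and a plain counter stands in for the tag allocation unit 104.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class SubRequest:
    tag: int        # tag identifying this sub-request
    address: int    # start address of the sub-request
    size: int       # size in bytes, never above the maximum request limit
    last: bool      # True for the final sub-request of the request group


def split_read_request(address, size, max_request, tags):
    """Split one read request into sub-requests (steps 702 to 704)."""
    subs = []
    # Steps 702/703: peel off full-size sub-requests while the remainder
    # still exceeds the maximum request limit.
    while size > max_request:
        subs.append(SubRequest(next(tags), address, max_request, False))
        address += max_request
        size -= max_request
    # Step 704: the remainder fits in one request and closes the group.
    subs.append(SubRequest(next(tags), address, size, True))
    return subs


# A 1300-byte read with an assumed 512-byte limit yields 512 + 512 + 276.
subs = split_read_request(0x1000, 1300, 512, count())
```

Only the final sub-request carries the `last` flag, which is the information the response path later needs to know when a whole request group has been answered.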
Preferably, the tag allocation of the read request response method is as shown in fig. 8 and comprises the following steps:
step 801, if the tag allocation unit 104 receives a tag allocation or release request, jumping to step 802; otherwise, continuing to execute step 801;
step 802, if the request is a tag allocation request, jumping to step 803; otherwise, jumping to step 805;
step 803, if the unallocated (tag) FIFO 1041 in the tag allocation unit 104 is empty, no tag can be allocated, so the request is completed and the process jumps to step 801; otherwise, jumping to step 804;
step 804, fetching a tag from the unallocated (tag) FIFO 1041 and storing it in the allocated (tag) FIFO 1042, acquiring from the read request processing unit 102 the request size and whether the request is the last request, storing them in the request size FIFO 1044 and the request packet FIFO 1043 respectively, and jumping to step 801;
step 805, the request is a tag release request; if the allocated (tag) FIFO 1042 in the tag allocation unit 104 is empty, the release request is completed and the process jumps directly to step 801; otherwise, jumping to step 806;
step 806, fetching a tag from the allocated (tag) FIFO 1042 and storing it in the unallocated (tag) FIFO 1041, releasing one element from each of the request size FIFO 1044 and the request packet FIFO 1043, and jumping to step 801.
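The four FIFOs used in steps 801 to 806 can be modelled in a few lines of Python. The sketch below is only an illustration: the class name `TagAllocator`, its method names, and the choice to signal "no tag available" by returning `None` are assumptions made for the example.

```python
from collections import deque


class TagAllocator:
    """Illustrative model of the four FIFOs in the tag allocation unit."""

    def __init__(self, tags):
        self.unallocated = deque(tags)   # FIFO 1041: free tags
        self.allocated = deque()         # FIFO 1042: tags in flight, in issue order
        self.is_last = deque()           # FIFO 1043: last-request flags
        self.sizes = deque()             # FIFO 1044: request sizes

    def allocate(self, size, last):
        # Step 803: no free tag, so the allocation cannot be served.
        if not self.unallocated:
            return None
        # Step 804: move a tag to the allocated FIFO and record its metadata.
        tag = self.unallocated.popleft()
        self.allocated.append(tag)
        self.sizes.append(size)
        self.is_last.append(last)
        return tag

    def release(self):
        # Step 805: nothing is in flight, so there is nothing to release.
        if not self.allocated:
            return None
        # Step 806: return the oldest tag to the free FIFO and drop its metadata.
        tag = self.allocated.popleft()
        self.unallocated.append(tag)
        self.sizes.popleft()
        self.is_last.popleft()
        return tag


alloc = TagAllocator(range(2))
first = alloc.allocate(512, False)
second = alloc.allocate(276, True)
third = alloc.allocate(64, True)   # both tags are in flight at this point
released = alloc.release()
```

Since tags leave the allocated FIFO in the same order in which they entered it, the head of that FIFO always names the next response the read response path should expect.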
Preferably, the read response packet processing of the read request response method is as shown in fig. 9; it includes merging a plurality of read request responses and reordering out-of-order read response packets, and comprises the following steps:
step 901, if the read response processing unit 103 receives a read response data packet, jumping to step 902; otherwise, continuing to execute step 901;
step 902, if the tag field of the received read response packet matches the next valid tag sent from the tag allocation unit 104, jumping to step 903; otherwise, jumping to step 904;
step 903, if the response packet is the last response packet of the read request represented by the tag, jumping to step 905; otherwise, jumping to step 906;
step 904, storing the data packet, according to its tag, in the out-of-order rearrangement buffer 1032 of the read response processing unit 103, recording the tag of the packet stored in the out-of-order rearrangement buffer 1032, and jumping to step 901;
step 905, transferring all read response packets carrying the tag of the read response data packet to the all-request merge buffer, and jumping to step 907;
step 906, storing the data packet, according to its tag, in the out-of-order rearrangement buffer 1032 of the read response processing unit 103, recording the tag of the packet stored in the out-of-order rearrangement buffer 1032, and jumping to step 901;
step 907, obtaining from the tag allocation unit 104 whether the tag of the read response just transferred to the all-request merge buffer belongs to the last request in the read request group; if so, jumping to step 908; otherwise, jumping to step 909;
step 908, sending all the data in the all-request merge buffer to the read response channel, and jumping to step 909;
step 909, if the next read request tag can be read from the tag allocation unit 104, jumping to step 910; otherwise, jumping to step 901;
step 910, if the packet tags recorded for the out-of-order rearrangement buffer 1032 include the read request tag sent from the tag allocation unit 104, jumping to step 903; otherwise, jumping to step 901.
The purpose of tag allocation and out-of-order reordering is to keep the order of read responses consistent with the order of read requests within the same channel, as RDMA requires.
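The reordering and merging behaviour of steps 901 to 910 can be approximated by the following Python sketch. It is a simplified model, not the flow of fig. 9 step by step: the class `ReadResponseReorderer`, the `final` flag on each response packet, and the assumption that no tag is reused while it is still pending are all introduced for the example.

```python
from collections import deque


class ReadResponseReorderer:
    """Illustrative reordering and merging of tagged read responses."""

    def __init__(self, issued):
        # issued: (tag, last_in_group) pairs in the order the requests were sent
        self.expected = deque(issued)
        self.pending = {}        # tag -> payloads received so far
        self.complete = set()    # tags whose final response packet has arrived
        self.merge = []          # payloads of the current request group
        self.delivered = []      # merged responses handed to the response channel

    def receive(self, tag, payload, final):
        self.pending.setdefault(tag, []).append(payload)
        if final:
            self.complete.add(tag)
        # Drain tags strictly in issue order; later tags wait in `pending`.
        while self.expected and self.expected[0][0] in self.complete:
            head, last_in_group = self.expected.popleft()
            self.complete.discard(head)
            self.merge.extend(self.pending.pop(head))
            if last_in_group:
                self.delivered.append(b"".join(self.merge))
                self.merge = []


r = ReadResponseReorderer([(0, False), (1, True)])
r.receive(1, b"CC", True)    # tag 1 arrives early and is buffered
r.receive(0, b"AA", False)
r.receive(0, b"BB", True)    # completes tag 0, which also drains tag 1
```

In this run the single merged response is delivered only after tag 0 completes, even though tag 1 finished first, so the data reaches the requester in request order.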
According to one aspect of the present application, there is provided a method for processing an interrupt request in the DMA communication system of the RDMA communication device, comprising: the RDMA communication device initiates an interrupt request to the DMA communication system via one of the write request channels; the data processing channel 10 receives the interrupt request and forwards it to the interrupt processing module 30, which then handles one of several cases: 1) if no interrupt request with the same interrupt number has been received within a timer timeout period, the interrupt processing module 30 starts the corresponding timer, waits for the timer to time out, and then writes the interrupt request to the computer system through the PCIe controller; 2) if an interrupt request with the same interrupt number is received within the timeout period of a timer, the counter is incremented by 1, and the module continues to wait for the timer to time out or for a new interrupt request; 3) if a plurality of interrupt requests with the same interrupt number are received within the timeout period of a timer so that the counter reaches the threshold value, the timer and the counter for that interrupt number are cleared, and the interrupt request is written to the computer system through the PCIe controller.
Preferably, as shown in fig. 10, the processing of an interrupt request in the DMA communication system of the RDMA communication device mainly includes receiving the interrupt request and acquiring an interrupt vector entry, delaying the interrupt request, and forwarding the interrupt request, and comprises the following steps:
step 1001, if a timer in the interrupt processing module 30 has timed out, jumping to step 1002; otherwise, jumping to step 1003;
step 1002, clearing the interrupt counter and the timer, sending the interrupt request corresponding to the entry to the PCIe controller, and jumping to step 1001;
step 1003, if the interrupt processing module 30 receives an interrupt request from the arbitration module 20, jumping to step 1004; otherwise, jumping to step 1001;
step 1004, acquiring the interrupt vector entry information from the PIO system according to the function field in the interrupt request, obtaining the entry number of the interrupt request, and jumping to step 1005;
step 1005, if the timer of the entry with that number has been started, jumping to step 1006; otherwise, jumping to step 1007;
step 1006, if the interrupt request counter of the entry with that number has reached the threshold value, jumping to step 1002; otherwise, jumping to step 1008;
step 1007, starting the timer of the entry with that number, incrementing its interrupt request counter by one, and jumping to step 1001;
step 1008, incrementing the interrupt request counter of the entry with that number by one, and jumping to step 1001.
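The interrupt merging of steps 1001 to 1008 can be sketched in Python as follows. The sketch is illustrative only: `InterruptCoalescer`, the explicit `now` argument that replaces a hardware timer, and the timeout and threshold values are assumptions; the entry number is taken as given rather than looked up from the PIO system.

```python
class InterruptCoalescer:
    """Illustrative per-entry timer and counter (steps 1001 to 1008)."""

    def __init__(self, timeout, threshold):
        self.timeout = timeout
        self.threshold = threshold
        self.deadline = {}   # entry number -> time at which its timer expires
        self.counter = {}    # entry number -> interrupt requests merged so far
        self.sent = []       # entry numbers forwarded to the PCIe controller

    def _fire(self, entry):
        # Step 1002: clear counter and timer, then forward one interrupt.
        self.deadline.pop(entry, None)
        self.counter.pop(entry, None)
        self.sent.append(entry)

    def tick(self, now):
        # Step 1001: forward every entry whose timer has expired.
        for entry in [e for e, d in self.deadline.items() if d <= now]:
            self._fire(entry)

    def request(self, entry, now):
        if entry not in self.deadline:
            # Step 1007: the first request of a window starts the timer.
            self.deadline[entry] = now + self.timeout
            self.counter[entry] = 1
        elif self.counter[entry] >= self.threshold:
            # Step 1006: threshold reached, send without waiting for the timer.
            self._fire(entry)
        else:
            # Step 1008: merge the request into the pending interrupt.
            self.counter[entry] += 1


c = InterruptCoalescer(timeout=10, threshold=3)
for t in range(4):
    c.request(5, now=t)   # the fourth request finds the counter at the threshold
c.request(7, now=4)
c.tick(now=14)            # the timer of entry 7 expires
```

Four requests for entry 5 produce a single forwarded interrupt, and the lone request for entry 7 is forwarded once its timer expires, which is the reduction in interrupt traffic that merging is meant to achieve.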
In the present application, when the RDMA communication device initiates a write request to the DMA communication system via a write request channel and then initiates an interrupt request on the same channel, the two requests are ordered. The data processing channel 10 first processes the write request and forwards it to the arbitration module 20, and only then begins to receive the subsequent interrupt request; by that point the write request has already been passed to the PCIe controller, so the interrupt request cannot overtake the write request preceding it. This guarantees the ordering between data write requests and interrupt requests in the RDMA communication device.
The beneficial effects of the application are as follows:
1. By providing a separate interrupt processing module 30, an interrupt merging mechanism can be implemented in hardware, which reduces the substantial context switch overhead that frequent interrupts impose on the computer system.
2. The interrupt request and the data write request are transmitted over the same channel interface, which preserves their ordering and avoids the timing error that would arise in the RDMA communication device if an interrupt completed before the write request preceding it.
3. A configurable number of data processing channels 10 is implemented, which flexibly meets the needs of RDMA communication devices that require different numbers of data processing channels 10.
The foregoing is merely an embodiment of the present application. Specific structures and characteristics that are common knowledge in the art are not described in detail here. A person of ordinary skill in the art knows the common technical knowledge of the field as of the application date or the priority date, has access to all prior art in the field, and is able to apply the conventional experimental means available before that date; in light of the present application and with his or her own abilities, such a person can complete and implement the present embodiment, and certain typical known structures or known methods should not become an obstacle to implementing the present application. It should be noted that those skilled in the art may make modifications and improvements without departing from the structure of the present application; these shall also fall within the scope of protection of the present application and do not affect the effect of its implementation or the utility of the patent. The scope of protection of the present application is defined by the claims, and the specific embodiments and other descriptions in the specification may be used to interpret the claims.

Claims (7)

1. A DMA system for an RDMA communication device includes a plurality of data processing channels (10), an arbitration module (20), an interrupt processing module (30), and a data distribution module (40), wherein,
each data processing channel (10) is configured to receive a read request, a write request, and a read response to local data, wherein the data processing channels (10) comprise: a write request processing unit (101), a read request processing unit (102), a read response processing unit (103), a tag allocation unit (104), and a read/write request arbitration unit (105), wherein:
a write request processing unit (101) for receiving and processing a write request for data, and an interrupt request;
a read request processing unit (102) for receiving and processing a read request;
the read response processing unit (103) is used for receiving and processing the read response issued by the data distribution module (40);
the tag allocation unit (104) is used for allocating a tag to each read request in the read request processing unit (102) and notifying the read response processing unit (103) of the allocated tag number;
a read-write request arbitration unit (105) for arbitrating among the interrupt requests, write requests and read requests sent by the write request processing unit (101) and the read request processing unit (102);
the arbitration module (20) is used for scheduling read requests or write requests in a plurality of data processing channels (10), wherein the arbitration module (20) is used for receiving data read requests, data write requests and interrupt requests sent by the plurality of data processing channels (10), reading data packet heads of the requests in each channel, and executing different operations according to different types of the requests: forwarding the interrupt request to an interrupt handling module (30); for a read request or a write request, forwarding it to the PCIe controller;
the interrupt processing module (30) is used for merging the received interrupt requests, wherein the interrupt processing module (30) further comprises an interrupt timer and an interrupt request counter; the interrupt timer controls the time at which an interrupt is issued; it is determined whether the interrupt timer has timed out and whether the corresponding interrupt request counter, which counts the interrupt requests received while waiting for the interrupt timer to time out, has reached a threshold value, and if the threshold value is reached, the interrupt request is sent to the PCIe interface;
the data distribution module (40) is used for distributing read response data to the corresponding data processing channels (10), wherein the data distribution module (40) is used for allocating tags to the read request data packets according to the different tag segments preset for the data processing channels (10), and for finding the corresponding data processing channel according to the tag field of each read response data packet so as to forward the data packet.
2. The DMA system according to claim 1, wherein within each data processing channel (10) for a received write request and its subsequent interrupt request, the write request is first processed and then the next interrupt request is received.
3. The DMA system according to claim 1, wherein the read response processing unit (103) is configured to merge a plurality of response packets, reorder out-of-order read response packets, merge a plurality of read request responses, and forward the response packets to the RDMA communication device.
4. The DMA system according to claim 1, wherein the arbitration module (20) processes in the following priority order:
1) when a channel contains a read request, preferentially arbitrating the read request; 2) when a plurality of channels contain read requests, or no channel contains a read request, arbitrating by a polling strategy; 3) when a channel contains an interrupt request, forwarding it directly to the interrupt processing module (30); 4) when a channel contains a plurality of interrupt requests, forwarding the interrupt requests to the interrupt processing module (30) by a polling strategy.
5. A write request processing method for the DMA system of any of claims 1-4, comprising:
a data processing channel (10) receives a data write request initiated by an RDMA communication device to a DMA communication system;
dividing the write request into a plurality of write requests meeting the PCIe maximum request limit and sequentially sending the write requests to an arbitration module (20);
an arbitration module (20) writes the write request to the computer system through the PCIe controller.
6. A read request method for the DMA system of any of claims 1-4, comprising:
a data processing channel (10) receives a data read request initiated by an RDMA communication device to a DMA communication system;
dividing the read request into a plurality of read requests meeting the PCIe maximum request limit, distributing labels for each read request, and sequentially sending the read requests to an arbitration module (20);
the arbitration module (20) sends the read request to the computer system through the PCIe controller.
7. The read request method of claim 6, further comprising the steps of:
receiving a read request response returned by the computer system, wherein for one read request sent to the computer system, the computer system sequentially returns a plurality of read responses of the same tag, and for read responses of different tags, the order returned by the computer system can be different from the order in which the read requests are sent;
distributing the read request responses to the corresponding data processing channels (10) according to their tags, wherein each data processing channel (10) merges the read response data packets of the same tag and merges the responses to different requests into one read response data packet in the order in which the read request tags were allocated;
the response packet is returned to the RDMA communication device.
CN202111041270.3A 2021-09-07 2021-09-07 DMA communication system and method for RDMA communication equipment Active CN113742267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111041270.3A CN113742267B (en) 2021-09-07 2021-09-07 DMA communication system and method for RDMA communication equipment

Publications (2)

Publication Number Publication Date
CN113742267A CN113742267A (en) 2021-12-03
CN113742267B true CN113742267B (en) 2023-10-27

Family

ID=78736279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111041270.3A Active CN113742267B (en) 2021-09-07 2021-09-07 DMA communication system and method for RDMA communication equipment

Country Status (1)

Country Link
CN (1) CN113742267B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991781B (en) * 2023-09-25 2024-02-06 京东科技信息技术有限公司 Request processing device, method, chip, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101351784A (en) * 2005-12-30 2009-01-21 阿西式·A·潘迪亚 Runtime adaptable search processor
CN103714027A (en) * 2014-01-10 2014-04-09 浪潮(北京)电子信息产业有限公司 Data transmission method and device for direct memory access controller
CN108459988A (en) * 2017-02-17 2018-08-28 英特尔公司 Duration direct distance input and output
CN110121698A (en) * 2016-12-31 2019-08-13 英特尔公司 System, method and apparatus for Heterogeneous Computing
CN112256624A (en) * 2020-11-03 2021-01-22 中国人民解放军国防科技大学 DMA communication device, chip, equipment and method for high-speed interconnection network interface chip

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8484396B2 (en) * 2011-08-23 2013-07-09 Oracle International Corporation Method and system for conditional interrupts


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant